Monday, 29 May 2017

Why Developers are learning Big Data Hadoop

The IT industry is growing exponentially, with Big Data Analytics being employed by government agencies and private companies alike. The industry is witnessing rapid growth, driven by increased demand for Big Data and predictive analytics solutions in sectors such as BFSI, retail, telecom and healthcare.

This anticipated growth in the use of big data is expected to translate into financial opportunities and job creation, while also positioning India as one of the largest data analytics markets in the world. For professionals skilled in Big Data Analytics, there is an ocean of opportunities out there.


What is Big Data?

Big Data is a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools.

Sources

Social networking sites – Facebook, Twitter, LinkedIn, etc.
Blogs and user forums – Stack Overflow, Quora, etc.
Public sites – Wikipedia, IMDb
Video sites – YouTube, Dailymotion, etc.
Online shopping sites – Flipkart, Snapdeal, etc.
Banks, share markets and sensor data

The 3 Vs

Volume - Big Data comes at large scale: terabytes (TB), even petabytes (PB).
Velocity - Data flows continuously (batch, real-time, historic, etc.).
Variety - Big Data includes structured, semi-structured and unstructured data.

What is Hadoop?

Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
It runs applications on large clusters built of commodity hardware.
Open-source framework.
Efficient and reliable.
Manageable and self-healing.
Cost-effective, since it runs on commodity hardware.




Who developed Hadoop?




2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project. The project was funded by Yahoo.

2006: The project was moved to the Apache Software Foundation, with Yahoo as a major contributor.


What are Hadoop Core Components?

HDFS - Hadoop Distributed File System - (Storage)
HDFS is responsible for storing data on the cluster.
Data files are split into blocks and distributed across multiple nodes in the cluster.
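The block-splitting idea can be sketched in plain Python. This is a toy, single-machine illustration with made-up block sizes and node names; real HDFS uses 128 MB blocks and a default replication factor of 3, with a much smarter placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    # Split the file content into fixed-size blocks, as HDFS does
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes, replication=3):
    # Assign each block to `replication` nodes, round-robin
    # (a toy placement policy, not HDFS's real rack-aware one)
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"x" * 1000  # a 1000-byte "file"
blocks = split_into_blocks(data, block_size=256)
print(len(blocks))  # 4 blocks (256 + 256 + 256 + 232 bytes)
print(place_blocks(blocks, ["node1", "node2", "node3", "node4"]))
```

Because each block lives on several nodes, the cluster can keep serving a file even when one machine fails.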

MapReduce - (Processing)
MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
It consists of two phases: Map and then Reduce.
The job is split across nodes, each split is processed in parallel, and the results are combined in the Reduce phase.
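The two phases can be sketched in plain Python with the classic word-count example. This is a simplified, single-machine illustration of the model, not the actual Hadoop Java API; on a real cluster the shuffle step is done by the framework across machines.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for each word in the input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key (Hadoop does this between the phases)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all the counts for one word
    return (key, sum(values))

lines = ["hadoop stores big data", "hadoop processes big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
reduced = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(reduced)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```

Each input line could be mapped on a different node, and each word's counts reduced on a different node, which is what makes the model scale.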

YARN - Yet Another Resource Negotiator (Cluster Management)
A framework for job scheduling and cluster resource management.
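The resource-management idea can be sketched as a toy scheduler in Python. The memory figures and node names are hypothetical, and real YARN tracks CPU as well and uses pluggable schedulers (Capacity, Fair); this only shows the core notion of granting containers from limited node resources.

```python
def allocate(requests, node_capacity):
    # Toy YARN-style allocation: give each job a container on the
    # first node with enough free memory
    free = dict(node_capacity)  # node -> free memory in MB
    allocations = {}
    for job, mem in requests:
        for node, avail in free.items():
            if avail >= mem:
                free[node] -= mem
                allocations[job] = node
                break
    return allocations

nodes = {"node1": 4096, "node2": 4096}
jobs = [("job1", 3000), ("job2", 2048), ("job3", 1024)]
print(allocate(jobs, nodes))  # {'job1': 'node1', 'job2': 'node2', 'job3': 'node1'}
```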

Hadoop Cluster is a set of machines running HDFS and MapReduce.
Individual machines are known as nodes.
A cluster can have as few as one node or as many as several thousand.
More nodes = better performance.




To know more about Big Data Hadoop, or in case of any query, you can contact us at:

Ph: +91- 9999201478
Website: www.gyansetu.in



