The IT world is growing exponentially, with Big Data analytics being employed by government agencies and private companies alike. The industry is witnessing rapid growth, driven by rising demand for Big Data and predictive analytics solutions from sectors such as BFSI, retail, telecom, and healthcare.
This anticipated growth in the use of big data is expected to translate into financial opportunities and job creation, while also making India one of the largest data analytics markets in the world. For professionals skilled in Big Data analytics, there is an ocean of opportunities out there.
What is Big Data?
Big Data is a collection of data sets so large and complex that it becomes difficult to process them using on-hand database management tools.
Sources
Social networking sites – Facebook, Twitter, LinkedIn, etc.
Blogs and user forums – Stack Overflow, Quora, etc.
Public sites – Wikipedia, IMDb
Video sites – YouTube, Dailymotion, etc.
Online shopping sites – Flipkart, Snapdeal, etc.
Banks, share markets, and sensor data
The 3 Vs
Volume – Big Data comes at a large scale: terabytes (TB), even petabytes (PB).
Velocity – Data flows in continuously (batch, real-time, historic, etc.).
Variety – Big Data includes structured, semi-structured, and unstructured data.
What is Hadoop?
- Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
- It runs applications on large clusters built of commodity hardware.
- Open-source framework.
- Efficient and reliable.
- Manageable and self-healing.
- Cost-effective on commodity hardware.
Who developed Hadoop?
2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support distributed processing for the Nutch search engine project. The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache Software Foundation.
What are Hadoop's Core Components?
HDFS – Hadoop Distributed File System (Storage)
- HDFS is responsible for storing data on the cluster.
- Data files are split into blocks and distributed across multiple nodes in the cluster.
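The block-splitting idea above can be sketched in a few lines of Python. This is only an illustration, not HDFS code; the 128 MB block size is the HDFS default in Hadoop 2.x and later, and the numbers used are hypothetical.

```python
# Sketch: how HDFS conceptually splits a file into fixed-size blocks.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) descriptors, one per block."""
    blocks = []
    offset = 0
    while offset < file_size_bytes:
        # The last block may be smaller than the block size.
        length = min(block_size, file_size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 300 MB file splits into three blocks: 128 MB, 128 MB, and 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))                      # 3
print(blocks[-1][1] // (1024 * 1024))   # 44
```

In a real cluster, each of these blocks would additionally be replicated (three copies by default) across different nodes for fault tolerance.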
MapReduce – (Processing)
- MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.
- It consists of two phases: Map and then Reduce.
- A job is split into tasks across processors; the tasks are processed in parallel, and the results are combined in the Reduce phase.
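The two phases can be illustrated with the classic word-count example, written here in plain Python rather than as a real Hadoop job (which would typically implement Mapper and Reducer classes in Java). The shuffle step between the phases, which groups values by key, is made explicit.

```python
# Sketch of the MapReduce model: word count in plain Python.
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key before the Reduce phase.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data is big", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])   # 3
print(counts["data"])  # 2
```

In Hadoop, the map tasks run in parallel on the nodes that hold the input blocks, and the framework handles the shuffle automatically; the logic, however, is exactly this.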
YARN – Yet Another Resource Negotiator (Cluster Management)
- A framework for job scheduling and cluster resource management.
A Hadoop cluster is a set of machines running HDFS and MapReduce.
- Individual machines are known as nodes.
- A cluster can have as few as one node or as many as several thousand.
- More nodes generally mean better performance.
To know more about Big Data and Hadoop, or in case of any query, you can contact us at:
Ph: +91- 9999201478
Website: www.gyansetu.in