Hadoop Essentials : Delve into the Key Concepts of Hadoop and Get a Thorough Understanding of the Hadoop Ecosystem.

Material type: Text
Publisher: Birmingham : Packt Publishing, Limited, 2015
Copyright date: ©2015
Edition: 1st ed
Description: 1 online resource (194 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9781784390464
Additional physical formats: Print version: Hadoop Essentials
DDC classification:
  • 004.36
LOC classification:
  • QA76.9.D5 -- .A243 2015eb
Contents:
Cover -- Copyright -- Credits -- About the Author -- Acknowledgments -- About the Reviewers -- www.PacktPub.com -- Table of Contents -- Preface -- Chapter 1: Introduction to Big Data and Hadoop -- V's of big data -- Volume -- Velocity -- Variety -- Understanding big data -- NoSQL -- Types of NoSQL databases -- Analytical database -- Who is creating the big data? -- Big data use cases -- Big data use case patterns -- Big data as a storage pattern -- Big data as a data transformation pattern -- Big data for a data analysis pattern -- Big data for data in a real-time pattern -- Big data for a low latency caching pattern -- Hadoop -- Hadoop history -- Description -- Advantages of Hadoop -- Uses of Hadoop -- Hadoop ecosystem -- Apache Hadoop -- Hadoop distributions -- Pillars of Hadoop-HDFS, MapReduce, and YARN -- Data access components - Hive and Pig -- Data storage component - HBase -- Data ingestion in Hadoop- Sqoop and Flume -- Streaming and real-time analysis - Storm and Spark -- Summary -- Chapter 2: Hadoop Ecosystem -- Traditional systems -- Database trend -- Hadoop use cases -- Hadoop basic data flow -- Hadoop integration -- The Hadoop ecosystem -- Distributed filesystem -- HDFS -- Distributed programming -- NoSQL databases -- Apache HBase -- Data ingestion -- Service Programming -- Apache YARN -- Apache Zookeeper -- Scheduling -- Data analytics and machine learning -- System management -- Apache Ambari -- Summary -- Chapter 3: Pillars of Hadoop - HDFS, MapReduce, and YARN -- HDFS -- Features of HDFS -- HDFS Architecture -- NameNode -- DataNode -- Checkpoint NameNode or Secondary NameNode -- BackupNode -- Data storage in HDFS -- Read pipeline -- Write pipeline -- Rack awareness -- Advantages of rack awareness in HDFS -- HDFS Federation -- Limitations of HDFS 1.0 -- The benefit of HDFS Federation -- HDFS ports -- HDFS commands -- MapReduce.
MapReduce architecture -- JobTracker -- TaskTracker -- Serialization data types -- Writable interface -- WritableComparable interface -- MapReduce example -- The MapReduce process -- Mapper -- Shuffle and sorting -- Reducer -- Speculative execution -- FileFormats -- InputFormats -- RecordReader -- OutputFormats -- RecordWriter -- Writing a MapReduce program -- Mapper code -- Reducer code -- Driver code -- Auxiliary steps -- Combiner -- Partitioner -- YARN -- YARN Architecture -- ResourceManager -- NodeManager -- ApplicationMaster -- Applications powered by YARN -- Summary -- Chapter 4: Data Access Components - Hive and Pig -- Need of a data processing tool on Hadoop -- Pig -- Pig data types -- Pig architecture -- The logical plan -- The physical plan -- The MapReduce plan -- Pig modes -- Grunt shell -- Input data -- Loading data -- Dump -- Store -- Filter -- Group By -- Limit -- Aggregation -- Cogroup -- DESCRIBE -- EXPLAIN -- ILLUSTRATE -- Hive -- Hive architecture -- Metastore -- Query compiler -- Execution engine -- Data types and schemas -- Installing Hive -- Starting Hive Shell -- HiveQL -- DDL (Data Definition Language) operations -- DML (Data Manipulation Language) operations -- SQL operation -- Built-in functions -- Custom UDF (User Defined Functions) -- Managing tables (external versus managed) -- SerDe -- Partitioning -- Bucketing -- Summary -- Chapter 5: Storage Component - HBase -- An Overview of HBase -- Advantages of HBase -- Architecture of HBase -- MasterServer -- RegionServer -- WAL -- BlockCache -- Regions -- MemStore -- Zookeeper -- HBase data model -- Logical components of data model -- ACID properties -- CAP theorem -- Schema design -- Write pipeline -- Read pipeline -- Compaction -- Compaction policy -- Minor compaction -- Major compaction -- Splitting -- Pre-Splitting -- Auto Splitting -- Forced Splitting -- Commands -- help.
Create -- List -- Put -- Scan -- Get -- Disable -- Drop -- HBase Hive integration -- Performance tuning -- Compression -- Filters -- Counters -- HBase co-processors -- Summary -- Chapter 6: Data Ingestion in Hadoop - Sqoop and Flume -- Data sources -- Challenges in data ingestion -- Sqoop -- Connectors and drivers -- Sqoop 1 architecture -- Limitation of Sqoop 1 -- Sqoop 2 architecture -- Imports -- Exports -- Apache Flume -- Reliability -- Flume architecture -- Multitier topology -- Flume Master -- Flume Nodes -- Components in Agent -- Channels -- Examples of configuring Flume -- Single agent example -- Multiple flow in an agent -- Configuring a multi-agent setup -- Summary -- Chapter 7: Streaming and Real-time Analysis - Storm and Spark -- An introduction to Storm -- Features of Storm -- Physical architecture of Storm -- Data architecture of Storm -- Storm topology -- Storm on YARN -- Topology configuration example -- Spouts -- Bolts -- Topology -- An introduction to Spark -- Features of Spark -- Spark framework -- Spark SQL -- GraphX -- MLib -- Spark streaming -- Spark architecture -- Directed Acyclic Graph engine -- Resilient Distributed Dataset -- Physical architecture -- Operations in Spark -- Transformations -- Actions -- Spark example -- Summary -- Index.

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

