Big Data : Concepts, Warehousing, and Analytics.

Material type: Text
Publisher: Milton : River Publishers, 2020
Copyright date: ©2020
Edition: 1st ed.
Description: 1 online resource (315 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9781000794038
Additional physical formats: Print version: Big Data
DDC classification:
  • 005.7
LOC classification:
  • QA76.9.B45 .S268 2020
Contents:
Cover -- Half Title -- Series Page -- Title Page -- Copyright Page -- Dedication -- Table of Contents -- List of Figures -- List of Tables -- The Authors -- Acknowledgments -- Foreword -- Notation -- 1: Introduction -- 1.1. Objectives of this Book -- 1.2. Intended Audience -- 1.3. Book Structure -- 2: Big Data Concepts, Techniques, and Technologies -- 2.1. Big Data Relevance -- 2.2. Big Data Characteristics -- 2.3. Big Data Challenges -- 2.3.1. Big Data General Dilemmas -- 2.3.2. Challenges in the Big Data Life Cycle -- 2.3.3. Big Data in Secure, Private, and Monitored Environments -- 2.3.4. Organizational Change -- 2.4. Techniques for Big Data Solutions -- 2.4.1. Big Data Life Cycle and Requirements -- 2.4.1.1. General Steps to Process and Analyze Big Data -- 2.4.1.2. Architectural and Infrastructural Requirements -- 2.4.2. The Lambda Architecture -- 2.4.3. Towards Standardization: The NIST Reference Architecture -- 2.5. Big Data Technologies -- 2.5.1. Hadoop and Related Projects -- 2.5.2. Landscape of Distributed SQL Engines -- 2.5.3. Other Technologies for Big Data Analytics -- 3: OLTP-Oriented Databases for Big Data Environments -- 3.1. NoSQL and NewSQL: An Overview -- 3.2. NoSQL Databases -- 3.2.1. Key-Value Databases -- 3.2.1.1. Overview -- 3.2.1.2. Redis -- 3.2.2. Column-Oriented Databases -- 3.2.2.1. Overview -- 3.2.2.2. HBase -- 3.2.2.3. From Relational Models to HBase Data Models -- 3.2.3. Document-Oriented Databases -- 3.2.3.1. Overview -- 3.2.3.2. MongoDB -- 3.2.4. Graph Databases -- 3.2.4.1. Overview -- 3.2.4.2. Neo4j -- 3.3. NewSQL Databases and Translytical Databases -- 4: OLAP-Oriented Databases for Big Data Environments -- 4.1. Hive: The De Facto SQL-on-Hadoop Engine -- 4.1.1. Data Storage Formats -- 4.1.1.1. Text File -- 4.1.1.2. Sequence File -- 4.1.1.3. RCFile -- 4.1.1.4. ORC File -- 4.1.1.5. Avro File -- 4.1.1.6. Parquet.
4.1.2. Partitions and Buckets -- 4.2. From Dimensional Models to Tabular Models -- 4.2.1. Primary Data Tables -- 4.2.2. Derived Data Tables -- 4.3. Optimizing OLAP workloads with Druid -- 5: Design and Implementation of Big Data Warehouses -- 5.1. Big Data Warehousing: An Overview -- 5.2. Model of Logical Components and Data Flows -- 5.2.1. Data Provider and Data Consumer -- 5.2.2. Big Data Application Provider -- 5.2.3. Big Data Framework Provider -- 5.2.3.1. Messaging/Communications, Resource Management, and Infrastructures -- 5.2.3.2. Processing -- 5.2.3.3. Storage: Data Organization and Distribution -- 5.2.4. System Orchestrator and Security, Privacy, and Management -- 5.3. Model of Technological Infrastructure -- 5.4. Method for Data Modeling -- 5.4.1. Analytical Objects and their Related Concepts -- 5.4.2. Joining, Uniting, and Materializing Analytical Objects -- 5.4.3. Dimensional Big Data with Outsourced Descriptive Families -- 5.4.4. Data Modeling Best Practices -- 5.4.4.1. Using Null Values -- 5.4.4.2. Date, Time, and Spatial Objects vs. Separate Temporal and Spatial Attributes -- 5.4.4.3. Immutable vs. Mutable Records -- 5.4.5. Data Modeling Advantages and Disadvantages -- 6: Big Data Warehouses Modeling: From Theory to Practice -- 6.1. Multinational Bicycle Wholesale and Manufacturing -- 6.1.1. Fully Flat or Fully Dimensional Data Models -- 6.1.2. Nested Attributes -- 6.1.3. Streaming and Random Access on Mutable Analytical Objects -- 6.2. Brokerage Firm -- 6.2.1. Unnecessary Complementary Analytical Objects and Update Problems -- 6.2.1.1. The Traditional Way of Handling SCD-Like Scenarios -- 6.2.1.2. A New Way of Handling SCD-Like Scenarios -- 6.2.2. Joining Complementary Analytical Objects -- 6.2.3. Data Science Models and Insights as a Core Value -- 6.2.4. Partition Keys for Streaming and Batch Analytical Objects -- 6.3. Retail.
6.3.1. Simpler Data Models: Dynamic Partitioning Schemas -- 6.3.2. Considerations for Spatial Objects -- 6.3.3. Analyzing Non-Existing Events -- 6.3.4. Wide Descriptive Families -- 6.3.5. The Need for Joins in Data CPE Workloads -- 6.4. Code Version Control System -- 6.5. A Global Database of Society - The GDELT Project -- 6.6. Air Quality -- 7: Fueling Analytical Objects in Big Data Warehouses -- 7.1. From Traditional Data Warehouses -- 7.2. From OLTP NoSQL Databases -- 7.3. From Semi-Structured Data Sources -- 7.4. From Streaming Data Sources -- 7.5. Using Data Science Models -- 7.5.1. Data Mining/Machine Learning Models for Structured Data -- 7.5.2. Text Mining, Image Mining, and Video Mining Models -- 8: Evaluating the Performance of Big Data Warehouses -- 8.1. The SSB+ Benchmark -- 8.1.1. Data Model and Queries -- 8.1.2. System Architecture and Infrastructure -- 8.2. Batch OLAP -- 8.2.1. Comparing Flat Analytical Objects with Star Schemas -- 8.2.2. Improving Performance with Adequate Data Partitioning -- 8.2.3. The Impact of Dimensions' Size in Star Schemas -- 8.2.4. The Impact of Nested Structures in Analytical Objects -- 8.2.5. Drill Across Queries and Window and Analytics Functions -- 8.3. Streaming OLAP -- 8.3.1. The Impact of Data Volume in the Streaming Storage Component -- 8.3.2. Considerations for Effective and Efficient Streaming OLAP -- 8.4. SQL-on-Hadoop Systems under Multi-User Environments -- 9: Big Data Warehousing in Smart Cities -- 9.1. Logical Components, Data Flows, and Technological Infrastructure -- 9.1.1. SusCity Architecture -- 9.1.2. SusCity Infrastructure -- 9.2. SusCity Data Model -- 9.2.1. Buildings Characteristics as an Outsourced Descriptive Family -- 9.2.2. Nested Structures in Analytical Objects -- 9.3. The Inter-Storage Pipeline -- 9.4. The SusCity Data Visualization Platform -- 9.4.1. City's Energy Consumption.
9.4.2. City's Energy Grid Simulations -- 9.4.3. Buildings' Performance Analysis and Simulation -- 9.4.4. Mobility Patterns Analysis -- 10: Conclusion -- 10.1. Synopsis of the Book -- 10.2. Contributions to the State of the Art -- References -- Index.
Summary: This book addresses models and methods for designing and implementing Big Data Systems to support mixed and complex decision processes, giving special attention to Big Data Warehouses (BDWs) as a way of efficiently storing and processing batch or streaming data for structured or semi-structured analytical problems.
No physical items for this record

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
