Big Data : Concepts, Warehousing, and Analytics.

By:

Santos, Maribel Yasmina

Contributor(s):

Costa, Carlos

Material type: Text

Media type:

computer

Carrier type:

online resource

ISBN:

9781000794038

Subject(s):

Big data

Genre/Form:

Electronic books.

Additional physical formats: Print version: Big Data

LOC classification:

QA76.9.B45 .S268 2020

Contents:

Cover -- Half Title -- Series Page -- Title Page -- Copyright Page -- Dedication -- Table of Contents -- List of Figures -- List of Tables -- The Authors -- Acknowledgments -- Foreword -- Notation -- 1: Introduction -- 1.1. Objectives of this Book -- 1.2. Intended Audience -- 1.3. Book Structure -- 2: Big Data Concepts, Techniques, and Technologies -- 2.1. Big Data Relevance -- 2.2. Big Data Characteristics -- 2.3. Big Data Challenges -- 2.3.1. Big Data General Dilemmas -- 2.3.2. Challenges in the Big Data Life Cycle -- 2.3.3. Big Data in Secure, Private, and Monitored Environments -- 2.3.4. Organizational Change -- 2.4. Techniques for Big Data Solutions -- 2.4.1. Big Data Life Cycle and Requirements -- 2.4.1.1. General Steps to Process and Analyze Big Data -- 2.4.1.2. Architectural and Infrastructural Requirements -- 2.4.2. The Lambda Architecture -- 2.4.3. Towards Standardization: The NIST Reference Architecture -- 2.5. Big Data Technologies -- 2.5.1. Hadoop and Related Projects -- 2.5.2. Landscape of Distributed SQL Engines -- 2.5.3. Other Technologies for Big Data Analytics -- 3: OLTP-Oriented Databases for Big Data Environments -- 3.1. NoSQL and NewSQL: An Overview -- 3.2. NoSQL Databases -- 3.2.1. Key-Value Databases -- 3.2.1.1. Overview -- 3.2.1.2. Redis -- 3.2.2. Column-Oriented Databases -- 3.2.2.1. Overview -- 3.2.2.2. HBase -- 3.2.2.3. From Relational Models to HBase Data Models -- 3.2.3. Document-Oriented Databases -- 3.2.3.1. Overview -- 3.2.3.2. MongoDB -- 3.2.4. Graph Databases -- 3.2.4.1. Overview -- 3.2.4.2. Neo4j -- 3.3. NewSQL Databases and Translytical Databases -- 4: OLAP-Oriented Databases for Big Data Environments -- 4.1. Hive: The De Facto SQL-on-Hadoop Engine -- 4.1.1. Data Storage Formats -- 4.1.1.1. Text File -- 4.1.1.2. Sequence File -- 4.1.1.3. RCFile -- 4.1.1.4. ORC File -- 4.1.1.5. Avro File -- 4.1.1.6. Parquet.

4.1.2. Partitions and Buckets -- 4.2. From Dimensional Models to Tabular Models -- 4.2.1. Primary Data Tables -- 4.2.2. Derived Data Tables -- 4.3. Optimizing OLAP workloads with Druid -- 5: Design and Implementation of Big Data Warehouses -- 5.1. Big Data Warehousing: An Overview -- 5.2. Model of Logical Components and Data Flows -- 5.2.1. Data Provider and Data Consumer -- 5.2.2. Big Data Application Provider -- 5.2.3. Big Data Framework Provider -- 5.2.3.1. Messaging/Communications, Resource Management, and Infrastructures -- 5.2.3.2. Processing -- 5.2.3.3. Storage: Data Organization and Distribution -- 5.2.4. System Orchestrator and Security, Privacy, and Management -- 5.3. Model of Technological Infrastructure -- 5.4. Method for Data Modeling -- 5.4.1. Analytical Objects and their Related Concepts -- 5.4.2. Joining, Uniting, and Materializing Analytical Objects -- 5.4.3. Dimensional Big Data with Outsourced Descriptive Families -- 5.4.4. Data Modeling Best Practices -- 5.4.4.1. Using Null Values -- 5.4.4.2. Date, Time, and Spatial Objects vs. Separate Temporal and Spatial Attributes -- 5.4.4.3. Immutable vs. Mutable Records -- 5.4.5. Data Modeling Advantages and Disadvantages -- 6: Big Data Warehouses Modeling: From Theory to Practice -- 6.1. Multinational Bicycle Wholesale and Manufacturing -- 6.1.1. Fully Flat or Fully Dimensional Data Models -- 6.1.2. Nested Attributes -- 6.1.3. Streaming and Random Access on Mutable Analytical Objects -- 6.2. Brokerage Firm -- 6.2.1. Unnecessary Complementary Analytical Objects and Update Problems -- 6.2.1.1. The Traditional Way of Handling SCD-Like Scenarios -- 6.2.1.2. A New Way of Handling SCD-Like Scenarios -- 6.2.2. Joining Complementary Analytical Objects -- 6.2.3. Data Science Models and Insights as a Core Value -- 6.2.4. Partition Keys for Streaming and Batch Analytical Objects -- 6.3. Retail.

6.3.1. Simpler Data Models: Dynamic Partitioning Schemas -- 6.3.2. Considerations for Spatial Objects -- 6.3.3. Analyzing Non-Existing Events -- 6.3.4. Wide Descriptive Families -- 6.3.5. The Need for Joins in Data CPE Workloads -- 6.4. Code Version Control System -- 6.5. A Global Database of Society - The GDELT Project -- 6.6. Air Quality -- 7: Fueling Analytical Objects in Big Data Warehouses -- 7.1. From Traditional Data Warehouses -- 7.2. From OLTP NoSQL Databases -- 7.3. From Semi-Structured Data Sources -- 7.4. From Streaming Data Sources -- 7.5. Using Data Science Models -- 7.5.1. Data Mining/Machine Learning Models for Structured Data -- 7.5.2. Text Mining, Image Mining, and Video Mining Models -- 8: Evaluating the Performance of Big Data Warehouses -- 8.1. The SSB+ Benchmark -- 8.1.1. Data Model and Queries -- 8.1.2. System Architecture and Infrastructure -- 8.2. Batch OLAP -- 8.2.1. Comparing Flat Analytical Objects with Star Schemas -- 8.2.2. Improving Performance with Adequate Data Partitioning -- 8.2.3. The Impact of Dimensions' Size in Star Schemas -- 8.2.4. The Impact of Nested Structures in Analytical Objects -- 8.2.5. Drill Across Queries and Window and Analytics Functions -- 8.3. Streaming OLAP -- 8.3.1. The Impact of Data Volume in the Streaming Storage Component -- 8.3.2. Considerations for Effective and Efficient Streaming OLAP -- 8.4. SQL-on-Hadoop Systems under Multi-User Environments -- 9: Big Data Warehousing in Smart Cities -- 9.1. Logical Components, Data Flows, and Technological Infrastructure -- 9.1.1. SusCity Architecture -- 9.1.2. SusCity Infrastructure -- 9.2. SusCity Data Model -- 9.2.1. Buildings Characteristics as an Outsourced Descriptive Family -- 9.2.2. Nested Structures in Analytical Objects -- 9.3. The Inter-Storage Pipeline -- 9.4. The SusCity Data Visualization Platform -- 9.4.1. City's Energy Consumption.

9.4.2. City's Energy Grid Simulations -- 9.4.3. Buildings' Performance Analysis and Simulation -- 9.4.4. Mobility Patterns Analysis -- 10: Conclusion -- 10.1. Synopsis of the Book -- 10.2. Contributions to the State of the Art -- References -- Index.

Summary: This book addresses models and methods for designing and implementing Big Data systems that support mixed and complex decision processes. It gives special attention to Big Data Warehouses (BDWs) as a way of efficiently storing and processing batch or streaming data for structured or semi-structured analytical problems.

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
