Handbook of Big Data Technologies.
Material type:
- Content type: text
- Media type: computer
- Carrier type: online resource
- ISBN: 9783319493404
- DDC classification: 005.7
- LC classification: TK7885-7895
Intro -- Foreword -- Preface -- Contents -- Part I Fundamentals of Big Data Processing -- Big Data Storage and Data Models -- 1 Storage Models -- 1.1 Block-Based Storage -- 1.2 File-Based Storage -- 1.3 Object-Based Storage -- 1.4 Comparison of Storage Models -- 2 Data Models -- 2.1 NoSQL (Not only SQL) -- 2.2 Relational-Based -- 2.3 Summary of Data Models -- References -- Big Data Programming Models -- 1 MapReduce -- 1.1 Features -- 1.2 Examples -- 2 Functional Programming -- 2.1 Features -- 2.2 Example Frameworks -- 3 SQL-Like -- 3.1 Features -- 3.2 Examples -- 4 Actor Model -- 4.1 Features -- 4.2 Examples -- 5 Statistical and Analytical -- 5.1 Features -- 5.2 Examples -- 6 Dataflow-Based -- 6.1 Features -- 6.2 Examples -- 7 Bulk Synchronous Parallel -- 7.1 Features -- 7.2 Examples -- 8 High Level DSL -- 8.1 Pig Latin -- 8.2 Crunch/FlumeJava -- 8.3 Cascading -- 8.4 Dryad LINQ -- 8.5 Trident -- 8.6 Green Marl -- 8.7 Asterix Query Language (AQL) -- 8.8 IBM Jaql -- 9 Discussion and Conclusion -- References -- Programming Platforms for Big Data Analysis -- 1 Introduction -- 2 Requirements of Big Data Programming Support -- 3 Classification of Programming Platforms -- 3.1 Data Source -- 3.2 Processing Technique -- 4 Major Existing Programming Platforms -- 4.1 Data Parallel Programming Platforms -- 4.2 Graph Parallel Programming Platforms -- 4.3 Task Parallel Platforms -- 4.4 Stream Processing Programming Platforms -- 5 A Unifying Framework -- 5.1 Comparison of Existing Programming Platforms -- 5.2 Need for Unifying Framework -- 5.3 MatrixMap Framework -- 6 Conclusion and Future Directions -- References -- Big Data Analysis on Clouds -- 1 Introduction -- 2 Introducing Cloud Computing -- 2.1 Basic Concepts -- 2.2 Cloud Service Distribution and Deployment Models -- 3 Cloud Solutions for Big Data -- 3.1 Microsoft Azure -- 3.2 Amazon Web Services.
3.3 OpenNebula -- 3.4 OpenStack -- 4 Systems for Big Data Analytics in the Cloud -- 4.1 MapReduce -- 4.2 Spark -- 4.3 Mahout -- 4.4 Hunk -- 4.5 Sector/Sphere -- 4.6 BigML -- 4.7 Kognitio Analytical Platform -- 4.8 Data Analysis Workflows -- 4.9 NoSQL Models for Data Analytics -- 4.10 Visual Analytics -- 4.11 Big Data Funding Projects -- 4.12 Historical Review -- 4.13 Summary -- 5 Research Trends -- 6 Conclusions -- References -- Data Organization and Curation in Big Data -- 1 Big Data Indexing Techniques -- 1.1 Overview -- 1.2 Record-Level Non-adaptive Indexing -- 1.3 Record-Level Adaptive Indexing -- 1.4 Split-Level Indexing -- 1.5 Hadoop-RDBMS Hybrid Indexing -- 2 Data Organization and Layout Techniques -- 2.1 Overview -- 2.2 Result Materialization and Caching Techniques -- 2.3 Pre-processing and Colocation Techniques -- 2.4 None Row-Oriented Storage Layouts -- 3 Non-traditional Workloads in Big Data -- 3.1 Overview -- 3.2 Techniques for Recurring Workloads -- 3.3 Techniques for Fast Online Analytics -- 4 Curation and Metadata Management in Big Data -- 4.1 Overview -- 4.2 Execution-Centric Metadata Approach -- 4.3 Provenance-Centric Metadata Approach -- 4.4 Data-Centric Metadata Approach -- 5 Conclusion -- References -- Big Data Query Engines -- 1 Introduction -- 1.1 MPP Query Engines -- 1.2 Hadoop Query Engines -- 1.3 Chapter Organization -- 2 Massively Parallel Query Engines -- 2.1 Teradata -- 2.2 Greenplum -- 2.3 Vertica -- 3 Hadoop Query Engines -- 3.1 MapReduce -- 3.2 Hive -- 3.3 Spark -- 4 SQL on Hadoop -- 4.1 HAWQ -- 4.2 Impala -- 4.3 Presto -- 5 Query Optimization -- 5.1 Research Problems -- 5.2 Orca -- 5.3 Catalyst -- 5.4 V2Opt -- 5.5 Impala Query Optimizer -- 6 Query Execution -- 6.1 Research Problems -- 6.2 Hadoop-Based Execution Engines -- 6.3 Parallel Databases Execution Engines -- 6.4 Code Generation -- 7 Summary -- References.
Large-Scale Data Stream Processing Systems -- 1 Introduction -- 1.1 Stream Processing and Its Precursors -- 1.2 Large-Scale Data Stream Processing on Commodity Clusters -- 1.3 Distinctive Features of Data Stream Processing Systems -- 1.4 Chapter Overview -- 2 Programming Models -- 2.1 Programming with Streams -- 2.2 Lower-Level Dataflow Programming -- 2.3 Functional APIs -- 2.4 Stream Windows -- 3 System Support for Distributed Data Streaming -- 3.1 An Analysis of Large-Scale Stream Processing Systems -- 3.2 Execution Models -- 3.3 Processing Guarantees Upon Failure -- 3.4 Flow Control -- 3.5 Execution Plan Optimisations -- 4 Case Study: Stream Processing with Apache Flink -- 4.1 The Apache Flink Stack -- 4.2 The Apache Flink System Architecture -- 4.3 Lightweight Asynchronous Snapshots -- 5 Applications, Trends and Open Challenges -- 5.1 Graph Stream Processing -- 5.2 Online Learning -- 5.3 Complex Event Processing -- 6 Conclusions and Outlook -- References -- Part II Semantic Big Data Management -- Semantic Data Integration -- 1 An Important Challenge -- 1.1 Linked Data -- 1.2 Ontologies -- 1.3 Ontology and Data Alignment -- 2 Current State-of-the-Art -- 2.1 Interactive and Collaborative Approaches -- 2.2 Visualizing the Data Integration Process -- 2.3 Integrating Geospatial Data -- 2.4 Integrating Biomedical Data -- 3 The Path Forward -- 3.1 Moving Beyond 1-to-1 Equivalence Mappings -- 3.2 Advancing Alignment Evaluation -- 3.3 Contextualizing Alignments -- References -- Linked Data Management -- 1 Introduction -- 2 Background Information -- 3 Native Linked Data Stores -- 3.1 Quadruple Systems -- 3.2 Index Permuted Stores -- 3.3 Graph-Based Systems -- 4 Provenance for Linked Data -- 4.1 Provenance Representations -- 4.2 Provenance in Data Management Systems -- References -- Non-native RDF Storage Engines -- 1 Introduction.
2 Storing Linked Data Using Relational Databases -- 2.1 Statement Table -- 2.2 Optimizing Data Storage -- 2.3 Property Tables -- 2.4 Query Execution -- 3 No-SQL Stores -- 4 Massively Parallel Processing for Linked Data -- 4.1 Data Storage and Partitioning -- 4.2 Query Execution -- References -- Exploratory Ad-Hoc Analytics for Big Data -- 1 Exploratory Analytics for Big Data -- 1.1 Requirements -- 1.2 Architecture Overview -- 2 A Top-K Entity Augmentation System -- 2.1 Motivation and Challenges -- 2.2 Requirements -- 2.3 Top-k Consistent Entity Augmentation -- 2.4 Related Work -- 3 DrillBeyond -- Processing Open World SQL -- 3.1 Motivation and Challenges -- 3.2 Requirements -- 3.3 The DrillBeyond System -- 3.4 Processing Multi-result Queries -- 3.5 Related Work -- 4 Summary and Future Work -- 4.1 Future Work -- References -- Pattern Matching Over Linked Data Streams -- 1 Overview -- 2 Linked Data Dissemination System -- 2.1 System Overview -- 2.2 TP-Automata for Single Triple Pattern Query Matching -- 2.3 CTP-Automata for Conjunctive Triple Pattern Query Matching -- 3 Experimental Evaluation -- 3.1 Experimental Setup -- 3.2 Evaluation of TP-Automata -- 3.3 Evaluation of CTP-Automata -- 3.4 Limitations -- 4 Related Work -- 5 Summary -- References -- Searching the Big Data: Practices and Experiences in Efficiently Querying Knowledge Bases -- 1 Introduction -- 2 Background -- 2.1 Knowledge Base Preliminary -- 3 The Framework of Cache-Based Knowledge Base Querying -- 4 Similar Queries Suggestion -- 4.1 Query Distance Calculation -- 4.2 Feature Modeling -- 5 Cache Replacement -- 5.1 Modified Simple Exponential Smoothing -- 5.2 Replacement Algorithms -- 6 Implementation and Experimental Evaluation -- 6.1 Setup -- 6.2 Performance of Cache Replacement Algorithm -- 6.3 Comparison of Feature Modeling Approaches.
6.4 Performance Comparison with the State-of-the-Art Work -- 6.5 Experimental Conclusion -- 7 Related Work -- 7.1 Semantic Caching -- 7.2 Query Suggestion -- 8 Discussion and Conclusion -- References -- Part III Big Graph Analytics -- Management and Analysis of Big Graph Data: Current Systems and Open Challenges -- 1 Introduction -- 2 Graph Databases -- 2.1 Recent Graph Database Systems -- 2.2 Graph Data Models -- 2.3 Query Language Support -- 3 Graph Processing -- 3.1 General Architecture -- 3.2 Think Like a Vertex -- 3.3 Think Like a Graph -- 4 Graph Dataflow Systems -- 4.1 Apache Flink -- 4.2 Apache Flink Gelly -- 4.3 Comparison to Other Graph Dataflow Frameworks -- 5 Gradoop -- 5.1 Architecture -- 5.2 Extended Property Graph Model -- 6 Comparison -- 7 Current Research and Open Challenges -- 7.1 Graph Data Allocation and Partitioning -- 7.2 Benchmarking and Evaluation of Graph Data Systems -- 7.3 Analysis of Dynamic Graphs -- 7.4 Graph-Based Data Integration and Knowledge Graphs -- 7.5 Interactive Graph Analytics -- 8 Conclusions and Outlook -- References -- Similarity Search in Large-Scale Graph Databases -- 1 Introduction -- 2 Preliminaries -- 3 The Pruning-Verification Framework -- 4 State-of-the-Art Approaches -- 4.1 A Tree-Based Approach: K-Adjacent Tree -- 4.2 A Star-Based Approach: SEGOS -- 4.3 A Path-Based Approach: GSimJoin -- 4.4 A Partition-Based Approach: Pars -- 5 Future Research Directions -- 5.1 New GED Bounds and Search Algorithms -- 5.2 Rich Semantics of Similarity Search -- 5.3 Graph Query Formulation and Understanding -- 6 Summary -- References -- Big-Graphs: Querying, Mining, and Beyond -- 1 Introduction -- 2 Graph Data Models -- 2.1 RDF -- 2.2 Property Graph -- 3 Pattern Matching Techniques Over Big-Graphs -- 3.1 SQL and NoSQL Approaches -- 3.2 Keyword Search -- 3.3 Graph Matching Query -- 3.4 Graph Query by Example.
4 Mining Techniques Over Big-Graphs.
Description based on publisher supplied metadata and other sources.
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.