Data Architecture : Big Data, Data Warehouse and Data Vault.

By:

Inmon, W. H.

Contributor(s):

Linstedt, Daniel

Material type: Text

Media type:

computer

Carrier type:

online resource

ISBN:

9780128020913

Subject(s):

Big data

Genre/Form:

Electronic books.

Additional physical formats: Print version: Data Architecture: a Primer for the Data Scientist

LOC classification:

QA76.9.D37 -- .I566 2015eb

Contents:

Cover -- Title Page -- Copyright -- Dedication -- Contents -- Preface -- About the authors -- 1.1 - Corporate data -- The Totality of Data Across the Corporation -- Dividing Unstructured Data -- Business Relevancy -- Big Data -- The Great Divide -- The Continental Divide -- The Complete Picture -- 1.2 - The data infrastructure -- Two Types of Repetitive Data -- Repetitive Structured Data -- Repetitive Big Data -- The Two Infrastructures -- What's being Optimized? -- Comparing the Two Infrastructures -- 1.3 - The "great divide" -- Classifying Corporate Data -- The "Great Divide" -- Repetitive Unstructured Data -- Nonrepetitive Unstructured Data -- Different Worlds -- 1.4 - Demographics of corporate data -- 1.5 - Corporate data analysis -- 1.6 - The life cycle of data - understanding data over time -- 1.7 - A brief history of data -- Paper Tape and Punch Cards -- Magnetic Tapes -- Disk Storage -- Database Management System -- Coupled Processors -- Online Transaction Processing -- Data Warehouse -- Parallel Data Management -- Data Vault -- Big Data -- The Great Divide -- 2.1 - A brief history of big data -- An Analogy - Taking the High Ground -- Taking the High Ground -- Standardization with the 360 -- Online Transaction Processing -- Enter Teradata and Massively Parallel Processing -- Then Came Hadoop and Big Data -- IBM and Hadoop -- Holding the High Ground -- 2.2 - What is big data? -- Another Definition -- Large Volumes -- Inexpensive Storage -- The Roman Census Approach -- Unstructured Data -- Data in Big Data -- Context in Repetitive Data -- Nonrepetitive Data -- Context in Nonrepetitive Data -- 2.3 - Parallel processing -- 2.4 - Unstructured data -- Textual Information Everywhere -- Decisions Based on Structured Data -- The Business Value Proposition -- Repetitive and Nonrepetitive Unstructured Information -- Ease of Analysis.

Contextualization -- Some Approaches to Contextualization -- MapReduce -- Manual Analysis -- 2.5 - Contextualizing repetitive unstructured data -- Parsing Repetitive Unstructured Data -- Recasting the Output Data -- 2.6 - Textual disambiguation -- From Narrative into an Analytical Database -- Input into Textual Disambiguation -- Mapping -- Input/Output -- Document Fracturing/Named Value Processing -- Preprocessing a Document -- Emails - A Special Case -- Spreadsheets -- Report Decompilation -- 2.7 - Taxonomies -- Data Models and Taxonomies -- Applicability of Taxonomies -- What is a Taxonomy? -- Taxonomies in Multiple Languages -- Dynamics of Taxonomies and Textual Disambiguation -- Taxonomies and Textual Disambiguation - Separate Technologies -- Different Types of Taxonomies -- Taxonomies - Maintenance Over Time -- 3.1 - A brief history of data warehouse -- Early Applications -- Online Applications -- Extract Programs -- 4GL Technology -- Personal Computers -- Spreadsheets -- Integrity of Data -- Spider-Web Systems -- The Maintenance Backlog -- The Data Warehouse -- To an Architected Environment -- To the CIF -- DW 2.0 -- 3.2 - Integrated corporate data -- Many Applications -- Looking Across the Corporation -- More Than One Analyst -- ETL Technology -- The Challenges of Integration -- The Benefits of a Data Warehouse -- The Granular Perspective -- 3.3 - Historical data -- 3.4 - Data marts -- Granular Data -- Relational Database Design -- The Data Mart -- Key Performance Indicators -- The Dimensional Model -- Combining the Data Warehouse and Data Marts -- 3.5 - The operational data store -- Online Transaction Processing on Integrated Data -- The Operational Data Store -- ODS and the Data Warehouse -- ODS Classes -- External Updates into the ODS -- The ODS/Data Warehouse Interface -- 3.6 - What a data warehouse is not.

A Simple Data Warehouse Architecture -- Online High-Performance Transaction Processing in the Data Warehouse -- Integrity of Data -- The Data Warehouse Workload -- Statistical Processing from the Data Warehouse -- The Frequency of Statistical Processing -- The Exploration Warehouse -- 4.1 - Introduction to data vault -- Data Vault 2.0 Modeling -- Data Vault 2.0 Methodology Defined -- Data Vault 2.0 Architecture -- Data Vault 2.0 Implementation -- Business Benefits of Data Vault 2.0 -- Data Vault 1.0 -- 4.2 - Introduction to data vault modeling -- A Data Vault Model Concept -- Data Vault Model Defined -- Components of a Data Vault Model -- Business keys -- Data Vault and Data Warehousing -- Translating to Data Vault Modeling -- Data Restructure -- Basic Rules of Data Vault Modeling -- Why We Need Many-to-Many Link Structures -- Hash keys Instead of Sequence Numbers -- 4.3 - Introduction to data vault architecture -- Data Vault 2.0 Architecture -- How NoSQL Fits into the Architecture -- Data Vault 2.0 Architecture Objectives -- Data Vault 2.0 Modeling Objective -- Hard and Soft Business Rules -- Managed SSBI and the Architecture -- 4.4 - Introduction to data vault methodology -- Data Vault 2.0 Methodology Overview -- CMMI and Data Vault 2.0 Methodology -- CMMI Versus Agility -- Project Management Practices and SDLC Versus CMMI and Agile -- Six Sigma and Data Vault 2.0 Methodology -- Total Quality Management -- 4.5 - Introduction to data vault implementation -- Implementation Overview -- The Importance of Patterns -- Reengineering and Big Data -- Virtualize Our Data Marts -- Managed Self-Service BI -- 5.1 - The operational environment - a short history -- Commercial Uses of the Computer -- The First Applications -- Ed Yourdon and the Structured Revolution -- System Development Life Cycle -- Disk Technology -- Enter the Database Management System.

Response Time and Availability -- Corporate Computing Today -- 5.2 - The standard work unit -- Elements of Response Time -- An Hourglass Analogy -- The Racetrack Analogy -- Your Vehicle Runs as Fast as the Vehicle in Front of It -- The Standard Work Unit -- The Service Level Agreement -- 5.3 - Data modeling for the structured environment -- The Purpose of the Road Map -- Granular Data Only -- The Entity Relationship Diagram -- The DIS -- Physical Database Design -- Relating the Different Levels of the Data Model -- An Example of the Linkage -- Generic Data Models -- Operational Data Models and Data Warehouse Data Models -- 5.4 - Metadata -- Typical Metadata -- The Repository -- Using Metadata -- Analytical Uses of Metadata -- Looking at Multiple Systems -- The Lineage of Data -- Comparing Existing Systems to Proposed Systems -- 5.5 - Data governance of structured data -- A Corporate Activity -- Motivations for Data Governance -- Repairing Data -- Granular, Detailed Data -- Documentation -- Data Stewardship -- 6.1 - A brief history of data architecture -- 6.2 - Big Data/existing systems interface -- The Big Data/Existing Systems Interface -- The Repetitive Raw Big Data/Existing Systems Interface -- Exception-Based Data -- The Nonrepetitive Raw Big Data/Existing Systems Interface -- Into the Existing Systems Environment -- The "Context-Enriched" Big Data Environment -- Analyzing Structured Data/Unstructured Data Together -- 6.3 - The data warehouse/operational environment interface -- The Operational/Data Warehouse Interface -- The Classical ETL Interface -- The Operational Data Store/ETL Interface -- The Staging Area -- Changed Data Capture -- Inline Transformation -- ELT Processing -- 6.4 - Data architecture - a high-level perspective -- A High-Level Perspective -- Redundancy -- The System of Record -- Different Communities.

7.1 - Repetitive analytics - some basics -- Different Kinds of Analysis -- Looking for Patterns -- Heuristic Processing -- The Sandbox -- The "Normal" Profile -- Distillation, Filtering -- Subsetting Data -- Filtering Data -- Repetitive Data and Context -- Linking Repetitive Records -- Log Tape Records -- Analyzing Points of Data -- Data Over Time -- 7.2 - Analyzing repetitive data -- Log Data -- Active/Passive Indexing of Data -- Summary/Detailed Data -- Metadata in Big Data -- Linking Data -- 7.3 - Repetitive analysis -- Internal, External Data -- Universal Identifiers -- Security -- Filtering, Distillation -- Archiving Results -- Metrics -- 8.1 - Nonrepetitive data -- Inline Contextualization -- Taxonomy/Ontology Processing -- Custom Variables -- Homographic Resolution -- Acronym Resolution -- Negation Analysis -- Numeric Tagging -- Date Tagging -- Date Standardization -- List Processing -- Associative Word Processing -- Stop Word Processing -- Word Stemming -- Document Metadata -- Document Classification -- Proximity Analysis -- Functional Sequencing within Textual ETL -- Internal Referential Integrity -- Preprocessing, Postprocessing -- 8.2 - Mapping -- 8.3 - Analytics from nonrepetitive data -- Call Center Information -- Medical Records -- 9.1 - Operational analytics -- Transaction Response Time -- 10.1 - Operational analytics -- 11.1 - Personal analytics -- 12.1 - A composite data architecture -- Glossary -- Index.

No physical items for this record

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
