Fortino, Andres.

Data Mining and Predictive Analytics for Business Decisions : A Case Study Approach. - 1st ed. - 1 online resource (291 pages)

Cover -- Title Page -- Copyright -- Dedication -- Contents -- Preface -- Acknowledgments -- Chapter 1: Data Mining and Business -- Data Mining Algorithms and Activities -- Data is the New Oil -- Data-Driven Decision-Making -- Business Analytics and Business Intelligence -- Algorithmic Technologies Associated with Data Mining -- Data Mining and Data Warehousing -- Case Study 1.1: Business Applications of Data Mining -- Case A - Classification -- Case B - Regression -- Case C - Anomaly Detection -- Case D - Time Series -- Case E - Clustering -- Reference -- Chapter 2: The Data Mining Process -- Data Mining as a Process -- Exploration -- Analysis -- Interpretation -- Exploitation -- Selecting a Data Mining Process -- The CRISP-DM Process Model -- Business Understanding -- Data Understanding -- Data Preparation -- Modeling -- Evaluation -- Deployment -- Selecting Data Analytics Languages -- The Choices for Languages -- References -- Chapter 3: Framing Analytical Questions -- How Does CRISP-DM Define the Business and Data Understanding Step? -- The World of the Business Data Analyst -- How Does Data Analysis Relate to Business Decision-Making? -- How Do We Frame Analytical Questions? -- What Are the Characteristics of Well-framed Analytical Questions? -- Exercise 3.1 - Framed Questions About the Titanic Disaster -- Case Study 3.1 - The San Francisco Airport Survey -- Case Study 3.2 - Small Business Administration Loans -- References -- Chapter 4: Data Preparation -- How Does CRISP-DM Define Data Preparation? -- Steps in Preparing the Data Set for Analysis -- Data Sources and Formats -- What is Data Shaping? -- The Flat-File Format -- Application of Tools for Data Acquisition and Preparation -- Exercise 4.1 - Shaping the Data File -- Exercise 4.2 - Cleaning the Data File -- Ensuring the Right Variables are Included. 
Using SQL to Extract the Right Data Set from Data Warehouses -- Case Study 4.1: Cleaning and Shaping the SFO Survey Data Set -- Case Study 4.2: Shaping the SBA Loans Data Set -- Case Study 4.3: Additional SQL Queries -- Reference -- Chapter 5: Descriptive Analysis -- Getting a Sense of the Data Set -- Describe the Data Set -- Explore the Data Set -- Verify the Quality of the Data Set -- Analysis Techniques to Describe the Variables -- Exercise 5.1 - Descriptive Statistics -- Distributions of Numeric Variables -- Correlation -- Exercise 5.2 - Descriptive Analysis of the Titanic Disaster Data -- Case Study 5.1: Describing the SFO Survey Data Set -- Solution Using R -- Solution Using Python -- Case Study 5.2: Describing the SBA Loans Data Set -- Solution Using R -- Solution Using Python -- Reference -- Chapter 6: Modeling -- What is a Model? -- How Does CRISP-DM Define Modeling? -- Selecting the Modeling Technique -- Modeling Assumptions -- Generate Test Design -- Design of Model Testing -- Build the Model -- Parameter Setting -- Models -- Model Assessment -- Where Do Models Reside in a Computer? -- The Data Mining Engine -- The Model -- Data Sources and Outputs -- Traditional Data Sources -- Static Data Sources -- Real-Time Data Sources -- Analytic Outputs -- Model Building -- Step 1: Framing Questions -- Step 2: Selecting the Machine -- Step 3: Selecting Known Data -- Step 4: Training the Machine -- Step 5: Testing the Model -- Step 6: Deploying the Model -- Step 7: Collecting New Data -- Step 8: Updating the Model -- Step 9: Learning - Repeat Steps 7 and 8 -- Step 10: Recommending Answers to the User -- Reference -- Chapter 7: Predictive Analytics with Regression Models -- What is Supervised Learning? -- Regression to the Mean -- Linear Regression -- Simple Linear Regression -- The R-squared Coefficient -- The Use of the p-value of the Coefficients. 
Strength of the Correlation Between Two Variables -- Exercise 7.1 - Using SLR Analysis to Understand Franchise Advertising -- Multivariate Linear Regression -- Preparing to Build the Multivariate Model -- Exercise 7.2 - Using Multivariate Linear Regression to Model Franchise Sales -- Logistic Regression -- What is Logistic Regression? -- Exercise 7.3 - PassClass Case Study -- Multivariate Logistic Regression -- Exercise 7.4 - MLR Used to Analyze the Results of a Database Marketing Initiative -- Where is Logistic Regression Used? -- Comparing Linear and Logistic Regressions for Binary Outcomes -- Case Study 7.1: Linear Regression Using the SFO Survey Data Set -- Solution in R -- Solution in Python -- Case Study 7.2: Linear Regression Using the SBA Loans Data Set -- Solution in R -- Solution in Python -- Case Study 7.3: Logistic Regression Using the SFO Survey Data Set -- Solution in R -- Solution in Python -- Case Study 7.4: Logistic Regression Using the SBA Loans Data Set -- Solution in R -- Solution in Python -- Chapter 8: Classification -- Classification with Decision Trees -- Building a Decision Tree -- Exercise 8.1 - The Iris Data Set -- The Problem with Decision Trees -- Classification with Random Forest -- Using a Random Forest Model -- Exercise 8.2 - The Iris Data Set -- Classification with Naïve Bayes -- Exercise 8.3 - The HIKING Data Set -- Computing the Conditional Probabilities -- Case Study 8.1: Classification with the SFO Survey Data Set -- Solution in R -- Solution in Python -- Case Study 8.2: Classification with the SBA Loans Data Set -- Solution in R -- Solution in Python -- Case Study 8.3: Classification with the Florence Nightingale Data Set -- Solution in Python -- Reference -- Chapter 9: Clustering -- What is Unsupervised Machine Learning? -- What is Clustering Analysis? -- Applying Clustering to Old Faithful Eruptions. 
Examples of Applications of Clustering Analysis -- A Simple Clustering Example Using Regression -- Hierarchical Clustering -- Applying Hierarchical Clustering to Old Faithful Eruptions -- Exercise 9.1 - Hierarchical Clustering and the Iris Data Set -- K-Means Clustering -- How Does the K-Means Algorithm Compute Cluster Centroids? -- Applying K-Means Clustering to Old Faithful Eruptions -- Exercise 9.2 - K-Means Clustering and the Iris Data Set -- Hierarchical vs. K-Means Clustering -- Case Study 9.1: Clustering with the SFO Survey Data Set -- Solution in R -- Solution in Python -- Case Study 9.2: Clustering with the SBA Loans Data Set -- Solution in R -- Solution in Python -- Chapter 10: Time Series Forecasting -- What is a Time Series? -- Time Series Analysis -- Types of Time Series Analysis -- What is Forecasting? -- Exercise 10.1 - Analysis of the US and China GDP Data Set -- Case Studies -- Case Study 10.1: Time Series Analysis of the SFO Survey Data Set -- Solution in Excel -- Case Study 10.2: Time Series Analysis of the SBA Loans Data set -- Solution in R -- Solution in Python -- Case Study 10.3: Time Series Analysis of a Nest Data Set -- Solution in Python -- Reference -- Chapter 11: Feature Selection -- Using the Covariance Matrix -- Factor Analysis -- When to Use Factor Analysis -- First Step in FA - Correlation -- FA for Exploratory Analysis -- Selecting the Number of Factors - The Scree Plot -- Example 11.1: Restaurant Feedback -- Factor Interpretation -- Summary Activities to Perform a Factor Analysis -- Case Study 11.1: Variable Reduction with the SFO Survey Data Set -- Solution in R -- Solution in Python -- Case Study 11.2: Hunting Diamonds -- Solution in R -- Solution in Python -- Chapter 12: Anomaly Detection -- What is an Anomaly? -- What is an Outlier? -- The Case Studies for the Exercises in Anomaly Detection. 
Anomaly Detection by Standardization - A Single Numerical Variable -- Exercise 12.1 - Outliers in the Airline Delays Data Set - Z-Score -- Anomaly Detection by Quartiles - Tukey Fences - With a Single Variable -- Comparing Z-scores and Tukey Fences -- Exercise 12.2 - Outliers in the Airline Delays Data Set - Tukey Fences -- Anomaly Detection by Category - A Single Variable -- Exercise 12.3 - Outliers in the Airline Delays Data Set - Categorical -- Anomaly Detection by Clustering - Multiple Variables -- Exercise 12.4 - Outliers in the Airline Delays Data Set - Clustering -- Anomaly Detection Using Linear Regression by Residuals - Multiple Variables -- Exercise 12.5 - Outliers in the Airline Delays Data Set - Residuals -- Case Study 12.1: Outliers in the SFO Survey Data Set -- Solution in R -- Solution in Python -- Case Study 12.2: Outliers in the SBA Loans Data Set -- Solution in R -- Solution in Python -- References -- Chapter 13: Text Data Mining -- What is Text Data Mining? -- What are Some Examples of Text-Based Analytical Questions? -- Tools for Text Data Mining -- Sources and Formats of Text Data -- Term Frequency Analysis -- How Does It Apply to Text Business Data Analysis? -- Exercise 13.1 - Case Study Using a Training Survey Data Set -- Word Frequency Analysis Using R -- Keyword Analysis -- Exercise 13.2 - Case Study Using Data Set D: Résumé and Job Description -- Keyword Word Analysis in Voyant -- Term Frequency Analysis in R -- Visualizing Text Data -- Exercise 13.3 - Case Study Using the Training Survey Data Set -- Visualizing the Text Using Excel -- Visualizing the Text Using Voyant -- Visualizing the Text Using R -- Text Similarity Scoring -- What is Text Similarity Scoring? -- Exercise 13.4 - Case Study Using the Occupation Description Data Set -- Analysis Using an Online Text Similarity Scoring Tool -- Similarity Scoring Analysis Using R.

ISBN: 9781683926740

Index Terms--Genre/Form:
Electronic books.