Data Mining for the Social Sciences : An Introduction.

Material type: Text
Publisher: Berkeley : University of California Press, 2015
Copyright date: ©2015
Edition: 1st ed
Description: 1 online resource (265 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9780520960596
Additional physical formats: Print version: Data Mining for the Social Sciences
DDC classification:
  • 006.3/12
LOC classification:
  • H61.3 -- .A88 2015eb
Contents:
Cover -- Title -- Copyright -- Contents -- Acknowledgments -- PART 1. CONCEPTS -- 1. What Is Data Mining? -- The Goals of This Book -- Software and Hardware for Data Mining -- Basic Terminology -- 2. Contrasts with the Conventional Statistical Approach -- Predictive Power in Conventional Statistical Modeling -- Hypothesis Testing in the Conventional Approach -- Heteroscedasticity as a Threat to Validity in Conventional Modeling -- The Challenge of Complex and Nonrandom Samples -- Bootstrapping and Permutation Tests -- Nonlinearity in Conventional Predictive Models -- Statistical Interactions in Conventional Models -- Conclusion -- 3. Some General Strategies Used in Data Mining -- Cross-Validation -- Overfitting -- Boosting -- Calibrating -- Measuring Fit: The Confusion Matrix and ROC Curves -- Identifying Statistical Interactions and Effect Heterogeneity in Data Mining -- Bagging and Random Forests -- The Limits of Prediction -- Big Data Is Never Big Enough -- 4. Important Stages in a Data Mining Project -- When to Sample Big Data -- Building a Rich Array of Features -- Feature Selection -- Feature Extraction -- Constructing a Model -- PART 2. WORKED EXAMPLES -- 5. Preparing Training and Test Datasets -- The Logic of Cross-Validation -- Cross-Validation Methods: An Overview -- 6. Variable Selection Tools -- Stepwise Regression -- The LASSO -- VIF Regression -- 7. Creating New Variables Using Binning and Trees -- Discretizing a Continuous Predictor -- Continuous Outcomes and Continuous Predictors -- Binning Categorical Predictors -- Using Partition Trees to Study Interactions -- 8. Extracting Variables -- Principal Component Analysis -- Independent Component Analysis -- 9. Classifiers -- K-Nearest Neighbors -- Naive Bayes -- Support Vector Machines -- Optimizing Prediction across Multiple Classifiers -- 10. Classification Trees -- Partition Trees.
Boosted Trees and Random Forests -- 11. Neural Networks -- 12. Clustering -- Hierarchical Clustering -- K-Means Clustering -- Normal Mixtures -- Self-Organized Maps -- 13. Latent Class Analysis and Mixture Models -- Latent Class Analysis -- Latent Class Regression -- Mixture Models -- 14. Association Rules -- Conclusion -- Bibliography -- Notes -- Index -- A -- B -- C -- D -- E -- F -- G -- H -- I -- J -- K -- L -- M -- N -- O -- P -- R -- S -- T -- U -- V -- W -- X -- Y -- Z.
Summary: We live in a world of big data: the amount of information collected on human behavior each day is staggering, and exponentially greater than at any time in the past. Additionally, powerful algorithms are capable of churning through seas of data to uncover patterns. Providing a simple and accessible introduction to data mining, Paul Attewell and David B. Monaghan discuss how data mining substantially differs from conventional statistical modeling familiar to most social scientists. The authors also empower social scientists to tap into these new resources and incorporate data mining methodologies in their analytical toolkits. Data Mining for the Social Sciences demystifies the process by describing the diverse set of techniques available, discussing the strengths and weaknesses of various approaches, and giving practical demonstrations of how to carry out analyses using tools in various statistical software packages.

Description based on publisher supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.

© 2024 Resource Centre. All rights reserved.