Big Data Meets Survey Science : A Collection of Innovative Methods.

Material type: Text
Series: Wiley Series in Survey Methodology
Publisher: Newark : John Wiley & Sons, Incorporated, 2020
Copyright date: ©2020
Edition: 1st ed.
Description: 1 online resource (787 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9781118976333
Additional physical formats: Print version: Big Data Meets Survey Science
DDC classification:
  • 001.433
LOC classification:
  • QA76.9.B45 .B54 2021
Contents:
Cover -- Title Page -- Copyright -- Contents -- List of Contributors -- Introduction -- Acknowledgments -- References -- Section 1 The New Survey Landscape -- Chapter 1 Why Machines Matter for Survey and Social Science Researchers: Exploring Applications of Machine Learning Methods for Design, Data Collection, and Analysis -- 1.1 Introduction -- 1.2 Overview of Machine Learning Methods and Their Evaluation -- 1.3 Creating Sample Designs and Constructing Sampling Frames Using Machine Learning Methods -- 1.3.1 Sample Design Creation -- 1.3.2 Sample Frame Construction -- 1.3.3 Considerations and Implications for Applying Machine Learning Methods for Creating Sampling Frames and Designs -- 1.3.3.1 Considerations About Algorithmic Optimization -- 1.3.3.2 Implications About Machine Learning Model Error -- 1.3.3.3 Data Type Considerations and Implications About Data Errors -- 1.4 Questionnaire Design and Evaluation Using Machine Learning Methods -- 1.4.1 Question Wording -- 1.4.2 Evaluation and Testing -- 1.4.3 Instrumentation and Interviewer Training -- 1.4.4 Alternative Data Sources -- 1.5 Survey Recruitment and Data Collection Using Machine Learning Methods -- 1.5.1 Monitoring and Interviewer Falsification -- 1.5.2 Responsive and Adaptive Designs -- 1.6 Survey Data Coding and Processing Using Machine Learning Methods -- 1.6.1 Coding Unstructured Text -- 1.6.2 Data Validation and Editing -- 1.6.3 Imputation -- 1.6.4 Record Linkage and Duplicate Detection -- 1.7 Sample Weighting and Survey Adjustments Using Machine Learning Methods -- 1.7.1 Propensity Score Estimation -- 1.7.2 Sample Matching -- 1.8 Survey Data Analysis and Estimation Using Machine Learning Methods -- 1.8.1 Gaining Insights Among Survey Variables -- 1.8.2 Adapting Machine Learning Methods to the Survey Setting.
1.8.3 Leveraging Machine Learning Algorithms for Finite Population Inference -- 1.9 Discussion and Conclusions -- References -- Further Reading -- Chapter 2 The Future Is Now: How Surveys Can Harness Social Media to Address Twenty‐first Century Challenges -- 2.1 Introduction -- 2.2 New Ways of Thinking About Survey Research -- 2.3 The Challenge with … Sampling People -- 2.3.1 The Social Media Opportunities -- 2.3.1.1 Venue‐Based, Time‐Space Sampling -- 2.3.1.2 Respondent‐Driven Sampling -- 2.3.2 Outstanding Challenges -- 2.4 The Challenge with … Identifying People -- 2.4.1 The Social Media Opportunity -- 2.4.2 Outstanding Challenges -- 2.5 The Challenge with … Reaching People -- 2.5.1 The Social Media Opportunities -- 2.5.1.1 Tracing -- 2.5.1.2 Paid Social Media Advertising -- 2.5.2 Outstanding Challenges -- 2.6 The Challenge with … Persuading People to Participate -- 2.6.1 The Social Media Opportunities -- 2.6.1.1 Paid Social Media Advertising -- 2.6.1.2 Online Influencers -- 2.6.2 Outstanding Challenges -- 2.7 The Challenge with … Interviewing People -- 2.7.1 Social Media Opportunities -- 2.7.1.1 Passive Social Media Data Mining -- 2.7.1.2 Active Data Collection -- 2.7.2 Outstanding Challenges -- 2.8 Conclusion -- References -- Chapter 3 Linking Survey Data with Commercial or Administrative Data for Data Quality Assessment -- 3.1 Introduction -- 3.2 Thinking About Quality Features of Analytic Data Sources -- 3.2.1 What Is the Purpose of the Data Linkage? -- 3.2.2 What Kind of Data Linkage for What Analytic Purpose? -- 3.3 Data Used in This Chapter -- 3.3.1 NSECE Household Survey -- 3.3.2 Proprietary Research Files from Zillow -- 3.3.3 Linking the NSECE Household Survey with Zillow Proprietary Datafiles -- 3.3.3.1 Nonuniqueness of Matches -- 3.3.3.2 Misalignment of Units of Observation -- 3.3.3.3 Ability to Identify Matches.
3.3.3.4 Identifying Matches -- 3.3.3.5 Implications of the Linking Process for Intended Analyses -- 3.4 Assessment of Data Quality Using the Linked File -- 3.4.1 What Variables in the Zillow Datafile Are Most Appropriate for Use in Substantive Analyses Linked to Survey Data? -- 3.4.2 How Did Different Steps in the Survey Administration Process Contribute to Representativeness of the NSECE Survey Data? -- 3.4.3 How Well Does the Linked Datafile Represent the Overall NSECE Dataset (Including Unlinked Records)? -- 3.5 Conclusion -- References -- Further Reading -- Section 2 Total Error and Data Quality -- Chapter 4 Total Error Frameworks for Found Data -- 4.1 Introduction -- 4.2 Data Integration and Estimation -- 4.2.1 Source Datasets -- 4.2.2 The Integration Process -- 4.2.3 Unified Dataset -- 4.3 Errors in Datasets -- 4.4 Errors in Hybrid Estimates -- 4.4.1 Error‐Generating Processes -- 4.4.2 Components of Bias, Variance, and Mean Squared Error -- 4.4.3 Illustrations -- 4.4.4 Error Mitigation -- 4.4.4.1 Sample Recruitment Error -- 4.4.4.2 Data Encoding Error -- 4.5 Other Error Frameworks -- 4.6 Summary and Conclusions -- References -- Chapter 5 Measuring the Strength of Attitudes in Social Media Data -- 5.1 Introduction -- 5.2 Methods -- 5.2.1 Data -- 5.2.1.1 European Social Survey Data -- 5.2.1.2 Reddit 2016 Data -- 5.2.1.3 Reddit Survey -- 5.2.1.4 Reddit 2018 Data -- 5.2.2 Analysis -- 5.2.2.1 Missingness -- 5.2.2.2 Measurement -- 5.2.2.3 Coding -- 5.3 Results -- 5.3.1 Overall Comparisons -- 5.3.2 Missingness -- 5.3.3 Measurement -- 5.3.4 Coding -- 5.4 Summary -- 5.A 2016 German ESS Questions Used in Analysis -- 5.B Search Terms Used to Identify Topics in Reddit Posts (2016 and 2018) -- 5.B.1 Political Ideology -- 5.B.2 Interest in Politics -- 5.B.3 Gay Rights -- 5.B.4 EU -- 5.B.5 Immigration -- 5.B.6 Climate.
5.C Example of Coding Steps Used to Identify Topics and Assign Sentiment in Reddit Submissions (2016 and 2018) -- References -- Chapter 6 Attention to Campaign Events: Do Twitter and Self‐Report Metrics Tell the Same Story? -- 6.1 What Can Social Media Tell Us About Social Phenomena? -- 6.2 The Empirical Evidence to Date -- 6.3 Tweets as Public Attention -- 6.4 Data Sources -- 6.5 Event Detection -- 6.6 Did Events Peak at the Same Time Across Data Streams? -- 6.7 Were Event Words Equally Prominent Across Data Streams? -- 6.8 Were Event Terms Similarly Associated with Particular Candidates? -- 6.9 Were Event Trends Similar Across Data Streams? -- 6.10 Unpacking Differences Between Samples -- 6.11 Conclusion -- References -- Chapter 7 Improving Quality of Administrative Data: A Case Study with FBI's National Incident‐Based Reporting System Data -- 7.1 Introduction -- 7.2 The NIBRS Database -- 7.2.1 Administrative Crime Statistics and the History of NIBRS Data -- 7.2.2 Construction of the NIBRS Dataset -- 7.3 Data Quality Improvement Based on the Total Error Framework -- 7.3.1 Data Quality Assessment Using the Row-Column-Cell Framework -- 7.3.1.1 Phase I: Evaluating Each Data Table -- 7.3.1.2 Row Errors -- 7.3.1.3 Column Errors -- 7.3.1.4 Cell Errors -- 7.3.1.5 Row-Column-Cell Errors Impacting NIBRS -- 7.3.1.6 Phase II: Evaluating the Integrated Data -- 7.3.1.7 Errors in the Data Integration Process -- 7.3.1.8 Coverage Errors Due to Nonreporting Agencies -- 7.3.1.9 Nonresponse Errors in the Incident Data Table Due to Unreported Incident Reports -- 7.3.1.10 Invalid, Unknown, and Missing Values Within the Incident Reports -- 7.3.2 Improving Data Quality via Sampling, Weighting, and Imputation -- 7.3.2.1 Sample‐Based Method to Improve Data Representativeness at the Agency Level -- 7.3.2.2 Statistical Weighting to Adjust for Coverage Errors at the Agency Level.
7.3.2.3 Imputation to Compensate for Unreported Incidents and Missing Values in the Incident Reports -- 7.4 Utilizing External Data Sources in Improving Data Quality of the Administrative Data -- 7.4.1 Understanding the External Data Sources -- 7.4.1.1 Data Quality Assessment of External Data Sources -- 7.4.1.2 Producing Population Counts at the Agency Level Through Auxiliary Data -- 7.4.2 Administrative vs. Survey Data for Crime Statistics -- 7.4.3 A Pilot Study on Crime in the Bakken Region -- 7.5 Summary and Future Work -- References -- Chapter 8 Performance and Sensitivities of Home Detection on Mobile Phone Data -- 8.1 Introduction -- 8.1.1 Mobile Phone Data and Official Statistics -- 8.1.2 The Home Detection Problem -- 8.2 Deploying Home Detection Algorithms to a French CDR Dataset -- 8.2.1 Mobile Phone Data -- 8.2.2 The French Mobile Phone Dataset -- 8.2.3 Defining Nine Home Detection Algorithms -- 8.2.4 Different Observation Periods -- 8.2.5 Summary of Data and Setup -- 8.3 Assessing Home Detection Performance at Nationwide Scale -- 8.3.1 Ground Truth Data -- 8.3.2 Assessing Performance and Sensitivities -- 8.3.2.1 Correlation with Ground Truth Data -- 8.3.2.2 Ratio and Spatial Patterns -- 8.3.2.3 Temporality and Sensitivity -- 8.4 Results -- 8.4.1 Relations between HDAs' User Counts and Ground Truth -- 8.4.2 Spatial Patterns of Ratios Between User Counts and Population Counts -- 8.4.3 Temporality of Correlations -- 8.4.4 Sensitivity to the Duration of Observation -- 8.4.5 Sensitivity to Criteria Choice -- 8.5 Discussion and Conclusion -- References -- Section 3 Big Data in Official Statistics -- Chapter 9 Big Data Initiatives in Official Statistics -- 9.1 Introduction -- 9.2 Some Characteristics of the Changing Survey Landscape -- 9.3 Current Strategies to Handle the Changing Survey Landscape -- 9.3.1 Training Staff.
9.3.2 Forming Partnerships.
No physical items for this record

Description based on publisher-supplied metadata and other sources.

Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
