Methodological Developments in Data Linkage.
Material type:
- text
- computer
- online resource
- ISBN: 9781119072485
Intro -- Title Page -- Wiley Series in Probability and Statistics -- Copyright Page -- Contents -- Foreword -- Contributors -- Chapter 1 Introduction -- 1.1 Introduction: data linkage as it exists -- 1.2 Background and issues -- 1.3 Data linkage methods -- 1.3.1 Deterministic linkage -- 1.3.2 Probabilistic linkage -- 1.3.3 Data preparation -- 1.4 Linkage error -- 1.5 Impact of linkage error on analysis of linked data -- 1.6 Data linkage: the future -- Chapter 2 Probabilistic linkage -- 2.1 Introduction -- 2.2 Overview of methods -- 2.2.1 The Fellegi-Sunter model of record linkage -- 2.2.2 Learning parameters -- 2.2.3 Additional methods for matching -- 2.2.4 An empirical example -- 2.3 Data preparation -- 2.3.1 Description of a matching project -- 2.3.2 Initial file preparation -- 2.3.3 Name standardisation and parsing -- 2.3.4 Address standardisation and parsing -- 2.3.5 Summarising comments on preprocessing -- 2.4 Advanced methods -- 2.4.1 Estimating false-match rates without training data -- 2.4.2 Adjusting analyses for linkage error -- 2.5 Concluding comments -- Chapter 3 The data linkage environment -- 3.1 Introduction -- 3.2 The data linkage context -- 3.2.1 Administrative or routine data -- 3.2.2 The law and the use of administrative (personal) data for research -- 3.2.3 The identifiability problem in data linkage -- 3.3 The tools used in the production of functional anonymity through a data linkage environment -- 3.3.1 Governance, rules and the researcher -- 3.3.2 Application process, ethics scrutiny and peer review -- 3.3.3 Shaping 'safe' behaviour: training, sanctions, contracts and licences -- 3.3.4 'Safe' data analysis environments -- 3.3.5 Fragmentation: separation of linkage process and temporary linked data -- 3.4 Models for data access and data linkage -- 3.4.1 Single centre.
3.4.2 Separation of functions: firewalls within single centre -- 3.4.3 Separation of functions: TTP linkage -- 3.4.4 Secure multiparty computation -- 3.5 Four case study data linkage centres -- 3.5.1 Population Data BC -- 3.5.2 The Secure Anonymised Information Linkage Databank, United Kingdom -- 3.5.3 Centre for Data Linkage (Population Health Research Network), Australia -- 3.5.4 The Centre for Health Record Linkage, Australia -- 3.6 Conclusion -- Chapter 4 Bias in data linkage studies -- 4.1 Background -- 4.2 Description of types of linkage error -- 4.2.1 Missed matches from missing linkage variables -- 4.2.2 Missed matches from inconsistent case ascertainment -- 4.2.3 False matches: Description of cases incorrectly matched -- 4.3 How linkage error impacts research findings -- 4.3.1 Results -- 4.3.2 Assessment of linkage bias -- 4.4 Discussion -- 4.4.1 Potential biases in the review process -- 4.4.2 Recommendations and implications for practice -- Chapter 5 Secondary analysis of linked data -- 5.1 Introduction -- 5.2 Measurement error issues arising from linkage -- 5.2.1 Correct links, incorrect links and non-links -- 5.2.2 Characterising linkage errors -- 5.2.3 Characterising errors from non-linkage -- 5.3 Models for different types of linking errors -- 5.3.1 Linkage errors under binary linking -- 5.3.2 Linkage errors under multi-linking -- 5.3.3 Incomplete linking -- 5.3.4 Modelling the linkage error -- 5.4 Regression analysis using complete binary-linked data -- 5.4.1 Linear regression -- 5.4.2 Logistic regression -- 5.5 Regression analysis using incomplete binary-linked data -- 5.5.1 Linear regression using incomplete sample to register linked data -- 5.6 Regression analysis with multi-linked data -- 5.6.1 Uncorrelated multi-linking: Complete linkage -- 5.6.2 Uncorrelated multi-linking: Sample to register linkage.
5.6.3 Correlated multi-linkage -- 5.6.4 Incorporating auxiliary population information -- 5.7 Conclusion and discussion -- Chapter 6 Record linkage: A missing data problem -- 6.1 Introduction -- 6.2 Probabilistic Record Linkage (PRL) -- 6.3 Multiple Imputation (MI) -- 6.4 Prior-Informed Imputation (PII) -- 6.4.1 Estimating matching probabilities -- 6.5 Example 1: Linking electronic healthcare data to estimate trends in bloodstream infection -- 6.5.1 Methods -- 6.5.2 Results -- 6.5.3 Conclusions -- 6.6 Example 2: Simulated data including non-random linkage error -- 6.6.1 Methods -- 6.6.2 Results -- 6.7 Discussion -- 6.7.1 Non-random linkage error -- 6.7.2 Strengths and limitations: Handling linkage error -- 6.7.3 Implications for data linkers and data users -- Acknowledgements -- Appendix A -- The latent normal model -- Normal response -- Ordered categorical data -- Unordered categorical data -- Chapter 7 Using graph databases to manage linked data -- 7.1 Summary -- 7.2 Introduction -- 7.2.1 Flat approach -- 7.2.2 Oops, your legacy is showing -- 7.2.3 Shortcomings -- 7.3 Graph approach -- 7.3.1 Overview of graph concepts -- 7.3.2 Graph queries versus relational queries -- 7.3.3 Comparison of data in flat database versus graph database -- 7.3.4 Relaxing the notion of 'truth' -- 7.3.5 Not a linkage approach per se but a management approach which enables novel linkage approaches -- 7.3.6 Linkage engine independent -- 7.3.7 Separates out linkage from cluster identification phase (and clerical review) -- 7.4 Methodologies -- 7.4.1 Overview of storage and extraction approach -- 7.4.2 Overall management of data as collections -- 7.4.3 Data loading -- 7.4.4 Identification of equivalence sets and deterministic linkage -- 7.4.5 Probabilistic linkage -- 7.4.6 Clerical review -- 7.4.7 Determining cut-off thresholds -- 7.4.8 Final cluster extraction.
7.4.9 Graph partitioning -- 7.4.10 Data management/curation -- 7.4.11 User interface challenges -- 7.4.12 Final cluster extraction -- 7.4.13 A typical end-to-end workflow -- 7.5 Algorithm implementation -- 7.5.1 Graph traversal -- 7.5.2 Cluster identification -- 7.5.3 Partitioning visitor -- 7.5.4 Encapsulating edge following policies -- 7.5.5 Graph partitioning -- 7.5.6 Insertion of review links -- 7.5.7 How to migrate while preserving current clusters -- 7.6 New approaches facilitated by graph storage approach -- 7.6.1 Multiple threshold extraction -- 7.6.2 Possibility of returning graph to end users -- 7.6.3 Optimised cluster analysis -- 7.6.4 Other link types -- 7.7 Conclusion -- Acknowledgements -- Chapter 8 Large-scale linkage for total populations in official statistics -- 8.1 Introduction -- 8.2 Current practice in record linkage for population censuses -- 8.2.1 Introduction -- 8.2.2 Case study: the 2011 England and Wales Census assessment of coverage -- 8.3 Population-level linkage in countries that operate a population register: register-based censuses -- 8.3.1 Introduction -- 8.3.2 Case study 1: Finland -- 8.3.3 Case study 2: The Netherlands Virtual Census -- 8.3.4 Case study 3: Poland -- 8.3.5 Case study 4: Germany -- 8.3.6 Summary -- 8.4 New challenges in record linkage: the Beyond 2011 Programme -- 8.4.1 Introduction -- 8.4.2 Beyond 2011 linkage methodology -- 8.4.3 The anonymisation process in Beyond 2011 -- 8.4.4 Beyond 2011 linkage strategy using pseudonymised data -- 8.4.5 Linkage quality -- 8.4.6 Next steps -- 8.4.7 Conclusion -- 8.5 Summary -- Chapter 9 Privacy-preserving record linkage -- 9.1 Introduction -- 9.2 Chapter outline -- 9.3 Linking with and without personal identification numbers -- 9.3.1 Linking using a trusted third party -- 9.3.2 Linking with encrypted PIDs -- 9.3.3 Linking with encrypted quasi-identifiers.
9.3.4 PPRL in decentralised organisations -- 9.4 PPRL approaches -- 9.4.1 Phonetic codes -- 9.4.2 High-dimensional embeddings -- 9.4.3 Reference tables -- 9.4.4 Secure multiparty computations for PPRL -- 9.4.5 Bloom filter-based PPRL -- 9.5 PPRL for very large databases: blocking -- 9.5.1 Blocking for PPRL with Bloom filters -- 9.5.2 Blocking Bloom filters with MBT -- 9.5.3 Empirical comparison of blocking techniques for Bloom filters -- 9.5.4 Current recommendations for linking very large datasets with Bloom filters -- 9.6 Privacy considerations -- 9.6.1 Probability of attacks -- 9.6.2 Kind of attacks -- 9.6.3 Attacks on Bloom filters -- 9.7 Hardening Bloom filters -- 9.7.1 Randomly selected hash values -- 9.7.2 Random bits -- 9.7.3 Avoiding padding -- 9.7.4 Standardising the length of identifiers -- 9.7.5 Sampling bits for composite Bloom filters -- 9.7.6 Rehashing -- 9.7.7 Salting keys with record-specific data -- 9.7.8 Fake injections -- 9.7.9 Evaluation of Bloom filter hardening procedures -- 9.8 Future research -- 9.9 PPRL research and implementation with national databases -- Acknowledgements -- Chapter 10 Summary -- 10.1 Introduction -- 10.2 Part 1: Data linkage as it exists today -- 10.3 Part 2: Analysis of linked data -- 10.3.1 Quality of identifiers -- 10.3.2 Quality of linkage methods -- 10.3.3 Quality of evaluation -- 10.4 Part 3: Data linkage in practice: new developments -- 10.5 Concluding remarks -- References -- Index -- Series list -- EULA.
Description based on publisher supplied metadata and other sources.
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.