Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection

Springer Science & Business Media

Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Building on research in various domains, including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially in improving the accuracy of data matching and its scalability to large databases.

Peter Christen’s book is divided into three parts. Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, then details its main steps, such as pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, Part III, “Further Topics”, deals with specific aspects such as privacy, real-time matching, and matching unstructured data. The book also briefly describes the main features of many research and open source systems available today.
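As a rough illustration of how the main steps named above fit together (pre-processing, indexing, record comparison and classification), here is a minimal Python sketch. It is not the book's or Febrl's method: the blocking key (first letter of surname plus postcode), the character-bigram Jaccard comparison and the summed-score threshold classifier are all hypothetical choices made for brevity.

```python
from itertools import combinations

def preprocess(record):
    # Pre-processing: lowercase values and collapse runs of whitespace
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}

def blocking_key(record):
    # Indexing (blocking): hypothetical key = first letter of surname + postcode
    return (record["surname"][:1], record["postcode"])

def bigrams(s):
    # Set of overlapping two-character substrings
    return {s[i:i + 2] for i in range(len(s) - 1)}

def jaccard(a, b):
    # Field comparison: Jaccard similarity of the bigram sets
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings too short for bigrams: fall back to exact equality
        return 1.0 if a == b else 0.0
    return len(x & y) / len(x | y)

def compare(r1, r2, fields):
    # Record comparison: one similarity value per compared field
    return [jaccard(r1[f], r2[f]) for f in fields]

def classify(vector, threshold):
    # Classification: simple threshold on the summed similarities
    return "match" if sum(vector) >= threshold else "non-match"

def deduplicate(records, fields, threshold):
    cleaned = [preprocess(r) for r in records]
    # Group record indices by blocking key
    blocks = {}
    for idx, rec in enumerate(cleaned):
        blocks.setdefault(blocking_key(rec), []).append(idx)
    # Compare only pairs of records that share a block
    matches = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            vector = compare(cleaned[i], cleaned[j], fields)
            if classify(vector, threshold) == "match":
                matches.append((i, j))
    return matches

records = [
    {"given": "Peter", "surname": "Christen", "postcode": "2600"},
    {"given": "peter ", "surname": "Christensen", "postcode": "2600"},
    {"given": "Mary", "surname": "Smith", "postcode": "2600"},
]

print(deduplicate(records, ["given", "surname"], 1.5))  # prints [(0, 1)]
```

Blocking means the third record is never compared with the first two, which is what keeps the number of comparisons manageable on large databases; the cost is that true matches with differing blocking keys are missed.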

By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching to familiarize themselves with recent research advances and to identify open research challenges in the area. To this end, each chapter includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.

About the author

Peter Christen is Senior Lecturer at the Research School of Computer Science at the Australian National University in Canberra, Australia. His research interests are data mining, with a focus on data matching, and privacy-preserving data sharing and mining. He has published over 50 papers in these areas, and he is the principal developer of Febrl (Freely Extensible Biomedical Record Linkage), an open source data cleaning, deduplication and record linkage system.

Additional Information

Publisher
Springer Science & Business Media
Published on
Jul 4, 2012
Pages
272
ISBN
9783642311642
Language
English
Genres
Computers / Computer Vision & Pattern Recognition
Computers / Databases / Data Mining
Computers / Databases / General
Computers / Information Technology
Computers / Intelligence (AI) & Semantics
Computers / Optical Data Processing
Computers / System Administration / Storage & Retrieval
Content Protection
This content is DRM protected.

This book addresses the problems that are encountered, and solutions that have been proposed, when we aim to identify people and to reconstruct populations under conditions where information is scarce, ambiguous, fuzzy and sometimes erroneous.

The process from handwritten registers to a reconstructed digitized population consists of three major phases, reflected in the three main sections of this book. The first phase involves transcribing and digitizing the data while structuring the information in a meaningful and efficient way. In the second phase, records that refer to the same person or group of persons are identified by a process of linkage. In the third and final phase, the information on an individual is combined into a reconstruction of their life course.

The studies and examples in this book originate from a range of countries, each with its own cultural and administrative characteristics, and cover sources ranging from medieval charters through historical censuses and vital registration to the modern issue of privacy preservation. Despite the diverse places and times addressed, they all deal with fundamental issues of model reasoning for population reconstruction and with the possibilities and limitations of information technology in supporting this process.

It is thus not a single discipline that is involved in such an endeavor. Historians, social scientists, and linguists represent the humanities through their knowledge of the complexity of the past, the limitations of sources, and the possible interpretations of information. The availability of big data from digitized archives and the need for complex analyses to identify individuals call for the involvement of computer scientists. With contributions from all these fields, often in direct cooperation, this book is at the heart of the digital humanities and will hopefully offer a source of inspiration for future investigations.