Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection

Springer Science & Business Media

Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching, and its scalability to large databases.

Peter Christen’s book is divided into three parts. Part I, “Overview”, introduces the subject by presenting several sample applications and their special challenges, together with a general overview of a generic data matching process. Part II, “Steps of the Data Matching Process”, details the main steps: pre-processing, indexing, field and record comparison, classification, and quality evaluation. Part III, “Further Topics”, deals with specific aspects such as privacy, real-time matching, and matching unstructured data, and closes with a brief description of the main features of many research and open source systems available today.
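
To make the steps named for Part II more concrete, here is a minimal Python sketch of a toy matching pipeline. It is not taken from the book or from Febrl: the function names, the sample records, the first-letter blocking key, the use of the standard library's difflib for string similarity, and the 0.8 threshold are all assumptions chosen for illustration, and the quality evaluation step is left out.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def preprocess(record):
    """Pre-processing: lowercase each field and collapse whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}


def block_key(record):
    """Indexing (blocking): hypothetical key, the first letter of the surname."""
    return record["surname"][:1]


def compare(a, b, fields=("given", "surname", "city")):
    """Field comparison: one similarity score in [0, 1] per field."""
    return [SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]


def classify(similarities, threshold=0.8):
    """Classification: call a pair a match if the mean similarity is high."""
    return sum(similarities) / len(similarities) >= threshold


def match(records, threshold=0.8):
    """Run the toy pipeline and return sorted pairs of matching record ids."""
    cleaned = {rid: preprocess(rec) for rid, rec in records.items()}
    blocks = defaultdict(list)
    for rid, rec in cleaned.items():
        blocks[block_key(rec)].append(rid)
    matches = []
    # Only records that share a block are compared, which avoids
    # comparing every record with every other record.
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            if classify(compare(cleaned[a], cleaned[b]), threshold):
                matches.append((a, b))
    return sorted(matches)


# Invented sample records, for illustration only.
records = {
    1: {"given": "Anna", "surname": "Smith", "city": "Sydney"},
    2: {"given": "anna ", "surname": "SMITH", "city": " sydney"},
    3: {"given": "Bob", "surname": "Smyth", "city": "Perth"},
    4: {"given": "Carl", "surname": "Jones", "city": "Sydney"},
}

print(match(records))
```

In this sketch, records 1 and 2 become identical after pre-processing and are paired, record 3 shares their block but scores too low, and record 4 sits alone in its block and is never compared. Real systems typically use more robust blocking keys and classifiers than this.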

By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.

About the author

Peter Christen is a Senior Lecturer at the Research School of Computer Science at the Australian National University in Canberra, Australia. His research interests are data mining, with a focus on data matching, and privacy-preserving data sharing and mining. He has published over 50 papers in these areas, and he is the principal developer of the "Febrl" (Freely Extensible Biomedical Record Linkage) open source data cleaning, deduplication and record linkage system.

Additional Information

Publisher
Springer Science & Business Media
Published on
Jul 4, 2012
Pages
272
ISBN
9783642311642
Language
English
Genres
Computers / Computer Vision & Pattern Recognition
Computers / Databases / Data Mining
Computers / Databases / General
Computers / Information Technology
Computers / Intelligence (AI) & Semantics
Computers / Optical Data Processing
Computers / System Administration / Storage & Retrieval
Content Protection
This content is DRM protected.

Gerrit Bloothooft
This book addresses the problems that are encountered, and solutions that have been proposed, when we aim to identify people and to reconstruct populations under conditions where information is scarce, ambiguous, fuzzy and sometimes erroneous.

The process from handwritten registers to a reconstructed digitized population consists of three major phases, reflected in the three main sections of this book. The first phase involves transcribing and digitizing the data while structuring the information in a meaningful and efficient way. In the second phase, records that refer to the same person or group of persons are identified by a process of linkage. In the third and final phase, the information on an individual is combined into a reconstruction of their life course.

The studies and examples in this book originate from a range of countries, each with its own cultural and administrative characteristics, and from medieval charters through historical censuses and vital registration, to the modern issue of privacy preservation. Despite the diverse places and times addressed, they all share the study of fundamental issues when it comes to model reasoning for population reconstruction and the possibilities and limitations of information technology to support this process.

It is thus not a single discipline that is involved in such an endeavor. Historians, social scientists, and linguists represent the humanities through their knowledge of the complexity of the past, the limitations of sources, and the possible interpretations of information. The availability of big data from digitized archives and the need for complex analyses to identify individuals call for the involvement of computer scientists. With contributions from all these fields, often in direct cooperation, this book is at the heart of the digital humanities, and will hopefully offer a source of inspiration for future investigations.
Aurélien Géron
Peter Christen Asbjørnsen
So she rode a long, long way, till they came to a great steep hill. There, on the face of it, the White Bear gave a knock, and a door opened, and they came into a castle where there were many rooms all lit up; rooms gleaming with silver and gold; and there, too, was a table ready laid, and it was all as grand as grand could be. Then the White Bear gave her a silver bell; and when she wanted anything, she was only to ring it, and she would get it at once.

Well, after she had eaten and drunk, and evening wore on, she got sleepy after her journey, and thought she would like to go to bed, so she rang the bell; and she had scarce taken hold of it before she came into a chamber where there was a bed made, as fair and white as any one would wish to sleep in, with silken pillows and curtains and gold fringe. All that was in the room was gold or silver; but when she had gone to bed and put out the light, a man came and laid himself alongside her. That was the White Bear, who threw off his beast shape at night; but she never saw him, for he always came after she had put out the light, and before the day dawned he was up and off again. So things went on happily for a while, but at last she began to get silent and sorrowful; for there she went about all day alone, and she longed to go home to see her father and mother and brothers and sisters. So one day, when the White Bear asked what it was that she lacked, she said it was so dull and lonely there, and how she longed to go home to see her father and mother and brothers and sisters, and that was why she was so sad and sorrowful, because she couldn't get to them.
