Robin Linacre
Robin Linacre is a data scientist at the UK Ministry of Justice and the lead author of Splink, a Python library for record linkage and deduplication at scale.
Sessions
12-04, 16:30 (30 min)
Rapid deduplication and fuzzy matching of large datasets using Splink
Robin Linacre
Data deduplication is a ubiquitous data quality problem that most people who work with data will encounter at some point in their careers. It arises whenever multiple records are collected about the same person or other entity without a unique identifier that ties those records together.
This talk gives beginners everything they need to start linking and deduplicating large datasets using Splink, a free Python library.
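To make the problem concrete, here is a minimal sketch using only the Python standard library (not Splink). The records, the `name_similarity` and `candidate_duplicates` helpers, and the 0.7 threshold are all hypothetical choices for illustration. The sketch flags pairs that share a date of birth and have similar names, using `difflib.SequenceMatcher` as a simple fuzzy string comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: nothing like a shared unique identifier links them.
records = [
    {"id": 1, "name": "Robert Smith", "dob": "1980-03-14"},
    {"id": 2, "name": "Rob Smith", "dob": "1980-03-14"},
    {"id": 3, "name": "Jane Doe", "dob": "1992-07-01"},
]


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(rows, threshold=0.7):
    """Naively compare every pair of rows (O(n^2)) and return the id pairs
    that share a date of birth and have a name similarity >= threshold."""
    pairs = []
    for left, right in combinations(rows, 2):
        same_dob = left["dob"] == right["dob"]
        if same_dob and name_similarity(left["name"], right["name"]) >= threshold:
            pairs.append((left["id"], right["id"]))
    return pairs


print(candidate_duplicates(records))  # [(1, 2)]
```

Comparing every pair of records grows quadratically with dataset size, which is why tools aimed at large datasets, Splink among them, restrict comparisons using blocking rules and score matches with a probabilistic model rather than a single hand-picked threshold.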
Data / Data Science Track