
Abstract: One method for approximate ("fuzzy") matching of two strings is to compute the Levenshtein distance between them and accept the match when the distance is suitably low. Deletion Neighborhoods are an indexing technique that makes this type of comparison fast enough for time-sensitive use.
In this talk, we review string-oriented Deletion Neighborhoods and present a novel application in which a similar technique is applied to entire dataset records. Careful application of both string- and record-oriented indexing techniques allows for powerful searching and record deduplication capabilities.
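As a rough illustration of the string-oriented idea, here is a minimal Python sketch. It is not the implementation presented in the talk; the function names (`deletion_neighborhood`, `build_index`, `search`) and the sample words are hypothetical. It relies on the general observation that two strings within Levenshtein distance k of each other share at least one variant obtainable by deleting at most k characters from each, so an index keyed on those variants yields a small candidate set that is then verified with a full distance computation.

```python
from collections import defaultdict
from itertools import combinations


def deletion_neighborhood(word, k):
    """Return every string reachable from `word` by deleting at most k characters."""
    variants = set()
    for d in range(min(k, len(word)) + 1):
        for drop in combinations(range(len(word)), d):
            skip = set(drop)
            variants.add("".join(c for i, c in enumerate(word) if i not in skip))
    return variants


def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def build_index(words, k):
    """Map each deletion variant to the set of indexed words that produce it."""
    index = defaultdict(set)
    for w in words:
        for v in deletion_neighborhood(w, k):
            index[v].add(w)
    return index


def search(index, query, k):
    """Collect candidates sharing a deletion variant, then verify the distance."""
    candidates = set()
    for v in deletion_neighborhood(query, k):
        candidates |= index.get(v, set())
    return sorted(w for w in candidates if levenshtein(query, w) <= k)


if __name__ == "__main__":
    idx = build_index(["camper", "camping", "hamper", "damper"], k=1)
    print(search(idx, "camper", k=1))
```

The trade-off in this sketch is index size for lookup speed: the number of variants per word grows quickly with k and with word length, so small values of k are the practical case.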
Bio: Dan has been with LexisNexis Risk Solutions Group since 2014 and is an Enterprise Architect in the Solutions Lab Group. He has worked for Apple as well as Dun & Bradstreet, and he ran his own custom programming shop for a decade. He's been writing software professionally for more than 40 years and has worked on a myriad of systems, using many different programming languages.

Dan S. Camper
Title: Thaumaturge | HPCC Systems Solutions Lab
