Abstract: There are tens of billions of online profiles today, each associated with some identity, on diverse platforms including social networks, online marketplaces, dating sites and financial institutions. Every platform needs to understand, validate and verify these identities.
The landscape of identity challenges, available data, and machine-learning technology has evolved over the years. However, identity remains a notoriously hard problem. While we’ve made a lot of progress in academia and industry, several problems are still unsolved. In this session, we will talk through three core, interconnected problems: (1) identity authentication/validation; (2) identity matching; (3) identity verification. We will discuss our work on effectively using machine learning technology to solve these problems, along with an analysis of popular techniques used on different platforms.
Identity authentication and validation ensure high-quality attributes, which affect all downstream identity processes. The challenge of identity authentication is determining whether an input identity/attribute is a valid value. While identity validation solutions need to be tailored to the attribute type, we will share some of the common techniques applicable across all attribute types: (1) canonicalizing attribute values, and then (2) looking them up against constructed datasets of the universe of all possible values. We will also discuss how some of these generic techniques are applied to the validation of two different types of attributes: names and government-issued IDs.
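As a rough illustration of the canonicalize-then-lookup idea, the sketch below normalizes a name and checks it against a small set of known values. It is not the speakers' implementation: the function names, the specific normalization steps (Unicode decomposition, accent stripping, case folding, whitespace collapsing), and the tiny `KNOWN_FIRST_NAMES` set are all assumptions made for the example.

```python
import unicodedata


def canonicalize(value: str) -> str:
    """Reduce a raw attribute value to a canonical form (illustrative rules only)."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    # Drop the combining marks, e.g. "É" becomes "E".
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold and collapse runs of whitespace into single spaces.
    return " ".join(without_marks.casefold().split())


# Hypothetical "universe" of valid values; a real one would be a large curated dataset.
KNOWN_FIRST_NAMES = {canonicalize(name) for name in ["José", "Liren", "Mary Ann"]}


def is_valid(value: str, universe: set) -> bool:
    """Return True if the canonical form of the value appears in the universe."""
    return canonicalize(value) in universe
```

With these assumptions, `is_valid("  JOSE ", KNOWN_FIRST_NAMES)` is true because both the input and the stored value canonicalize to the same string, while a value absent from the set is rejected.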
Identity matching is fundamental for two main applications: detecting duplicates and joining with other, often external, data sources to create a richer identity. We will describe the typical identity matching pipeline, which is composed of four steps: (1) extraction of relevant attributes from structured and unstructured sources, (2) iterative identity enrichment of the input, (3) fuzzy matching of attribute pairs, and (4) building a model to compute a match confidence using similarity and uniqueness.
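Steps (3) and (4) can be sketched as follows. This is a hand-weighted stand-in rather than the trained model the session describes: it uses `difflib.SequenceMatcher` from the Python standard library for fuzzy similarity, and the `uniqueness` weighting, the frequency tables, and all function names are assumptions introduced for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity of two attribute values, in [0, 1]."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def uniqueness(value: str, counts: dict, total: int) -> float:
    """Weight rarer values more heavily (assumed linear rule; unseen values count as 1)."""
    count = min(counts.get(value.casefold(), 1), total)
    return 1.0 - count / total


def match_confidence(rec_a: dict, rec_b: dict, frequencies: dict, total: int) -> float:
    """Uniqueness-weighted average of per-attribute similarities."""
    weighted_sum = 0.0
    weight_sum = 0.0
    # Only attributes present in both records can be compared.
    for attr in rec_a.keys() & rec_b.keys():
        weight = uniqueness(rec_a[attr], frequencies.get(attr, {}), total)
        weighted_sum += weight * similarity(rec_a[attr], rec_b[attr])
        weight_sum += weight
    return weighted_sum / weight_sum if weight_sum else 0.0


# Hypothetical frequency table: how often each value occurs among `TOTAL` profiles.
FREQUENCIES = {"name": {"jon smith": 500}, "email": {}}
TOTAL = 1000

profile_a = {"name": "Jon Smith", "email": "jon.smith@example.com"}
profile_b = {"name": "Jon Smith", "email": "jon.smith@example.com"}
profile_c = {"name": "Jon Smith", "email": "someone.else@example.org"}
```

Under these assumptions, a common name contributes less to the score than a rare email address, so `profile_a` and `profile_c` score lower than `profile_a` and `profile_b` even though the names agree exactly.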
Identity verification is the process of confirming that an online/digital identity accurately reflects the offline identity of the person who created it. The key insight we will dive deep into is verifying one piece of the online identity and then applying coherence across the various identity attributes to verify all other attributes of the online identity.
This session is geared towards product, data science, and engineering leaders who would like to introduce state-of-the-art machine-learning techniques to solve identity problems at their respective companies or fortify their existing solutions. Some familiarity with machine learning techniques is preferred, but not required.
Bio: Liren Peng is a Software Engineer on the Trust team at Airbnb. He is responsible for the architecture and development of user identity verification systems. He also works on the utilization of third-party data and vendor integration. Prior to Airbnb, Liren worked at Trooly, a startup that built machine-learning-based trust models using both social media data and proprietary data to assess the trustworthiness of individuals. He received a B.S. from Carnegie Mellon University and an M.Sc. from Stanford University, focusing on data analytics.