What Really Matters in Evaluating Machine Learning Models: Swap-Ins / Swap-Outs and How to Use Them


How should machine learning models be evaluated? Specifically, if you have an existing model and need to decide whether to replace it with a new version, how do you do that?

The most common approach is to compare the two models on a standard suite of metrics, such as F1 score, ROC-AUC, or perplexity. In this talk, I'll explain why this approach is incomplete and describe a different approach that SentiLink uses before pushing new models to production: manually looking at the "swap-ins and swap-outs", the cases where one model does especially poorly and the other does especially well.
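As a rough illustration of the idea (a minimal sketch, not SentiLink's actual tooling), the cases worth reading by hand can be pulled out by ranking examples on how much the two models disagree in quality. The function name `swap_sets`, the use of absolute error on a 0/1 label as the per-example measure, and the toy scores are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def swap_sets(
    labels: Sequence[int],
    old_scores: Sequence[float],
    new_scores: Sequence[float],
    k: int = 2,
) -> Tuple[List[int], List[int]]:
    """Return (new_wins, old_wins): indices of up to k examples where the new
    model most improves on the old one, and up to k where it most regresses.
    Hypothetical helper for illustration only."""
    # Per-example error: distance between the predicted score and the 0/1 label.
    old_err = [abs(y - s) for y, s in zip(labels, old_scores)]
    new_err = [abs(y - s) for y, s in zip(labels, new_scores)]
    # Positive gain means the new model is closer to the label than the old one.
    gain = [o - n for o, n in zip(old_err, new_err)]
    # Indices sorted from largest improvement to largest regression.
    order = sorted(range(len(gain)), key=lambda i: gain[i], reverse=True)
    new_wins = [i for i in order[:k] if gain[i] > 0]
    old_wins = [i for i in order[::-1][:k] if gain[i] < 0]
    return new_wins, old_wins


# Toy data (made up): five examples, binary labels, scores from two models.
labels = [1, 0, 1, 0, 1]
old_scores = [0.9, 0.2, 0.1, 0.8, 0.6]
new_scores = [0.8, 0.1, 0.9, 0.5, 0.1]

new_wins, old_wins = swap_sets(labels, old_scores, new_scores, k=2)
print("new model much better on examples:", new_wins)
print("old model much better on examples:", old_wins)
```

The two index lists are the examples to inspect manually. An aggregate metric can hide them, because a large improvement on some cases and a large regression on others may roughly cancel out in the average.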

I'll walk through some real-world examples of how SentiLink uses this approach to evaluate models. I'll also give a concrete illustration of using this approach to compare a "cutting edge" deep learning model to a more standard deep learning model on a popular NLP dataset, complete with code for attendees to take away.


Seth Weidman is a data scientist at SentiLink, an Andreessen Horowitz-backed startup based in San Francisco; he works on SentiLink's core models that prevent various forms of fraud, especially synthetic identity fraud, and other malicious behavior for banks and lenders. Immediately before SentiLink, Seth did machine learning engineering at Facebook for the data centers team; he also wrote an introductory book on deep learning called Deep Learning from Scratch that was published by O’Reilly in 2019. Seth has degrees in mathematics and economics from the University of Chicago.

Open Data Science




