
Abstract: Scikit-learn is a Python machine learning library used by data science practitioners from many disciplines. In this training, we will learn how to evaluate, calibrate, and inspect models. Model evaluation is an essential part of the ML workflow. We will cover multiple metrics and see how they behave on various combinations of datasets and models, and we will use scikit-learn's plotting API to visualize a model's performance. Next, we will learn how to calibrate a machine learning model with scikit-learn and visualize an estimator's calibration before and after calibrating. We will then explore techniques to inspect machine learning models. Specifically, we will see how to examine open-box machine learning models, such as linear models and random forests. Finally, we will learn about inspection techniques that apply to all models. These techniques are flexible because they can be applied to any machine learning model to show how it generates predictions.
Session Outline
Module 1: Model Evaluation
We start by training different machine learning models on datasets and evaluating their performance. We compare metrics that require thresholded predictions, such as accuracy, with metrics that work directly on scores or probabilities, such as ROC AUC. Then, we use the scikit-learn plotting API to visualize a model's ROC curve and precision-recall curve.
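As a rough illustration of the two kinds of metrics, the sketch below trains one classifier and scores it both ways. The synthetic dataset, the choice of LogisticRegression, and the variable names are assumptions made for this example and are not taken from the session materials.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem (an assumption for illustration).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard class labels: predict() applies the default decision threshold.
y_pred = clf.predict(X_test)
# Continuous scores: probability of the positive class, no threshold applied.
y_score = clf.predict_proba(X_test)[:, 1]

# Metrics that need thresholded predictions.
thresholded = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
# Metrics that summarize performance across all thresholds.
threshold_free = {
    "roc_auc": roc_auc_score(y_test, y_score),
    "average_precision": average_precision_score(y_test, y_score),
}

print(thresholded)
print(threshold_free)
```

With matplotlib installed, RocCurveDisplay.from_estimator(clf, X_test, y_test) and PrecisionRecallDisplay.from_estimator(clf, X_test, y_test) from sklearn.metrics should draw the corresponding curves in scikit-learn 1.0 or later.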
Module 2: Model Calibration
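A well-calibrated model predicts probabilities that reflect the true likelihood of an event. For example, among the samples it assigns a probability of 0.8, about 80% should belong to the positive class. We evaluate models before and after calibrating and visualize an estimator's calibration.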
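A minimal sketch of a before-and-after comparison might look like the following. The dataset, the GaussianNB base model, and the isotonic method are assumptions chosen for illustration; the session may use different models and data.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with redundant features (an assumption for illustration).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=2, n_redundant=10,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uncalibrated model, and the same model wrapped in a cross-validated calibrator.
raw = GaussianNB().fit(X_train, y_train)
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

results = {}
for name, model in [("raw", raw), ("calibrated", calibrated)]:
    proba = model.predict_proba(X_test)[:, 1]
    # Brier score: mean squared error of the probabilities (lower is better).
    brier = brier_score_loss(y_test, proba)
    # Per bin: observed fraction of positives vs. mean predicted probability.
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)
    results[name] = {"brier": brier, "prob_true": prob_true, "prob_pred": prob_pred}
    print(name, round(brier, 4))
```

For a perfectly calibrated model, prob_true would equal prob_pred in every bin. With matplotlib installed, CalibrationDisplay.from_estimator from sklearn.calibration should plot the same curve in scikit-learn 1.0 or later.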
Module 3: Model Inspection
We examine open-box machine learning models, such as linear models or random forests. We learn about inspection techniques that apply to all models, such as permutation feature importance and partial dependence curves.
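The techniques named above can be sketched roughly as follows. The diabetes dataset, the Ridge and RandomForestRegressor models, and the choice of the bmi feature are assumptions for this example only.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Small regression dataset bundled with scikit-learn (an assumption for illustration).
data = load_diabetes()
X, y, feature_names = data.data, data.target, list(data.feature_names)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Open-box model 1: a linear model exposes one coefficient per feature.
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
coefs = dict(zip(feature_names, ridge.coef_))

# Open-box model 2: a random forest exposes impurity-based importances.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
impurity_importance = dict(zip(feature_names, forest.feature_importances_))

# Model-agnostic: the drop in test score when one feature's values are shuffled.
perm = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)
perm_importance = dict(zip(feature_names, perm.importances_mean))

# Model-agnostic: average prediction as one feature sweeps over a grid of values.
bmi_index = feature_names.index("bmi")
pd_result = partial_dependence(
    forest, X_test, features=[bmi_index], kind="average"
)
pd_values = pd_result["average"][0]

print(sorted(perm_importance, key=perm_importance.get, reverse=True)[:3])
```

Permutation importance and partial dependence only call the fitted model's predict or score methods, which is why they work with any estimator. With matplotlib installed, PartialDependenceDisplay.from_estimator from sklearn.inspection should plot the curve in scikit-learn 1.0 or later.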
Background Knowledge
Python and a basic understanding of scikit-learn's API
Bio: Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer of scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Master's in Physics from Stony Brook University and a Master's in Mathematics from New York University.

Thomas Fan
Senior Software Engineer | Quansight Labs
