Intermediate Machine Learning with Scikit-learn: Evaluation, Calibration, and Inspection


Scikit-learn is a Python machine learning library used by data science practitioners from many disciplines. In this training, we will learn how to evaluate, calibrate, and inspect models. Model evaluation is an essential part of the ML workflow. We will cover multiple metrics and see how they behave on various combinations of datasets and models, and we will explore scikit-learn's plotting API to visualize a model's performance. Next, we will learn how to calibrate a machine learning model with scikit-learn and compare how models behave before and after calibration by visualizing an estimator's calibration. We will then explore techniques for inspecting machine learning models. Specifically, we will see how to examine open-box machine learning models, such as linear models and random forests. Finally, we will learn about inspection techniques that apply to all models. These techniques are flexible because they can be applied to any machine learning model to show how it generates predictions.

Session Outline
Module 1: Model Evaluation
We start by training different machine learning models on datasets and evaluating their performance. We compare metrics that need thresholded predictions and other metrics that do not. Then, we use the scikit-learn plotting API to visualize a model's ROC curve and precision-recall curve.
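As a rough illustration of the difference between thresholded and threshold-free metrics, the sketch below trains one classifier and scores it both ways. The synthetic dataset, the choice of logistic regression, and the helper name `plot_curves` are assumptions made for this example, not material from the session itself.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary dataset (an assumption for illustration only).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Thresholded metric: computed from hard class predictions.
accuracy = accuracy_score(y_test, clf.predict(X_test))

# Threshold-free metrics: computed from continuous scores.
scores = clf.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, scores)
avg_precision = average_precision_score(y_test, scores)


def plot_curves():
    """Hypothetical helper: draw ROC and precision-recall curves side by side."""
    # Imported lazily so the metric computations above do not need matplotlib.
    import matplotlib.pyplot as plt
    from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay

    fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(10, 4))
    RocCurveDisplay.from_estimator(clf, X_test, y_test, ax=ax_roc)
    PrecisionRecallDisplay.from_estimator(clf, X_test, y_test, ax=ax_pr)
    return fig
```

On an imbalanced dataset like this one, accuracy can look high even when the ranking of positive cases is mediocre, which is one reason to look at the threshold-free metrics and the curves as well.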

Module 2: Model Calibration
A well-calibrated model predicts probabilities that reflect the true likelihood of an event. We evaluate models before and after calibrating and visualize an estimator's calibration.
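A minimal sketch of a before-and-after comparison might look like the following. The dataset, the use of Gaussian naive Bayes as the uncalibrated model, and the isotonic method are assumptions chosen for illustration; the session may use different models and settings.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with redundant features (an assumption for illustration only).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=2, n_redundant=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Uncalibrated model.
raw = GaussianNB().fit(X_train, y_train)

# Same model type wrapped in a cross-validated calibrator.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

raw_proba = raw.predict_proba(X_test)[:, 1]
cal_proba = calibrated.predict_proba(X_test)[:, 1]

# Brier score: mean squared error of the predicted probabilities (lower is better).
brier_raw = brier_score_loss(y_test, raw_proba)
brier_cal = brier_score_loss(y_test, cal_proba)

# Points of the reliability diagram: observed frequency vs. mean prediction per bin.
prob_true, prob_pred = calibration_curve(y_test, cal_proba, n_bins=10)
```

For a well-calibrated model, `prob_true` stays close to `prob_pred` in every bin. The same curve can be drawn with `CalibrationDisplay.from_estimator` when matplotlib is available.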

Module 3: Model Inspection
We examine open-box machine learning models, such as linear models or random forests. We learn about inspection techniques that apply to all models, such as permutation feature importance and partial dependence curves.
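The sketch below contrasts the two kinds of inspection on a small regression problem: reading a model's own fitted attributes, and applying model-agnostic tools from `sklearn.inspection`. The synthetic dataset and the specific estimators (`Ridge`, `RandomForestRegressor`) are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption for illustration only).
X, y = make_regression(
    n_samples=500, n_features=5, n_informative=2, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear = Ridge().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Open-box inspection: attributes exposed by the fitted models themselves.
linear_coefs = linear.coef_
forest_importances = forest.feature_importances_

# Model-agnostic inspection: shuffle one feature at a time on held-out data
# and measure how much the score drops.
perm = permutation_importance(forest, X_test, y_test, n_repeats=5, random_state=0)
perm_means = perm.importances_mean

# Model-agnostic inspection: average prediction as feature 0 is varied.
pd_result = partial_dependence(forest, X_test, features=[0], kind="average")
pd_curve = pd_result["average"][0]
```

Permutation importance and partial dependence only call the model's prediction interface, so the same two calls would work unchanged with `linear` or any other fitted estimator in place of `forest`.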

Background Knowledge
Python and a basic understanding of scikit-learn's API


Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer of scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Master's in Physics from Stony Brook University and a Master's in Mathematics from New York University.

Open Data Science




