Abstract: As machine learning models have become more present in our lives, there has been increasing attention on the reliability of these models. A major component of this is understanding how uncertain the model is about its predictions. No model is exactly right 100% of the time, so we need methods and approaches by which we can quantify the level of uncertainty around a prediction.
Approaches to uncertainty quantification (UQ) vary, and depend on the type of problem. For classification problems, the primary approach is probability calibration: making sure that the model outputs corresponding to each class "behave well" as probabilities. For regression problems, there are several different approaches. One can configure models to output an interval, rather than a single point prediction, along with a "coverage" value that specifies the probability that the interval covers the true value. The framework of Conformal Prediction provides theoretical guarantees around such interval predictions. Alternatively, one can use methods that output an entire conditional density for y given X. This is called probabilistic regression, or conditional density estimation. Several parametric and non-parametric approaches exist for this problem, including PrestoBoost, Coarsage, and NGBoost.
This workshop will provide the theoretical context for these methods and then dive into real-world examples of their applications using Jupyter notebooks.
Introduction: Why and How?
We will begin with an overview of why it is important to quantify the uncertainty around a model's predictions. We will discuss how UQ differs in classification problems versus regression problems, and introduce the various approaches.
Lesson 1: Probability Calibration
For classification problems, UQ primarily revolves around assigning valid probabilities to each of the possible classes. We will discuss various approaches including Platt scaling, isotonic regression, beta calibration, and spline calibration.
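As a flavor of what the notebooks will cover, here is a minimal sketch of Platt scaling: a logistic regression is fit on a held-out set's raw model scores so that they behave as calibrated probabilities. The dataset and base model below are illustrative choices, not taken from the workshop materials.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic data and base classifier
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(
    X, y, test_size=0.5, random_state=0
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Platt scaling: fit a logistic regression mapping the raw scores
# on a held-out calibration set to the observed labels
raw_scores = clf.predict_proba(X_cal)[:, 1].reshape(-1, 1)
platt = LogisticRegression().fit(raw_scores, y_cal)

# Calibrated probabilities for the positive class
calibrated = platt.predict_proba(raw_scores)[:, 1]
```

The same structure (fit a mapping from raw scores to probabilities on held-out data) underlies isotonic, beta, and spline calibration; only the family of mapping functions changes.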
Lesson 2: Conformal Prediction
For regression problems, one approach to UQ is to output intervals rather than points. We will show how the conformal prediction (CP) framework permits a simple approach to interval prediction with theoretical guarantees. We will also show the limitations of standard approaches to CP, and more advanced attempts to overcome them.
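The simplest variant, split (inductive) conformal prediction, can be sketched in a few lines: take a quantile of the absolute residuals on a calibration set, and widen every point prediction by that amount. The data and model below are illustrative assumptions, not the workshop's own examples.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Illustrative synthetic data, split into train / calibration / test
X, y = make_regression(n_samples=2000, noise=10.0, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.5, random_state=0
)
X_cal, X_test, y_cal, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

alpha = 0.1  # target 90% coverage
residuals = np.abs(y_cal - model.predict(X_cal))
n = len(residuals)
# Conformal quantile with the finite-sample (n + 1) correction
q = np.quantile(residuals, np.ceil((n + 1) * (1 - alpha)) / n)

preds = model.predict(X_test)
lower, upper = preds - q, preds + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

One limitation is visible already: the interval width q is constant across inputs, so regions where the model is more or less certain get the same interval. That motivates the more advanced CP variants covered later in the lesson.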
Lesson 3: Probabilistic Regression
Another approach to UQ is to create models that output an entire density function for the value of y given X. We will demonstrate both non-parametric approaches (PrestoBoost, Coarsage) and parametric approaches (NGBoost) to this problem.
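To illustrate the parametric flavor of this idea, the sketch below models y | X as a Gaussian whose mean and spread both depend on X, using two separate boosted models. This hand-rolled version is only meant to convey the concept; packages such as NGBoost fit the distribution parameters jointly.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Illustrative synthetic data
X, y = make_regression(n_samples=2000, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One model for the conditional mean of y given X
mean_model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# A second model for the conditional spread, fit to log absolute residuals
resid = y_train - mean_model.predict(X_train)
scale_model = GradientBoostingRegressor(random_state=0).fit(
    X_train, np.log(np.abs(resid) + 1e-6)
)

mu = mean_model.predict(X_test)
sigma = np.exp(scale_model.predict(X_test)) * np.sqrt(np.pi / 2)  # E|Z| correction

# The output is a full conditional density: here, evaluated at the true y values
density_at_truth = norm.pdf(y_test, loc=mu, scale=sigma)
```

With a full density in hand, point predictions, intervals, and tail probabilities all become simple queries against the same object, which is the central appeal of probabilistic regression.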
You will learn about the field of Uncertainty Quantification and how to use various packages and tools that solve these problems in practice.
Participants should be familiar with how to train and evaluate models. This workshop is aimed at more advanced users, but practitioners at all levels should be able to benefit from the lessons.
Bio: Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.