
Abstract: scikit-learn is a Python machine learning library that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines and advanced model evaluation. Model evaluation is an underappreciated aspect of machine learning, but using the right metric to measure success is critical. Practitioners often face imbalanced classification tasks, where accuracy can be uninformative or misleading. We will discuss alternative metrics, when to use them, and how to compute them with scikit-learn. We will also look at how to build processing pipelines with scikit-learn that chain multiple preprocessing techniques together with supervised models, and how to tune these complex pipelines.
Bio: Andreas Mueller is an Associate Research Scientist at the Data Science Institute at Columbia University and author of the O'Reilly book "Introduction to Machine Learning with Python". He is one of the core developers of the scikit-learn machine learning library and has co-maintained it for several years.
His mission is to create open tools that lower the barrier to entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Andreas Mueller, PhD
Title: Author, Research Scientist, Core Contributor of scikit-learn | Columbia Data Science Institute
