Abstract: Major advances in algorithms, systems, and hardware have enabled large deep learning models to be trained by gradient-based stochastic optimization. However, these algorithms come with many hyperparameters that are crucial for good performance. Even when pre-trained models are used, the right one for the task needs to be chosen, and extra parameters such as soft prompts may have to be adjusted. Tuning is difficult and time-consuming even for experts, and criteria like latency or cost often play a role in deciding on a winning hyperparameter configuration.
In this tutorial, we will present an overview of modern hyperparameter optimization methods. Our approach is hands-on, using Syne Tune, a new open source library for distributed hyperparameter tuning and global optimization, which implements many state-of-the-art methods behind an easy-to-use API. Syne Tune is agnostic to the execution backend, but offers close integration with the AWS SageMaker ecosystem. It offers a wide range of features that can be specialized to your application.
This tutorial will be most useful for data scientists and ML practitioners who need to regularly train and tune large ML models on a time and compute budget. You will learn about basic modern tuning algorithms, and how to use the best method for your application. We will demonstrate how distributed tuning on AWS SageMaker can speed up finding the right model for your particular data. You will also learn about some advanced use cases of automated tuning.
- Module 1: A walk through modern hyperparameter optimization [40 mins]:
Automated tuning involves decisions on which models to train and how to split parallel compute resources among the training jobs. We begin with random search and Bayesian optimization, which ignore the latter, then explore successive halving and its model-based variants, where training jobs can be stopped early, or paused and resumed later on. Finally, we touch on advanced setups, such as constrained Bayesian optimization and multi-objective tuning. At the end of this module, you will have learned about the basic automated tuning modalities and gained insight into when to use which of them. The tutorial uses Syne Tune (https://github.com/awslabs/syne-tune), and we link to several tutorials and examples that can be used to deepen your understanding.
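To make the early-stopping idea concrete, here is a minimal pure-Python sketch of successive halving: train all candidates on a small budget, keep the best fraction, and repeat with a larger budget. This is only an illustration of the principle, not Syne Tune's API; the names `successive_halving`, `train_step`, and `toy_train` are hypothetical.

```python
import math
import random

def successive_halving(configs, train_step, max_epochs=8, eta=2):
    """Illustrative successive halving: evaluate all candidates on a small
    epoch budget, keep the best 1/eta fraction, multiply the budget by eta,
    and repeat. `train_step(config, epochs)` returns a validation error
    (lower is better). Hypothetical names, not Syne Tune's API."""
    survivors = list(configs)
    epochs = 1
    while epochs <= max_epochs and len(survivors) > 1:
        scored = sorted(survivors, key=lambda c: train_step(c, epochs))
        survivors = scored[: max(1, len(scored) // eta)]
        epochs *= eta
    return survivors[0]

def toy_train(config, epochs):
    # Toy objective: error shrinks with more epochs; best lr is near 1e-2.
    return abs(math.log10(config["lr"]) + 2) + 1.0 / epochs

random.seed(0)
candidates = [{"lr": 10 ** random.uniform(-5, -1)} for _ in range(8)]
best = successive_halving(candidates, toy_train)
```

Compared with plain random search, the same total training budget is spread over many more candidates, because poor ones are discarded after only a few epochs.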
- Module 2: Distributed Tuning on SageMaker [20 mins]:
In this module, you will learn how to use Syne Tune to prepare, plan, and launch distributed tuning experiments on AWS SageMaker, and how to visualize the results (for example, how to combine model selection and fine-tuning for Hugging Face transformer models). Again, we provide links to tutorials and examples for self-guided further study.
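The core idea behind distributed tuning is separating the scheduler, which suggests configurations, from the execution backend, which runs trials in parallel (locally, or remotely on SageMaker). The interplay can be sketched in pure Python with a thread pool standing in for remote workers; `suggest` and `run_trial` are hypothetical stand-ins, not Syne Tune's API.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def suggest():
    # A trivial "scheduler": sample a learning rate log-uniformly.
    return {"lr": 10 ** random.uniform(-5, -1)}

def run_trial(config):
    # Stand-in for a training job the backend would launch remotely;
    # returns (config, validation error), best lr near 1e-2.
    return config, abs(math.log10(config["lr"]) + 2)

random.seed(1)
results = []
# Four "workers" evaluate suggested configurations concurrently, and
# results are collected asynchronously as trials complete.
with ThreadPoolExecutor(max_workers=4) as backend:
    futures = [backend.submit(run_trial, suggest()) for _ in range(12)]
    for fut in as_completed(futures):
        results.append(fut.result())

best_config, best_error = min(results, key=lambda r: r[1])
```

Because trials finish in arbitrary order, a real scheduler can use every completed result immediately to inform the next suggestion, which is what makes asynchronous distributed tuning efficient.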
- Module 3: Advanced Use Cases [20 mins]:
In the final module, we go deeper on one or two advanced use cases, such as multi-objective tuning over AWS instance types and configurations to reduce cost and latency while maintaining accuracy, or neural architecture search for efficient transfer learning.
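In multi-objective tuning, there is usually no single winner; instead one looks for the Pareto front of configurations where neither objective can be improved without worsening the other. A small self-contained sketch (the `pareto_front` helper and the trial data are hypothetical, for illustration only):

```python
def pareto_front(points):
    """Return the non-dominated points: p stays on the front unless some
    other point q is at least as good in both objectives and differs from
    p (hence strictly better in at least one). Each point is a pair
    (validation error, cost); lower is better for both."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical trial results: (validation error, cost per hour in USD).
trials = [(0.10, 3.0), (0.12, 1.0), (0.09, 6.0), (0.15, 0.5), (0.11, 4.0)]
front = pareto_front(trials)  # the achievable accuracy/cost trade-offs
```

Here (0.11, 4.0) is dominated by (0.10, 3.0), which is both more accurate and cheaper; the remaining points form the trade-off curve from which a deployment choice can be made under a latency or cost budget.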
You should have been exposed to the process of selecting and training a machine learning model for a given task. Examples are written in Python. Some familiarity with a deep learning framework (such as PyTorch) is helpful, but not necessary.
Bio: Matthias W. Seeger is a principal applied scientist at Amazon. He received a Ph.D. from the School of Informatics, University of Edinburgh, UK, in 2003 (advisor Christopher Williams). He was a research fellow with Michael Jordan and Peter Bartlett, University of California at Berkeley, from 2003, and with Bernhard Schoelkopf, Max Planck Institute for Intelligent Systems, Tuebingen, Germany, from 2005. He led a research group at the University of Saarbruecken, Germany, from 2008, and was an assistant professor at the Ecole Polytechnique Federale de Lausanne from fall 2010. He joined Amazon as a machine learning scientist in 2014. He received the ICML Test of Time Award in 2020.
His interests center around Bayesian learning and decision making with probabilistic models, from gaining understanding to making them work at large scale in practice. He has worked on the theory and practice of Gaussian processes and Bayesian optimization, scalable variational approximate inference algorithms, Bayesian compressed sensing, and active learning for medical imaging. More recently, he has worked on demand forecasting, hyperparameter tuning (Bayesian optimization) applied to deep learning (NLP), and AutoML.