Abstract: Existing production machine learning systems are often hard to use: data scientists and ML practitioners spend much of their time stitching together and managing bespoke distributed systems in order to build end-to-end ML applications and push models to production.
To address this, the Ray community has built Ray AI Runtime (Ray AIR), an open-source toolkit for building large-scale end-to-end ML applications.
Ray is a distributed compute framework that powers large-scale machine learning systems such as OpenAI's ChatGPT. By leveraging Ray's distributed compute substrate and library ecosystem, the Ray AI Runtime brings scalability and programmability to ML platforms.
The Ray AI Runtime focuses on providing the compute layer for Python-based AI/ML workloads and is designed to interoperate with popular ML frameworks and with other systems for storage and metadata needs.
In this session, we’ll explore and discuss the following:
- What Ray is and why it exists
- How AIR, built atop Ray, allows you to program and scale your machine learning workloads easily
- AIR’s interoperability and easy integration points with other systems for storage and metadata needs
- AIR’s cutting-edge features for accelerating the machine learning lifecycle such as data preprocessing, last-mile data ingestion, tuning and training, and serving at scale
Key takeaways for attendees are:
- Understand Ray as a general-purpose framework for distributed computing
- Understand how the Ray AI Runtime can be used to implement scalable, programmable machine learning workflows
- Learn how to pass and share data across distributed trainers and Ray's native libraries: Tune, Serve, Train, RLlib, etc.
- Learn how to scale Python-based workloads across supported public clouds
General familiarity with machine learning tools and frameworks is assumed. We won't go deep into any particular framework, but attendees should, for example, know what PyTorch and TensorFlow are and what they are used for.
Bio: Kai Fricke is a senior software engineer at Anyscale. As a core maintainer of the Ray AI Runtime, he builds software for distributed machine learning training and tuning. During his postdoc at Cambridge, he used reinforcement learning to optimize large graph structures and co-authored two open-source reinforcement learning libraries.