Finetuning, Serving and Evaluating LLMs in the Wild


Since the release of Meta’s Llama weights, open source development of large language models (LLMs) has been progressing rapidly, with new advances almost every day.

This talk will share our experience with serving and evaluating 20+ LLM-based chatbots, including Vicuna, within the Chatbot Arena (

I will start by briefly explaining Vicuna, an open source chatbot we finetuned from Llama, and the Chatbot Arena platform we developed to evaluate the quality of such models in the wild. I will then discuss the underlying system challenges we faced: how to serve many LLMs with high throughput and low latency, given only a limited number of university-donated GPUs. I’ll cover two key enabling techniques behind the scenes: paged attention (vLLM, SOSP’23) and statistical multiplexing with model parallelism (AlpaServe, OSDI’23). This is joint work with members of the LMSYS Org team at


Hao is currently a postdoctoral researcher at the Sky Lab, UC Berkeley, working with Prof. Ion Stoica. He has recently been working on the Alpa project and the Sky project, which aim to democratize large models like GPT-3. He will join the Halıcıoğlu Data Science Institute and the Department of Computer Science and Engineering (affiliate) at UC San Diego as an Assistant Professor in Fall 2023.

His research focuses primarily on large-scale distributed ML at the intersection of ML and systems, with attention to performance, usability, cost, and privacy. His work spans distributed ML algorithms, large models, parallelism, performance optimization, system architectures, ML privacy, and AutoML, with applications in computer vision, natural language processing, and healthcare.

Open Data Science




