A Background to LLMs and Intro to PaLM 2: A Smaller, Faster and More Capable LLM

Abstract: 

Large language models (LLMs) have swept through the world of AI in the past few years. This talk gives some background on LLMs: their origins in the early days of information theory, their current capabilities, some pitfalls, and their potential future capabilities, with a focus on the innovations behind recent models, including the sparsely activated model GLaM and the more recent PaLM 2.

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor, PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English-language, multilingual, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference than PaLM. This improved efficiency enables broader deployment and also allows the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities, exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Bio: 

Andrew Dai completed his PhD at the University of Edinburgh before joining Google Brain in 2014, where he did research on language models, story generation, and conversational agents, as well as products including SmartReply. He moved to Google Health in 2017 to research deep learning for medical records, then returned to Google Brain (now Google DeepMind) in 2020, where he has since co-led the development and training of LLMs including PaLM 2 and GLaM. Andrew is also a lead for Google SGE modelling, Gemini, and data research, and is excited by the new abilities emerging from LLMs.
