Decoding LLMs: Evaluation is all you need!

Abstract: 

Large Language Models (LLMs) have breathed new life into natural language processing, revolutionizing fields from conversational AI to content generation. However, as these models grow in complexity and scale, evaluating their performance presents many challenges.

A primary challenge in LLM evaluation is the absence of standardized benchmarks that comprehensively capture the capabilities of these models across diverse tasks and domains. A second is the black-box nature of LLMs, which makes it difficult to understand their decision-making processes and identify biases. In this talk, we address fundamental questions such as what constitutes effective evaluation metrics for LLMs and how those metrics align with real-world applications.

Because the LLM field is growing dynamically and new architectures evolve rapidly, evaluation methodologies must continuously adapt to changing contexts. Open-source initiatives play a pivotal role in addressing these challenges: they drive progress, facilitate the development of standardized benchmarks, and enable researchers to benchmark LLM performance consistently across tasks and domains. We will also evaluate some of the open-source evaluation metrics and walk through code using demo data.

Learning Outcomes -
- Understand the landscape of LLM evaluation: learn about the current state-of-the-art (SOTA) evaluation metrics
- Deep-dive into choosing evaluation metrics by problem type: decide which metrics to use and how to make that decision
- Run open-source packages to design your own metrics: run sample code to generate a metric report for a use case (a minimal sketch follows this list)
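
To give a flavor of that hands-on portion, here is a minimal sketch of generating a metric report with an open-source package. It assumes the Hugging Face `evaluate` library and a couple of toy prediction/reference pairs; the specific packages and demo data used in the talk may differ.

```python
# Minimal sketch: scoring demo model outputs with the open-source
# Hugging Face `evaluate` library (an assumed choice; the talk's packages may differ).
# pip install evaluate rouge_score

import evaluate

# Toy demo data: model outputs vs. human-written references.
predictions = [
    "The cat sat on the mat.",
    "LLMs are evaluated with benchmarks and metrics.",
]
references = [
    "A cat was sitting on the mat.",
    "Benchmarks and metrics are used to evaluate LLMs.",
]

# Load two common reference-based metrics.
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# Compute scores and assemble a small metric report.
report = {
    **rouge.compute(predictions=predictions, references=references),
    "bleu": bleu.compute(predictions=predictions, references=references)["bleu"],
}

for name, score in report.items():
    print(f"{name:>10}: {score:.3f}")
```

The same load-then-compute pattern extends to other metrics in the library, which makes it straightforward to compare candidate metrics on the same demo data for a given use case.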

Bio: 

Jayeeta is a Senior Data Scientist with several years of industry experience in Natural Language Processing (NLP), Statistical Modeling, and implementing ML solutions at scale. She currently works at Fitch Ratings, a global leader in financial information services, as a Manager - Emerging Technology, where she leverages cutting-edge advancements in Generative AI to build efficient solutions for business use cases. An avid Gen AI researcher, she explores state-of-the-art open-source models to build impactful products and firmly believes that data, in all its forms, is the best storyteller.

Jayeeta is passionate about using these tools not only to build impactful products but also as a platform to advocate for and empower women in tech. Throughout her career, she has been actively involved in initiatives to support and mentor women in STEM, leading workshops with organizations such as Women Who Code and Google’s Women Techmakers. Her commitment to fostering diversity is further demonstrated by her roles as an ambassador for Women in Data Science at Stanford University, a Data Science Mentor at Girl Up (United Nations Foundation), and a mentor at the WomenTech Network. She is also the NYC Chapter Lead for Women in AI, a Paris-based global nonprofit working toward gender-inclusive AI that benefits global society. At Fitch, she is a selected member of the North America Chapters of two groups: 100 Women in Finance and the Women’s Bond Club.

Jayeeta has been an invited speaker at renowned conferences including ICML, ODSC, the WomenTech Global Conference, and the Generative AI Summit. She was nominated for the WomenTech Global Awards in the category “Rising Star in STEM of the Year 2020” and recognized in the List of Top 100 Women Who Break the Bias, honors that have reinforced her dedication to championing gender equality in technology.
