Abstract: Off-the-shelf Large Language Models (LLMs) such as GPT-4 have already proven their versatility in numerous tasks and are revolutionizing entire industries. However, achieving exceptional performance in highly specific domains can be challenging, and traditional fine-tuning is often inaccessible due to its extensive demands in data, finances, and expertise, which exceed the means of most organizations.
Retrieval-Augmented Generation (RAG) is a widely adopted technique to augment the knowledge of LLMs within very specific domains while mitigating hallucinations. RAG achieves this by shifting the burden of information retrieval from the LLM's internal knowledge to external retrieval systems, which are often more specialized in this task due to their focused scope. However, RAG is not a silver bullet: getting it to perform effectively can be far from trivial, and for some use cases it's not applicable at all.
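The core idea described above can be sketched in a few lines: an external retriever finds the relevant documents, and they are pasted into the prompt so the LLM only has to read and answer rather than recall facts from its weights. This is a minimal illustrative sketch, not any specific framework's API; the toy corpus, `retrieve`, and `build_prompt` are all assumptions, and a real system would use BM25 or embedding similarity instead of keyword overlap.

```python
# Toy knowledge base standing in for an external document store (illustrative only).
CORPUS = [
    "Haystack is an open-source LLM framework maintained by deepset.",
    "RAG grounds LLM answers in retrieved documents to reduce hallucinations.",
    "Fine-tuning updates a model's weights on domain-specific data.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase and strip trailing punctuation; a crude stand-in for real text processing."""
    return {word.strip(".,?!") for word in text.lower().split()}

def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query.
    A production retriever would use BM25 or dense embeddings instead."""
    query_words = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(query_words & tokenize(doc)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

query = "What does RAG do about hallucinations?"
docs = retrieve(query, CORPUS)
prompt = build_prompt(query, docs)
print(prompt)
```

The point of the sketch is the division of labor: retrieval quality, not the LLM, determines whether the right facts end up in the context window, which is why the talk's failure modes largely concern the retrieval side.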
In this talk we will first look at what RAG is, where it shines, and why it works so well in those applications. Then we will examine its most common failure modes and walk through a few of them to evaluate whether RAG is a suitable solution at all, how to fix it, or what alternative approaches could be a better fit for the specific use case.
- What is RAG and what are its most fitting applications
- Typical failure modes of RAG and potential approaches to improve the system’s performance
- Alternatives to RAG for specific use cases with respective pros and cons
Bio: Sara Zanzottera is an NLP Engineer at deepset and a core maintainer of Haystack, one of the most mature open-source LLM frameworks. Before joining deepset she worked for several years at CERN as a Python software engineer on the particle accelerator’s control systems.