Building an Expert Question/Answer Bot with Open Source Tools and LLMs


Quality is critical when applying large language models (LLMs) to the real world. LLMs, particularly foundation models, are trained on vast corpora of data, giving them a general "understanding" of the world that is nothing less than jaw-dropping. But along with this wide coverage, LLMs also inherit an internet-level bias that is nearly impossible to fully understand, let alone control. This ubiquitous bias poses a challenge because it only sometimes aligns with the expectations and requirements of our unique application domains. As a result, a one-size-fits-all LLM often fails to meet the expectation of providing quality responses for specific applications.

As data-rich as these LLMs are, their application in the real world leaves room for improvement. Quality, not quantity, becomes the key issue. For business applications, contextual awareness, data privacy, and the ability to control these applications are vital requirements. LLMs and the applications built on top of them need continuous fine-tuning to suit specific domains and to align the model with our precise needs. The ability to do this consistently and reliably is becoming integral to vertical-specific LLM applications.

In this workshop, we'll explore how Label Studio, LangChain, Chroma, and Gradio can be employed as tools for continuous improvement. Specifically, we'll build a Question-Answering (QA) system that answers questions about an open source project, in this case Label Studio itself, using the domain-specific knowledge in Label Studio’s GitHub documentation.
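To give a feel for the retrieval step that such a QA system depends on, here is a minimal sketch in plain Python. It is not the workshop's implementation and does not use the LangChain or Chroma APIs: the word-count "embedding" is a stand-in for a real embedding model, and the names `DOCS`, `embed`, `retrieve`, and `build_prompt` are hypothetical, chosen only for illustration. The idea it shows is to rank documentation snippets by similarity to the question and pass the best matches to the LLM as context.

```python
import math
import re
from collections import Counter

# Hypothetical documentation snippets. A real system would load chunks of
# Label Studio's GitHub documentation and store them in a vector database.
DOCS = [
    "Label Studio is an open source data labeling tool.",
    "Install Label Studio with pip or run it with Docker.",
    "Gradio builds simple web interfaces for machine learning demos.",
]


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=2):
    """Return the k snippets most similar to the question."""
    query = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Assemble a prompt that grounds the LLM in the retrieved snippets."""
    context = "\n".join(retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("How do I install Label Studio with Docker?", DOCS, k=1))
```

In a setup like the one described here, answers produced from such prompts could then be reviewed by human annotators, and that feedback used to improve the documentation chunks or the model over time.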

Ultimately, our aim is for the QA system to serve as a blueprint for continuous enhancement across LLM applications. We want a system that allows us to strategically navigate the continuous cycle of feedback and adaptation, all while allowing us to incorporate human understanding.

Prerequisites for this talk include a laptop with a minimum of 16 GB of memory and Docker Desktop installed. Alternatively, participants may use Hugging Face Spaces to host their Label Studio and machine learning environments.


Chris Hoge is the Head of Community for HumanSignal, where he is helping to grow the Label Studio community. He has spent over a decade working in open source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, with an emphasis on using high-performance numerical methods for simulating physical systems. He makes his home in the Pacific Northwest, where he spends his free time trail running and playing piano.

Open Data Science



