Pink Elephants and Direct Principle Feedback


This tutorial presents Direct Principle Feedback (DPF), a novel approach for fine-tuning large language models (LLMs) to dynamically obey new behavioral constraints at inference time. DPF addresses the Pink Elephant Problem: enabling models to avoid discussing specified unwanted topics ("Pink Elephants") while focusing on desired ones ("Grey Elephants"). By applying DPF with high-quality synthetic data, we teach models to navigate complex content guidelines across multiple contexts, offering a significant advancement over traditional reinforcement learning methods for LLM control.
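The abstract does not spell out DPF's training objective. Assuming, as the DPO entry under Background Knowledge suggests, that DPF fine-tunes with a Direct Preference Optimization loss on pairs where the response avoiding the Pink Elephant is preferred, the per-pair loss can be sketched as below. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not details taken from the session.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being fine-tuned or the frozen reference model.
    Assumption: "chosen" is the response that avoids the Pink Elephant,
    "rejected" is the one that mentions it.
    """
    # Log-ratio of policy to reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how strongly the policy is pushed away from the reference.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Loss is -log(sigmoid(margin)) = softplus(-margin), computed stably
    # so that large positive or negative margins do not overflow exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy still matches the reference model the margin is zero and the loss is log 2 (about 0.693); it falls below that as the policy shifts probability toward the constraint-respecting response relative to the reference.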

Targeting professionals in fields requiring dynamic content control, such as edtech and social media, this session elucidates the process of generating synthetic preference data, the mechanics of DPF, and its application for enhancing LLM controllability. Participants will acquire the expertise to deploy LLMs capable of adapting to specific content guidelines, ensuring relevance and compliance in diverse deployment scenarios.
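The abstract mentions generating synthetic preference data but not its format. The following is a minimal sketch, assuming each example pairs an original response that mentions the Pink Elephant (rejected) with a revision that avoids it (chosen), under a prompt that states the constraint so the model learns to follow it at inference time. `PreferencePair`, `make_instruction`, `mentions`, and `build_pair` are hypothetical names, and the substring check stands in for whatever judge a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def make_instruction(pink: str, grey: str) -> str:
    # Hypothetical template: the constraint lives in the prompt, so the same
    # model can be given a different Pink/Grey Elephant at inference time.
    return f"Do not discuss {pink}. If the topic comes up, steer toward {grey} instead."


def mentions(text: str, topic: str) -> bool:
    # Naive case-insensitive substring check; a stand-in for a stronger judge.
    return topic.lower() in text.lower()


def build_pair(dialogue: str, pink: str, grey: str,
               original: str, revised: str) -> Optional[PreferencePair]:
    """Turn an (original, revised) response pair into a preference example.

    Returns None when the pair does not show the intended contrast, i.e. the
    original never mentions the Pink Elephant or the revision still does.
    """
    if not mentions(original, pink) or mentions(revised, pink):
        return None
    prompt = f"{make_instruction(pink, grey)}\n\n{dialogue}"
    # Assumption: the revision is used directly as the preferred response.
    return PreferencePair(prompt=prompt, chosen=revised, rejected=original)
```

For example, `build_pair("User: Any tips for a rainy afternoon?", "chess", "checkers", "You could play chess with a friend.", "You could play checkers with a friend.")` yields a pair whose `chosen` field is the checkers response; filtering out pairs that fail the check keeps noisy revisions out of the training set.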

Through this tutorial, attendees will gain insights into leveraging DPF for addressing not only the Pink Elephant Problem but also broader challenges in LLM behavior control, marking a step forward in the development of adaptable, context-aware AI systems.

Session Outline:

Understand some of the constraints and challenges of performing RLAIF at scale. No specific open-source tools will be used; we will focus primarily on methodology, which can easily be reimplemented in most tools.

Background Knowledge:

LLMs, RLAIF, DPO, Synthetic data.


Louis Castricato is currently a research scientist at EleutherAI, working on large-scale RLAIF infrastructure. He was previously team lead at CarperAI and Head of LLMs/Research Director at Stability AI, where he worked on libraries such as trlX and various RLHF projects. Louis is also a PhD student at Brown University, advised by Ellie Pavlick.

Open Data Science
One Broadway
Cambridge, MA 02142