Improving Open Language Models by Learning from Organic Interactions

Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster



When we released BlenderBot 3 in early August 2022, we included the option for adults in the United States to choose to share their de-identified interactions to help the AI community improve conversational models. From the people who chose to share, we collected, and are releasing, over 353,000 conversations, approximately 6.2 million utterances, and more than 155,000 instances of feedback, where people flagged messages as good or bad, and why – for example, whether a message was nonsensical, off-topic, or inappropriate. See here for the deployment data card.

To make sense of this data, we used paid crowdworkers to help us determine the quality of the conversations. At a high level, we learned:

While participants clearly engaged in very different kinds of conversations, we believe both standard and adversarial interactions can be very useful for learning improved models. For example, we would like our models to behave well in adversarial situations, as well as to be engaging in standard conversation.


We used the CRINGE loss to train an improved model, called BlenderBot 3x, from the resulting interactions. The CRINGE loss works by encouraging the model to generate good responses while decreasing the probability of generating bad ones. Using the feedback (good / bad) from the organic data we collected, in conjunction with crowdworker annotations, we divide the data into good and bad responses and apply this training criterion.

As negative feedback covers both semantic errors, such as incorrect, nonsensical, or off-topic responses, and safety issues, such as inappropriate behavior, this learning discourages both kinds of mistakes. Conversely, things the model did well during the previous deployment will be encouraged in future conversations.
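To make the criterion concrete, here is a minimal, stdlib-only sketch of the idea behind the CRINGE loss (this is an illustrative simplification, not the released implementation): good responses are trained with standard cross-entropy, while for bad responses each token is contrasted against an alternative token sampled from the model's own top-k predictions, so that probability mass is pushed away from the bad token.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cringe_loss(logits, labels, is_positive, k=5, rng=random):
    """Simplified sketch of the CRINGE training criterion.

    logits:      T lists of V logits (one list per token position)
    labels:      T token ids of the response being trained on
    is_positive: whether this response received good feedback
    """
    total = 0.0
    for step_logits, label in zip(logits, labels):
        if is_positive:
            # Good response: standard negative log-likelihood.
            probs = softmax(step_logits)
            total += -math.log(probs[label])
        else:
            # Bad response: sample a plausible alternative token from the
            # model's top-k predictions (excluding the bad token itself) ...
            topk = sorted(range(len(step_logits)),
                          key=lambda i: step_logits[i], reverse=True)[:k]
            topk = [i for i in topk if i != label] or topk
            weights = softmax([step_logits[i] for i in topk])
            alt = rng.choices(topk, weights=weights)[0]
            # ... then apply a binary contrast that prefers the alternative
            # over the bad token, lowering the bad token's probability.
            pair = softmax([step_logits[alt], step_logits[label]])
            total += -math.log(pair[0])
    return total / len(labels)
```

The full method (see the CRINGE loss paper and the released code) operates on model tensors and iterates this training with updated model samples; the sketch above only shows the per-token contrast that gives the loss its effect.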

Our new model outperforms its predecessor: 94.4% of BlenderBot 3x’s responses were evaluated as good, compared to 85.3% for BlenderBot 3. Overall, BlenderBot 3x is shown to produce both better responses on average and safer responses than BlenderBot 3 in challenging situations.


How to access deployment data

To train models using this data, take a look at the BB3 training command and include the newly released deployment data. Basic ParlAI tasks have been created to utilize the deployment data.

See the ParlAI quickstart for help on how to set up ParlAI and access data.

To display some data from these tasks you can run something similar to the following:

parlai dd -t projects.bb3x.tasks.agents:FilterOutAdversarialHumansBotTeacher

For all of these tasks, additional attributes describing the label message (detailed in the data card) can be accessed under label_info. The CRINGE loss code can also be seen here.
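As a sketch of how such feedback annotations might be consumed downstream, the snippet below splits messages into good and bad response sets. The field names ("label_info", "is_good") are illustrative assumptions, not the actual schema; consult the deployment data card for the real attribute names.

```python
def split_by_feedback(messages):
    """Partition ParlAI-style message dicts by their feedback annotation.

    NOTE: "label_info" and "is_good" are hypothetical field names used
    for illustration; see the deployment data card for the real schema.
    """
    good, bad = [], []
    for msg in messages:
        info = msg.get("label_info", {})
        (good if info.get("is_good") else bad).append(msg["text"])
    return good, bad
```

A split like this is the precondition for the CRINGE training criterion described above, which applies different loss terms to the two sets.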