ParlAI Projects

Here we list projects undertaken in the ParlAI framework that are shared publicly in the form of papers, public tasks (with leaderboards), and/or shared model code.

See the ParlAI projects page on GitHub for more information. Refer to ParlAI's agents, tasks and model zoo for what else is in ParlAI.

BlenderBot (Putting Everything Together)

BlenderBot 3x [project]. Data release of 6M chat interactions; training on this data improves BB3 from 85.3% to 94.4% good messages.
BlenderBot 3 [project]. A 175B-parameter, publicly available chatbot that improves its skills & safety over time.
BlenderBot 2 [project]. Version 2 of our BlenderBot model with Internet search and long-term memory.
BlenderBot 1 (Recipes for open-domain chatbots) [project]. We open source 90M, 2.7B and 9.4B parameter generative models fine-tuned on BST.

Generative Models & Architectures

CRINGE Loss [project] New loss for training language models with negative examples (with no architecture change).
Director [project] New architecture for training language models with positive and negative examples using LM+classifier heads.
Maintaining Identity [project] State-of-the-art dialogue models cannot maintain identity -- we study measurements & methods for this open problem.
More Parameters or More Compute? [project] Answer: Both! Two new methods that explore this question: Hash Layers for more parameters, and Staircase Attention for more power per parameter.
Style-Controlled Generation [project] [paper] Tasks and models for training and evaluating generative models conditioned on a style token.
Unlikelihood Training for Consistent Dialogue [project]. Methods to reduce copies & repeats, correct vocab usage, and avoid contradiction via unlikelihood training.
What makes a good conversation? How controllable attributes affect human judgments [website] [paper]. Optimizing for multi-turn engaging conversations -- by controlling question-asking, specificity, response-relatedness and repetition.
Retrieve and Refine [paper]. Models for improved chitchat ability by combining retrieval with generative refinement.
Poly-Encoders [project] [paper]. State-of-the-art Transformer architectures + pretraining for dialogue retrieval.
Importance of Search Strategy [paper]. Analysis of the performance of search in generative models for chitchat tasks.

Well-Behaved / Safety

SaFeRDialogues: Taking Feedback Gracefully after Conversational Safety Failures [project] [paper]. Task and method for teaching bots to react gracefully to feedback when safety failures happen.
Safety for E2E Conversational AI workshops. We have helped organize workshops in 2020 and 2021 on Safety in Conversational AI.
Anticipating Safety Issues in E2E Conversational AI [project] [paper]. Benchmarks for evaluating the safety of English-language dialogue models.
Recipes for Safety in Open-Domain Chatbots [project] [paper]. Methods for improving the safety of open-domain chatbots.
Build-It Break-It Fix-It for Dialogue Safety [project] [paper]. Task and method for improving the detection of offensive language in the context of dialogue.
Multi-Dimensional Gender Bias Classification [project] [paper] Training fine-grained gender bias classifiers to identify gender bias in text.
Mitigating Genderation Bias [project] [paper]. Analysis and methods for mitigating gender bias in dialogue generation, using LIGHT as a testbed.
Reducing conversational agents' overconfidence through linguistic calibration [project] [paper]. Analysis and methods for relating and correcting linguistic confidence and correctness in dialogue generation, using closed-book QA as a testbed.

Interactive & Continual Learning

Learning New Skills after Deployment [project]. Shows how to learn from Feedback for Interactive Talk & Search (FITS), plus a new dataset.
Finding the Helpers and Ignoring the Trolls [project]. Algorithms for learning from a mixture of adversarial and non-adversarial organic users.
Self-Feeding Chatbot [paper] How an agent can learn from dialogue after deployment by imitating and asking for feedback.
Beat-The-Bot Live Game [project] A new data collection and model evaluation tool, a Messenger-based Chatbot game called Beat the Bot.

Open-domain Dialogue

Long-Term Open-Domain Conversation [project] [paper]. Multi-session conversation task and memory-based models for long-form chat.
Addressing Contradictions in Dialogue Modeling [project]. A new task for contradiction detection and its use for non-contradicting generation.
Blended Skill Talk [project]. Blending the skills of engagingness, personality, empathy and knowledge with a task that mixes PersonaChat, Empathetic Dialogues and Wizard of Wikipedia elements.
dodeca Dialogue [project]. Set of 12 (existing) tasks for building an agent that can see and talk. We build a strong baseline system with SOTA on many tasks.
Dialogue Natural Language Inference [external website]. Task and method for improving dialogue consistency.
Empathetic Dialogues [paper] [external website] [video]. Task & models for chitchat displaying empathy.
ConvAI2 Competition [external website]. Competition on dialogue chitchat based on the PersonaChat task.
Persona-Chat [project]. Task & models for chitchat with a given persona.

Knowledge Grounded

SeeKeR: [project] Modular open source search-augmented language model.
Reason first, then respond: [project] [paper] A modular generation method for knowledge-infused dialogue.
Internet-Augmented Dialogue Generation [project] [paper]. Task & models that utilize a search engine for open-domain chitchat.
Retrieval Augmentation Reduces Hallucination in Conversation [project] [paper]. Exploratory architectures that add retrieval mechanisms to dialogue models, reducing hallucination while maintaining conversational ability.
Wizard of Wikipedia [project] [paper]. Knowledge-grounded open domain chitchat task & models.

Visually Grounded

Multi-Modal BlenderBot [project] [paper]. Model for multi-modal dialogue about both images and chitchat.
Image Chat [paper] [task]. Task for personality-based engaging dialogue on images.
Personality-Captions [project] [paper]. Task for personality-based engaging comments on images.

Environment Grounded

LIGHT [project] A large-scale text adventure game research platform for agents that speak and act.
Mastering the Dungeon (Archived) [project]. Task & models for training grounded agents in a text adventure game via MTurk.
Talk The Walk (Archived) [paper]. Task & models for grounded dialogue for the task of navigating New York City streets.
Multi-party LIGHT [paper]. Task & models for multi-party chat in LIGHT. Conversation is grounded on the characters' personas and the location.

QA

HotPotQA [external website]. QA task with multi-hop reasoning. Task built with ParlAI MTurk.
CoQA [external website]. QA task with a series of interconnected questions. Task built with ParlAI MTurk.
DrQA [parlai agent] [project] [external website] [paper]. QA model for answering questions by retrieving and reading knowledge.

Evaluation

Human Evaluation Methods Comparison [project] [paper]. Compares how well different human crowdworker evaluation techniques can detect relative performance differences among dialogue models.
ACUTE-Eval [parlai task] [paper]. ACUTE-Eval is a sensitive human evaluation method for dialogue that evaluates whole conversations in a pairwise fashion, and is our recommended method in many cases.