ParlAI (pronounced “par-lay”) is a framework for dialogue AI research, implemented in Python.

Its goal is to provide researchers with a unified framework for sharing, training and testing dialogue models.

Many tasks are supported, including popular datasets such as SQuAD, bAbI tasks, MS MARCO, MCTest, WikiQA, WebQuestions, SimpleQuestions, WikiMovies, QACNN & QADailyMail, CBT, BookTest, bAbI Dialogue tasks, Ubuntu Dialogue, OpenSubtitles, Cornell Movie, VQA-COCO2014, VisDial and CLEVR. See here for the current complete task list.

Included are examples of training neural models with PyTorch, with batch training on GPU or hogwild training on CPUs. Using TensorFlow or other frameworks instead is also straightforward.

Our aim is for the number of tasks and agents that train on them to grow in a community-based way.

ParlAI is described in the following paper: “ParlAI: A Dialog Research Software Platform”, arXiv:1705.06476.

See the news page for the latest additions & updates, and the website for further docs.


Goals

- A unified framework for evaluation of dialogue models
- End goal is general dialogue, which includes many different skills
- End goal is real dialogue with people
- A set of datasets to bootstrap a working dialogue model for human interaction


Basic Examples

Note: If any of these examples fail, check the requirements section to see if you have missed something.

Display 10 random examples from task 1 of the "1k training examples" bAbI task:

python examples/display_data.py -t babi:task1k:1

Display 100 random examples from multi-tasking on the bAbI task and the SQuAD dataset at the same time:

python examples/display_data.py -t babi:task1k:1,squad -ne 100

Evaluate on the bAbI validation set with a human agent (using the local keyboard as input):

python examples/eval_model.py -m local_human -t babi:Task1k:1 -dt valid

Evaluate an IR baseline model on the validation set of the Movies Subreddit dataset:

python examples/eval_model.py -m ir_baseline -t "#moviedd-reddit" -dt valid

Display the predictions of that same IR baseline model:

python examples/display_model.py -m ir_baseline -t "#moviedd-reddit" -dt valid

Train a seq2seq model on the "10k training examples" bAbI task 1 with batch size of 32 examples until accuracy reaches 95% on validation (requires pytorch):

python examples/train_model.py -t babi:task10k:1 -m seq2seq -mf /tmp/model_s2s -bs 32 -vtim 30 -vcut 0.95

Train an attentive LSTM model (DrQA) on the SQuAD dataset with a batch size of 32 examples (requires pytorch and regex):

python examples/train_model.py -m drqa -t squad -bs 32 -mf /tmp/model_drqa

Test an existing attentive LSTM model (DrQA reader) on the SQuAD dataset from our model zoo:

python examples/eval_model.py -t squad -mf "models:drqa/squad/model"

Talk to a ConvAI2 baseline KvMemNN model from the model zoo:

python projects/convai2/interactive.py -mf models:convai2/kvmemnn/model


Requirements

ParlAI currently requires Python 3.

Dependencies of the core modules are listed in requirements.txt.

Some models included (in parlai/agents) have additional requirements.

Installing ParlAI

Run the following commands to clone the repository and install ParlAI:

git clone https://github.com/facebookresearch/ParlAI.git ~/ParlAI
cd ~/ParlAI; python setup.py develop

This will link the cloned directory to your site-packages.

This is the recommended installation procedure, as it provides ready access to the examples and allows you to modify anything you might need. This is especially useful if you want to submit another task to the repository.

All needed data will be downloaded to ~/ParlAI/data, and any non-data files (such as the MemNN code) if requested will be downloaded to ~/ParlAI/downloads. If you need to clear out the space used by these files, you can safely delete these directories and any files needed will be downloaded again.

Worlds, agents and teachers

The main concepts (classes) in ParlAI:

- world – the environment the agents live in, ranging from a simple two-agent conversation to much more complex settings
- agent – anything that can act in a world: a learning model, a scripted bot, or a human
- teacher – a type of agent that talks to the learner, e.g. one that implements a dataset
- action/observation dict – the message object that agents exchange (described in the next section)

After defining a world and the agents in it, a main loop can be run for training, testing or displaying, which repeatedly calls the function world.parley(). Each call to parley() conducts one turn of interaction: an agent produces an action, and the other agents in the world observe it.
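To make the pattern concrete, here is a self-contained toy version of that loop. The names echo ParlAI's real ones (DialogPartnerWorld and a repeat-label agent both exist in the library), but the code below is an illustrative sketch, not the library's implementation:

```python
# Toy sketch of ParlAI's world/agent structure and the parley() loop.
# The real classes live in parlai.core; only the shape is mirrored here.

class QATeacher:
    """A fixed one-question 'dataset' that asks and then observes the reply."""
    def __init__(self):
        self.data = [{'text': 'Where is the milk?',
                      'labels': ['kitchen'],
                      'episode_done': True}]
        self.index = 0
        self.last_reply = None

    def act(self):
        msg = self.data[self.index % len(self.data)]
        self.index += 1
        return msg

    def observe(self, msg):
        self.last_reply = msg  # a real teacher would update metrics here

class RepeatLabelAgent:
    """A trivial student that answers with the first label it observes."""
    def observe(self, msg):
        self.observation = msg

    def act(self):
        labels = self.observation.get('labels', ["I don't know"])
        return {'text': labels[0]}

class DialogPartnerWorld:
    """Two agents taking turns, as in ParlAI's basic two-agent world."""
    def __init__(self, agents):
        self.agents = agents

    def parley(self):
        teacher, student = self.agents
        student.observe(teacher.act())   # teacher speaks, student listens
        teacher.observe(student.act())   # student replies, teacher listens

world = DialogPartnerWorld([QATeacher(), RepeatLabelAgent()])
for _ in range(3):  # the "main loop"
    world.parley()
```

After the loop, the teacher has observed the student's reply {'text': 'kitchen'} on each turn.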

Actions and Observations

All agents (including teachers) speak to each other with a single format -- the observation/action object (a Python dict). This is used to pass text, labels, rewards, and more between agents. It’s the same object type when talking (acting) or listening (observing), but with a different view (i.e. different values in the fields).

The observation/action dict fields are as follows (or see the documentation):

- 'text': the speaker's utterance
- 'id': the speaker's identity
- 'labels': the desired response(s), if known (e.g. at training time)
- 'label_candidates': a set of candidate responses to choose from
- 'text_candidates': an agent's ranked candidate responses
- 'reward': a numeric reward signal, e.g. for reinforcement learning
- 'episode_done': marks the end of the current episode (dialogue)

Each of these fields is technically optional, depending on your dataset, though the 'text' field will most likely be used in nearly all exchanges.

Note: during validation and testing, the labels field is renamed eval_labels – this way, the model won’t accidentally train on the labels, but they are still available for calculating model-side loss.
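As a sketch of that renaming (build_message is a hypothetical helper, not part of ParlAI's API), a teacher could construct its messages like this:

```python
def build_message(text, answer, datatype):
    """Put the answer under 'labels' at train time, 'eval_labels' otherwise,
    so a model cannot accidentally train on validation/test answers."""
    msg = {'text': text, 'episode_done': True}
    key = 'labels' if datatype == 'train' else 'eval_labels'
    msg[key] = [answer]
    return msg

train_msg = build_message('Where is the milk?', 'kitchen', 'train')
valid_msg = build_message('Where is the milk?', 'kitchen', 'valid')
# train_msg carries 'labels'; valid_msg carries 'eval_labels' instead
```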

For a fixed supervised learning dataset like bAbI, a typical exchange from the training set might be as follows (the test set would not include labels):

Teacher: {
    'text': 'Sam went to the kitchen\nPat gave Sam the milk\nWhere is the milk?',
    'labels': ['kitchen'],
    'label_candidates': ['hallway', 'kitchen', 'bathroom'],
    'episode_done': False
}
Student: {
    'text': 'hallway'
}
Teacher: {
    'text': 'Sam went to the hallway\nPat went to the bathroom\nWhere is the milk?',
    'labels': ['hallway'],
    'label_candidates': ['hallway', 'kitchen', 'bathroom'],
    'episode_done': True
}
Student: {
    'text': 'hallway'
}
Teacher: {
    ... # starts next episode
}

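A minimal student for this kind of exchange could choose among the label_candidates by word overlap with the question. This toy heuristic is far cruder than ParlAI's actual ir_baseline agent, but it shows how the dict fields are consumed:

```python
def pick_candidate(observation):
    """Toy ranker: return the candidate sharing the most words with the
    observed text (not ParlAI's real IR baseline)."""
    words = set(observation['text'].lower().split())
    candidates = observation.get('label_candidates', [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(words & set(c.lower().split())))

obs = {
    'text': 'Sam went to the kitchen\nPat gave Sam the milk\nWhere is the milk?',
    'label_candidates': ['hallway', 'kitchen', 'bathroom'],
}
print(pick_candidate(obs))  # -> kitchen (it appears in the dialogue text)
```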

The code is set up into several main directories:

- core: the primary code for the framework
- agents: agents that can interact with the worlds/tasks (e.g. learning models)
- examples: basic examples of the different loops (displaying data, training, evaluation)
- tasks: code for the different tasks available from within ParlAI
- mturk: code for setting up Mechanical Turk
- messenger: code for interfacing with Facebook Messenger

Each directory is described in more detail below, ordered by dependencies.


Core

The core library (parlai/core) contains the primary framework code: the base classes for agents, teachers and worlds, plus utilities for parameters, dictionaries, and metrics.


Agents

The agents directory contains agents that have been approved into the ParlAI framework for shared use. We encourage you to contribute new ones! See the directory for the complete list of agents currently available.


Examples

This directory contains a few particular examples of basic loops.


Tasks

Accessing one of the supported datasets is as simple as specifying the name of the task as a command line option to the dataset display utility (examples/display_data.py).

Over 20 tasks were supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, WebQuestions, SimpleQuestions, WikiMovies, QACNN, QADailyMail, CBT, BookTest, bAbI Dialog tasks, Ubuntu, OpenSubtitles, Cornell Movie, VQA-COCO2014. Since then, several datasets have been added such as VQAv2, VisDial, MNIST_QA, Personalized Dialog, InsuranceQA, MS MARCO, TriviaQA, and CLEVR. See here for the current complete task list.

Choosing a task in ParlAI is as easy as specifying it on the command line. If the dataset has not been used before, ParlAI will automatically download it. As all datasets are treated in the same way in ParlAI (with a single dialogue API), a dialogue agent can in principle switch training and testing between any of them. Even better, one can specify many tasks at once (multi-tasking) by simply providing a comma-separated list, e.g. "-t babi,squad" to use those two datasets, or even all the QA datasets at once ("-t #qa") or indeed every task in ParlAI at once ("-t #all"). The aim is to make it easy to build and evaluate very rich dialogue models.

Each task folder contains:

To add your own task, see the tutorial.


Mechanical Turk

An important part of ParlAI is seamless integration with Mechanical Turk for data collection, training and evaluation.

Human Turkers are also viewed as agents in ParlAI and hence person-person, person-bot, or multiple people and bots in group chat can all converse within the standard framework, switching out the roles as desired with no code changes to the agents. This is because Turkers also receive and send via a (pretty printed) version of the same interface, using the fields of the observation/action dict.

We currently provide three examples: collecting data, human evaluation of a bot, and round-robin chat between local humans and remote Turkers.

The mturk library contains the following directories:

To run an MTurk task:

To add your own MTurk task:

Please see the MTurk tutorial to learn more about the MTurk examples and how to create and run your own task.


Messenger

Please see the Facebook Messenger tutorial to learn more about how to use ParlAI with Facebook Messenger.


Support

If you have any questions, bug reports or feature requests, please don't hesitate to post on our GitHub Issues page.

The Team

ParlAI is currently maintained by Emily Dinan, Alexander H. Miller, Stephen Roller, Kurt Shuster, Jack Urbanek and Jason Weston. A non-exhaustive list of other major contributors includes: Will Feng, Adam Fisch, Jiasen Lu, Antoine Bordes, Devi Parikh, Dhruv Batra, Filipe de Avila Belbute Peres and Chao Pan.


Citation

Please cite the arXiv paper if you use ParlAI in your work:

@article{miller2017parlai,
  title={ParlAI: A Dialog Research Software Platform},
  author={{Miller}, A.~H. and {Feng}, W. and {Fisch}, A. and {Lu}, J. and {Batra}, D. and {Bordes}, A. and {Parikh}, D. and {Weston}, J.},
  journal={arXiv preprint arXiv:{1705.06476}},
  year={2017}
}


License

ParlAI is MIT licensed. See the LICENSE file for details.