This module provides a set of teachers that deal with dialog:

FixedDialogTeacher(Teacher) Base class for teachers in tasks that have fixed dialog - i.e., dialog that is not dynamically generated but is instead pulled from a fixed set of examples. The class can, however, be extended to any task involving fixed data. It implements much of the basic functionality of these teachers, including observe(), act(), and next_example().

DialogTeacher(FixedDialogTeacher) Base teacher class for doing dialog specifically with fixed chat logs.

FbDialogTeacher(DialogTeacher) Teacher class that provides access to data in the Facebook Dialog format. See the class description for more details.

This module also includes DataLoader, a threadpool data loader for FixedDialogTeacher, and DialogData/StreamDialogData, data structures for accessing textual dialog data that are used by DialogTeacher.

class parlai.core.teachers.DataLoader(opt)

A worker thread that provides a threadpool for data loading.

A teacher may submit a request to the loader, which will return the appropriate data.

To submit a request, a teacher should call request_load with the following arguments:

  • receive_fn - a receive function (for receiving the data)
  • load_fn - a load function (for loading the data)
  • args - arguments for the load function; args can be either a dictionary of arguments for a function, or a list of positional arguments
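
The request pattern described above can be sketched with the standard library alone. This is a minimal illustration, not ParlAI's DataLoader: the class ToyDataLoader, the helper load_text, and the shutdown method are hypothetical names, and the receive function is assumed to be handed a completed future.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyDataLoader:
    """Hypothetical stand-in for a threadpool loader; not ParlAI's DataLoader."""

    def __init__(self, num_workers=1):
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def request_load(self, receive_fn, load_fn, args):
        # args may be a dict of keyword arguments or a list of positional ones.
        if isinstance(args, dict):
            future = self.pool.submit(load_fn, **args)
        else:
            future = self.pool.submit(load_fn, *args)
        # Hand the finished future to the receiver once loading completes.
        future.add_done_callback(receive_fn)
        return future

    def shutdown(self):
        # Wait for all pending loads (and their callbacks) to finish.
        self.pool.shutdown(wait=True)


def load_text(path, upper=False):
    # Pretend loader: a real one would read the file or image at `path`.
    return path.upper() if upper else path


received = []
loader = ToyDataLoader()
loader.request_load(lambda fut: received.append(fut.result()), load_text, ["a.txt"])
loader.request_load(
    lambda fut: received.append(fut.result()), load_text, {"path": "b.txt", "upper": True}
)
loader.shutdown()
```

Accepting either a list or a dictionary keeps the call site short for single-argument loaders while still allowing named options.
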
class parlai.core.teachers.FixedDialogTeacher(opt, shared=None)

A teacher agent for all teachers involved in tasks with fixed data.

This class provides the following functionality for its subclasses:

  • Resets a teacher
  • Provides an observe method
  • Computes and retrieves the next episode index for a teacher
  • Provides a threadpool option for loading data (especially useful for large data, e.g. images)

To utilize the DataLoader for threadpool loading, a teacher should implement the submit_load_request function to send a load request to the DataLoader by calling self.data_loader.request_load with the appropriate arguments (receive_fn, load_fn, args). The DataLoader then returns the data to the teacher’s data_queue, which the teacher can poll in its act method.

The following is an example of the DataLoader usage in the VQA-V1 teacher.

  1. In the teacher’s init function, the teacher calls its submit_load_request function to preload an image.

  2. The submit_load_request function gets the next episode_idx, and computes the image path for the load request.

  3. At the end of submit_load_request, the teacher calls self.data_loader.request_load with three args:

    • self.receive_data - the function that the DataLoader calls to return the loaded object
    • self.image_loader.load - the function used to load the image from the image path
    • [img_path] - a list of arguments for the load function, which in this case is the path of the image.

  4. In the teacher’s act function, the teacher loads the data from its data queue.

  5. At the end of the act function, the teacher calls submit_load_request to preload an image for the next example.
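
The preloading flow described in these steps can be sketched as follows. This is a self-contained toy, not the VQA-V1 teacher: ToyPreloadingTeacher, its load method, and the fake "image" strings are hypothetical, and a plain ThreadPoolExecutor stands in for the DataLoader.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class ToyPreloadingTeacher:
    """Hypothetical teacher that preloads one example ahead of each act() call."""

    def __init__(self, paths):
        self.paths = paths
        self.episode_idx = -1
        self.data_queue = queue.Queue()
        self.pool = ThreadPoolExecutor(max_workers=1)
        # Preload the first item at construction time.
        self.submit_load_request()

    def load(self, path):
        # Stand-in for an image loader; returns a fake "image" string.
        return "image<" + path + ">"

    def submit_load_request(self):
        # Advance to the next episode and compute what to load.
        self.episode_idx = (self.episode_idx + 1) % len(self.paths)
        path = self.paths[self.episode_idx]
        # Request the load; receive_data is called with the finished future.
        future = self.pool.submit(self.load, path)
        future.add_done_callback(self.receive_data)

    def receive_data(self, future):
        self.data_queue.put(future.result())

    def act(self):
        # Block until the preloaded item arrives in the queue.
        image = self.data_queue.get()
        # Start loading the item for the next call before returning.
        self.submit_load_request()
        return {"image": image}


teacher = ToyPreloadingTeacher(["img0.jpg", "img1.jpg"])
first = teacher.act()
second = teacher.act()
third = teacher.act()
teacher.pool.shutdown(wait=True)
```

Because the next load is submitted before act returns, loading overlaps with whatever the caller does with the current example.
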

Reset the dialog so that it is at the start of the epoch, and all metrics are reset.


An agent should implement this method to submit requests to the data loader. At the end of this method, the agent should call self.data_loader.request_load() with the appropriate args.


Function for receiving data from the data loader.


Get the number of episodes in this dataset.


Get the total number of examples in this dataset.

get(episode_idx, entry_idx=0)

Get the specified episode and the specified entry in that episode.

Many datasets have only single-entry episodes, so entry_idx defaults to zero. Children must override this method in order to inherit the next_example method.
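
As a rough illustration of what an override of get might look like, the sketch below stores episodes in a plain list. ToyFixedTeacher is a hypothetical stand-alone class, not a real FixedDialogTeacher subclass, and the dictionary keys text, labels, and episode_done are assumptions about the returned message.

```python
class ToyFixedTeacher:
    """Hypothetical container; only illustrates the shape of get()."""

    def __init__(self, episodes):
        # episodes: a list of episodes, each a list of (text, label) pairs.
        self.episodes = episodes

    def num_episodes(self):
        return len(self.episodes)

    def num_examples(self):
        return sum(len(episode) for episode in self.episodes)

    def get(self, episode_idx, entry_idx=0):
        episode = self.episodes[episode_idx]
        text, label = episode[entry_idx]
        return {
            "text": text,
            "labels": [label],
            # The last entry of an episode marks the episode as done.
            "episode_done": entry_idx == len(episode) - 1,
        }
```

With single-entry episodes, callers can omit entry_idx entirely and every returned entry ends its episode.
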


Process observation for metrics.


Send new dialog message.

class parlai.core.teachers.DialogTeacher(opt, shared=None)

A base teacher class for doing dialog with fixed chat logs.

This class provides a set of basic functionality:

  • uses data class to store and query text data
  • generates action tables to send to the student agent from the data
  • metrics tracking count of sent vs correctly answered queries

If you have opt.numthreads > 1, this also activates a shared memory array for the data and lock-protected shared-memory metrics.

In order to subclass this class, you must implement setup_data() in your class (or subclass another class which does, like FbDialogTeacher), which reads your data file as an iterator.


Returns None by default, but override this in children (such as FbDialogTeacher) to load up candidate labels for every example.

class parlai.core.teachers.DialogData(opt, data_loader=None, cands=None, shared=None, **kwargs)

Provides a data structure for accessing textual dialog data. This can be used whenever the dialog data is a fixed log of chats (i.e., not a simulator setting). The logs can include dialog text and possibly supervised labels, candidate labels, and rewards.

All these are stored in this internal data format which is used by the DialogTeacher class.

data_loader is an iterable, with each call returning:

(x, ...), new_episode?


  • x is a query and possibly context

... can contain additional fields, specifically

  • y is an iterable of label(s) for that query
  • r is the str reward for getting that query correct
  • c is an iterable of label candidates that the student can choose from
  • i is a str path to an image on disk, which will be loaded by the data class at request-time. It should always point to the raw image file.
  • new_episode? is a boolean value specifying whether that example is the start of a new episode. If you don’t use episodes, set this to True every time.

cands can be set to provide a list of candidate labels for every example in this dataset, which the agent can choose from (the correct answer should be in this set).

random tells the data class whether or not to visit episodes sequentially or randomly when returning examples to the caller.
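
To make the expected shape of data_loader concrete, here is a small sketch of an iterable that yields ((x, ...), new_episode?) tuples, plus a helper that groups entries into episodes. Both function names and the sample sentences are hypothetical, and the grouping is only an illustration of how the flag can be read, not the internal storage used by DialogData.

```python
def toy_data_loader():
    """Hypothetical source yielding ((x, y, r, c), new_episode) tuples."""
    cands = ["hallway", "kitchen", "bathroom"]
    yield ("Where is the milk?", ["kitchen"], "1", cands), True
    yield ("Where is Sam?", ["hallway"], "1", cands), False
    # Trailing fields are optional: this entry has only a query and a label.
    yield ("What is 2 + 2?", ["4"]), True


def group_into_episodes(loader):
    """Collect consecutive entries into episodes using the new_episode flag."""
    episodes = []
    for entry, new_episode in loader:
        if new_episode or not episodes:
            episodes.append([])
        episodes[-1].append(entry)
    return episodes


episodes = group_into_episodes(toy_data_loader())
```

Here the first two entries share an episode and the third starts a new one.
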


Return number of episodes in the dataset.


Returns total number of entries available. Each episode has at least one entry, but might have many more.

get(episode_idx, entry_idx=0)

Returns a specific entry from the dataset.


Packs an entry into an action-observation dictionary.

class parlai.core.teachers.StreamDialogData(opt, data_loader=None, cands=None, shared=None, **kwargs)

Provides a data structure for streaming textual dialog data. This can be used whenever the dialog data follows the format described in DialogData but cannot fit entirely into memory.

Additional keyword-argument cycle defines if the stream should restart from the beginning after an epoch is finished (defaults to True).
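
The effect of cycle can be pictured with a plain generator. This is a sketch under the assumption that the stream is rebuilt by calling a factory each epoch; stream_entries and make_loader are hypothetical names, not part of StreamDialogData.

```python
def stream_entries(make_loader, cycle=True):
    """Yield entries lazily; restart from the beginning when cycle is True."""
    while True:
        yielded = False
        for item in make_loader():
            yielded = True
            yield item
        # Stop after one pass if not cycling, or if the source is empty.
        if not cycle or not yielded:
            break
```

Only one entry is held at a time, which is the point of streaming when the data cannot fit in memory.
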


Returns the next entry from the stream in the current episode for this instance. When the episode is done, returns the first entry of the next episode.


Reset the datastream to its beginning.

class parlai.core.teachers.FbDialogTeacher(opt, shared=None)

This class provides access to data in the Facebook Dialog format.

Subclasses DialogTeacher for functionality and provides an implementation of setup_data() which iterates over datasets in the “fbdialog” format.

The way FB Dialog data is set up is as follows:

1 Sam went to the kitchen.
2 Pat gave Sam the milk.
3 Where is the milk?<TAB>kitchen<TAB>1<TAB>hallway|kitchen|bathroom
4 Sam went to the hallway
5 Pat went to the bathroom
6 Where is the milk?<TAB>hallway<TAB>1<TAB>hallway|kitchen|bathroom

Lines 1-6 represent a single episode, with two different examples: the first example is lines 1-3, and the second is lines 4-6.

Lines 1, 2, 4, and 5 represent contextual information.

Lines 3 and 6 contain a query, a label, a reward for getting the question correct, and three label candidates.

Since both of these examples are part of the same episode, the information provided in the first example is relevant to the query in the second example and therefore the agent must remember the first example in order to do well.

In general, dialog in this format can be any speech, not just QA pairs:

1 Hi how's it going?<TAB>It's going great. What's new?
2 Well I'm working on a new project at work.<TAB>Oh me too!
3 Oh cool!<TAB>Tell me about yours.



Load a global fixed set of candidate labels that the teacher provides for every example (the true labels for a specific example are also added to this set, so that it’s possible to get the right answer).


Reads data in the fbdialog format.

Returns ((x,y,r,c), new_episode?) tuples.

x represents a query, y represents the labels, r represents any reward, and c represents any label_candidates.

The example above will be translated into the following tuples:

x: 'Sam went to the kitchen\nPat gave Sam the milk\nWhere is the milk?'
y: ['kitchen']
r: '1'
c: ['hallway', 'kitchen', 'bathroom']
new_episode = True (this is the first example in the episode)
x: 'Sam went to the hallway\nPat went to the bathroom\nWhere is the milk?'
y: ['hallway']
r: '1'
c: ['hallway', 'kitchen', 'bathroom']
new_episode = False (this is the second example in the episode)
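
The translation from fbdialog lines to tuples can be sketched as below. This is a simplified illustration, not the setup_data implementation: parse_fbdialog is a hypothetical name, and the sketch assumes that a line numbered 1 starts a new episode, that fields are tab-separated in the order query, label, reward, candidates, and that multiple labels or candidates are separated by "|". It also keeps the text of each line verbatim, including punctuation.

```python
def parse_fbdialog(lines):
    """Yield ((x, y, r, c), new_episode) tuples from fbdialog-style lines."""
    context = []
    new_episode = True
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        number, _, rest = line.partition(" ")
        if int(number) == 1:
            # Assumption: numbering restarting at 1 starts a new episode.
            context = []
            new_episode = True
        fields = rest.split("\t")
        if len(fields) == 1:
            # No tab: a context-only line, saved for the next query.
            context.append(fields[0])
            continue
        x = "\n".join(context + [fields[0]])
        y = fields[1].split("|") if fields[1] else None
        r = fields[2] if len(fields) > 2 and fields[2] else None
        c = fields[3].split("|") if len(fields) > 3 and fields[3] else None
        yield (x, y, r, c), new_episode
        # Later examples in the same episode start with fresh context.
        context = []
        new_episode = False


sample = [
    "1 Sam went to the kitchen.",
    "2 Pat gave Sam the milk.",
    "3 Where is the milk?\tkitchen\t1\thallway|kitchen|bathroom",
    "4 Sam went to the hallway",
    "5 Pat went to the bathroom",
    "6 Where is the milk?\thallway\t1\thallway|kitchen|bathroom",
]
examples = list(parse_fbdialog(sample))
```

Lines without a tab are buffered as context and prepended to the next query, which is how the two-example episode above arises from six input lines.
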