core.teachers

This module provides a set of teachers that deal with dialog:

FixedDialogTeacher(Teacher)
Base class for teachers in tasks that have fixed dialog, i.e., dialog that is not dynamically generated but is instead pulled from a fixed set of examples. However, the class can be extended to all tasks involving fixed data. Implements much of the basic functionality of these teachers, including observe(), act(), and next_example().

DialogTeacher(FixedDialogTeacher)
Base teacher class for doing dialog specifically with fixed chat logs.
FbDialogTeacher(DialogTeacher)
Teacher class that provides access to data in the Facebook Dialog format. See the class description for more details. NOTE: We plan to deprecate this class soon in favor of ParlAIDialogTeacher; however, several existing tasks still use it.
ParlAIDialogTeacher(DialogTeacher)
Teacher class that provides access to data in the ParlAI Dialog format. See the class description for more details.

This module also includes DataLoader, a threadpool data loader for FixedDialogTeacher, and DialogData/StreamDialogData, data structures for accessing textual dialog data that are used by DialogTeacher.

class parlai.core.teachers.DataLoader(opt)

A worker thread that provides a threadpool for data loading.

A teacher may submit a request to the loader, which will return the appropriate data.

To submit a request, a teacher should call request_load with the following arguments:

Parameters:
  • receive_fn – a receive function (for receiving the data)
  • load_fn – a load function (for loading the data)
  • args – arguments for the load function. args can be either a dictionary of arguments for a function, or a list of positional arguments
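
The request interface described above can be sketched with the standard library. This is a minimal illustration, not the ParlAI implementation: SketchDataLoader, load_text, and the file names are hypothetical, and the sketch assumes the receive function is handed a future, as the receive_data(future) signature in this module suggests.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchDataLoader:
    """Hypothetical stand-in for DataLoader; not the ParlAI implementation."""

    def __init__(self, num_workers=2):
        self._pool = ThreadPoolExecutor(max_workers=num_workers)

    def request_load(self, receive_fn, load_fn, args):
        # args may be a dict of keyword arguments or a list of positional ones.
        if isinstance(args, dict):
            future = self._pool.submit(load_fn, **args)
        else:
            future = self._pool.submit(load_fn, *args)
        # Hand the finished future to the caller's receive function.
        future.add_done_callback(receive_fn)
        return future

    def shutdown(self):
        self._pool.shutdown(wait=True)


def load_text(path, upper=False):
    # Hypothetical load function; a real one would read from disk.
    text = "contents of " + path
    return text.upper() if upper else text


received = []


def receive(future):
    received.append(future.result())


loader = SketchDataLoader()
loader.request_load(receive, load_text, ["a.txt"])  # positional list
loader.request_load(receive, load_text, {"path": "b.txt", "upper": True})  # dict
loader.shutdown()
```

After shutdown() returns, both results have been delivered to receive, in whatever order the worker threads finished.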
class parlai.core.teachers.FixedDialogTeacher(opt, shared=None)

A teacher agent for all teachers involved in tasks with fixed data.

This class provides the following functionality for its subclasses:

  • Resets a teacher
  • Provides an observe method
  • Computes and retrieves the next episode index for a teacher
  • Provides a threadpool option for loading data (especially useful for large data, e.g. images)

In order to take advantage of the first few features, all a subclass has to implement is three functions: num_episodes, num_examples, and get (which returns a specific example from a specific episode).
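
As a rough illustration of those three functions, the sketch below stores episodes in a plain list. TinyFixedData is a hypothetical stand-alone class (it does not inherit from FixedDialogTeacher), and the 'text' and 'labels' keys in the sample data are assumptions chosen for readability.

```python
class TinyFixedData:
    """Hypothetical container exposing the three functions named above."""

    def __init__(self, episodes):
        # episodes: a list of episodes, each a list of example dicts
        self.episodes = episodes

    def num_episodes(self):
        return len(self.episodes)

    def num_examples(self):
        # Every entry of every episode counts as one example.
        return sum(len(episode) for episode in self.episodes)

    def get(self, episode_idx, entry_idx=0):
        return self.episodes[episode_idx][entry_idx]


data = TinyFixedData(
    [
        [{"text": "Hi", "labels": ["Hello"]}, {"text": "Bye", "labels": ["Bye"]}],
        [{"text": "2+2?", "labels": ["4"]}],
    ]
)
```

In a real subclass the same three methods would sit on the teacher itself, with get returning whatever message dictionary the task needs.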

To utilize the DataLoader for threadpool loading, a teacher should implement the submit_load_request function to send a load request to the DataLoader by calling self.data_loader.request_load with the appropriate arguments (receive_fn, load_fn, args). The DataLoader then returns the data to the teacher’s data_queue, which the teacher can poll in its act method.

The following is an example of the DataLoader usage in the VQA-V1 teacher.

  1. In the teacher’s init function, the teacher calls its submit_load_request function to preload an image.

  2. The submit_load_request function gets the next episode_idx, and computes the image path for the load request.

  3. At the end of submit_load_request, the teacher calls self.data_loader.request_load with three args:

    • self.receive_data - the function that the DataLoader calls to return the loaded object
    • self.image_loader.load - the function used to load the image from the image path
    • [img_path] - a list of arguments for the load function, which in this case is the path of the image.
  4. In the teacher’s act function, the teacher loads the data from its data queue.

  5. At the end of the act function, the teacher calls submit_load_request to preload an image for the next example.

To see this in action, take a look at this teacher in tasks.vqa_v1.agents.
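
The five steps above can be sketched without ParlAI as follows. This is not the VQA-V1 teacher: PreloadingTeacherSketch, load_image, and the file names are hypothetical, and a single-worker thread pool stands in for the DataLoader.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class PreloadingTeacherSketch:
    """Hypothetical teacher that mirrors steps 1-5; not the VQA-V1 code."""

    def __init__(self, image_paths):
        self.image_paths = image_paths
        self.next_idx = 0
        self.data_queue = queue.Queue()
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.submit_load_request()  # step 1: preload the first image

    def load_image(self, path):
        # Placeholder for a real image loader.
        return "pixels<" + path + ">"

    def submit_load_request(self):
        # Step 2: pick the next episode and compute its image path.
        if self.next_idx >= len(self.image_paths):
            return
        img_path = self.image_paths[self.next_idx]
        self.next_idx += 1
        # Step 3: request the load; receive_data gets the finished future.
        future = self.pool.submit(self.load_image, img_path)
        future.add_done_callback(self.receive_data)

    def receive_data(self, future):
        self.data_queue.put(future.result())

    def act(self):
        # Step 4: wait for the preloaded item to arrive in the queue.
        image = self.data_queue.get(timeout=5)
        # Step 5: start loading the item for the next call.
        self.submit_load_request()
        return {"image": image}


teacher = PreloadingTeacherSketch(["img0.jpg", "img1.jpg"])
first = teacher.act()
second = teacher.act()
teacher.pool.shutdown(wait=True)
```

Because the next load is submitted at the end of act, the loading work overlaps with whatever the caller does between calls.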

reset()

Reset the dialog so that it is at the start of the epoch, and all metrics are reset.

submit_load_request()

An agent should implement this method to submit requests to the data loader. At the end of this method, the agent should call self.data_loader.request_load() with the appropriate args.

By default, this method does nothing.

receive_data(future)

Function for receiving data from the data loader.

Parameters:
  • future – result from the load request.
share()

Shares data structures between other instances created for batching or hogwild.

next_episode_idx(num_eps=None, loop=None)

Returns the next episode index.

Parameters:
  • num_eps – default None uses num_episodes value.
  • loop – default None loops during training but not evaluation.
next_example()

Returns the next example. If there are multiple examples in the same episode, returns the next one in that episode. If that episode is over, gets a new episode index and returns the first example of that episode.
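
The stepping behaviour described for next_episode_idx and next_example can be sketched as below. EpisodeCursor is a hypothetical helper, not ParlAI code; it assumes sequential (non-random) ordering and signals the end of the data by returning None, which is an illustrative choice.

```python
class EpisodeCursor:
    """Hypothetical helper showing the stepping logic; not ParlAI code."""

    def __init__(self, episodes, loop=False):
        self.episodes = episodes  # list of episodes, each a list of examples
        self.loop = loop
        self.episode_idx = -1
        self.entry_idx = 0
        self.episode_done = True

    def next_episode_idx(self):
        nxt = self.episode_idx + 1
        if self.loop:
            # Wrap around to the first episode when looping is enabled.
            nxt %= len(self.episodes)
        return nxt

    def next_example(self):
        if self.episode_done:
            # Previous episode finished: move to the first entry of the next one.
            self.episode_idx = self.next_episode_idx()
            self.entry_idx = 0
        else:
            self.entry_idx += 1
        if self.episode_idx >= len(self.episodes):
            return None  # ran out of data and looping is off
        episode = self.episodes[self.episode_idx]
        self.episode_done = self.entry_idx == len(episode) - 1
        return episode[self.entry_idx]


single_pass = EpisodeCursor([["a1", "a2"], ["b1"]], loop=False)
seen_once = [single_pass.next_example() for _ in range(4)]

looping = EpisodeCursor([["a1", "a2"], ["b1"]], loop=True)
seen_looping = [looping.next_example() for _ in range(4)]
```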

next_batch()

Returns the next batch of examples.

num_episodes()

Get the number of episodes in this dataset.

num_examples

Get the total number of examples in this dataset.

get(episode_idx, entry_idx=0)

Get the specified episode and the specified entry in that episode. Children must override this method in order to inherit the next_example method.

Parameters:
  • episode_idx – which episode to return examples from
  • entry_idx – which example to return from the episode. Many datasets have only single-entry episodes, so this defaults to zero.
observe(observation)

Process observation for metrics.

batch_act(observations)

Returns an entire batch of examples instead of just one.

act()

Send new dialog message.

class parlai.core.teachers.DialogTeacher(opt, shared=None)

A base teacher class for doing dialog with fixed chat logs.

This class provides a set of basic functionality:

  • uses data class to store and query text data
  • generates action tables to send to the student agent from the data

If you have opt.numthreads > 1, this also activates a shared memory array for the data and lock-protected shared-memory metrics.

In order to subclass this class, you must implement setup_data() in your class (or subclass another class which does, like FbDialogTeacher), which reads your data file as an iterator.

label_candidates()

Returns None by default, but override this in children (such as FbDialogTeacher) to load up candidate labels for every example.

class parlai.core.teachers.DialogData(opt, data_loader=None, cands=None, shared=None, **kwargs)

Provides a data structure for accessing textual dialog data. This can be used whenever the dialog data is a fixed log of chats (i.e., not a simulator setting). The logs can include dialog text and possibly supervised labels, candidate labels, and rewards.

All these are stored in this internal data format which is used by the DialogTeacher class.

Parameters:
  • opt – options to initialize the class
  • data_loader – an iterable with each call returning a tuple in the form ((x, y, r, c, i), new_episode?) where the x and new_episode fields are mandatory and other fields may be omitted or None.
  • cands – can be set to provide a list of candidate labels for every example in this dataset, which the agent can choose from (the correct answer should be in this set).
  • random – tells the data class whether or not to visit episodes sequentially or randomly when returning examples to the caller.

The contents of the ((x, y, r, c, i), new_episode?) tuples returned by the data loader are as follows:

  • x (str) is a query and possibly context
  • y (iter) is an iterable of label(s) for that query
  • r (str) is the str reward for getting that query correct
  • c (iter) is an iterable of label candidates that the student can choose from
  • i (str) is a str path to an image on disk, which will be loaded by the data class at request-time. It should always point to the raw image file.
  • new_episode? (bool) is a boolean value specifying whether that example is the start of a new episode. If you don’t use episodes set this to True every time.
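
To make the tuple format concrete, here is a small sketch of a data loader and of grouping its output into episodes. Both example_loader and group_into_episodes are hypothetical names, the sample data is invented, and the grouping shown is only one plausible way to use the new_episode? flag, not the DialogData internals.

```python
def example_loader():
    # Hypothetical data: x and new_episode are required; the other
    # fields may be omitted or None.
    yield ("Hello there.", ["Hi!"], None, None, None), True
    yield ("How are you?", ["Fine."]), False
    yield ("What is 2+2?", ["4"], "1", ["3", "4"]), True


def group_into_episodes(loader):
    """Collect entries into episodes, starting a new one when the flag is True."""
    episodes = []
    for entry, new_episode in loader:
        if new_episode or not episodes:
            episodes.append([])
        episodes[-1].append(entry)
    return episodes


episodes = group_into_episodes(example_loader())
```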
num_episodes()

Return number of episodes in the dataset.

num_examples

Returns total number of entries available.

Each episode has at least one entry, but might have many more.

get(episode_idx, entry_idx=0)

Get the specified episode and the specified entry in that episode.

Parameters:
  • episode_idx – which episode to return examples from
  • entry_idx – which example to return from the episode. Many datasets have only single-entry episodes, so this defaults to zero.
build_table(entry)

Packs an entry into an action-observation dictionary.

Parameters:
  • entry – a tuple in the form described in the class docstring.
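
A rough sketch of such packing is shown below. build_table_sketch is a hypothetical function: the key names 'text', 'labels', 'reward', and 'label_candidates' follow the field names used in the ParlAI text format in this module, while the 'image' key is an assumption. Unlike the data class described above, the sketch does not load the image from disk.

```python
def build_table_sketch(entry):
    """Map an (x, y, r, c, i) entry onto dictionary keys, skipping missing fields."""
    # 'image' is an assumed key name; the other keys mirror the ParlAI text format.
    keys = ("text", "labels", "reward", "label_candidates", "image")
    table = {}
    # zip stops at the shorter sequence, so omitted trailing fields are skipped.
    for key, value in zip(keys, entry):
        if value is not None:
            table[key] = value
    return table


table = build_table_sketch(("Where is the milk?", ["kitchen"], "1", None))
```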
class parlai.core.teachers.StreamDialogData(opt, data_loader=None, cands=None, shared=None, **kwargs)

Provides a data structure for streaming textual dialog data. This can be used whenever the dialog data follows the format described in DialogData but cannot fit entirely into memory.

Additional keyword-argument cycle defines if the stream should restart from the beginning after an epoch is finished (defaults to True).

Parameters:
  • opt – options to initialize the class
  • data_loader – an iterable with each call returning a tuple in the form ((x, y, r, c, i), new_episode?) where the x and new_episode fields are mandatory and other fields may be omitted or None.
  • cands – can be set to provide a list of candidate labels for every example in this dataset, which the agent can choose from (the correct answer should be in this set).
  • random – tells the data class whether or not to visit episodes sequentially or randomly when returning examples to the caller.
  • cycle – (default True) whether to restart at beginning when end of stream reached without reset being called.
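
The effect of cycle can be sketched with a generator. stream_entries and make_loader are hypothetical names, and the sketch only illustrates the restart behaviour, not the StreamDialogData internals.

```python
from itertools import islice


def stream_entries(make_loader, cycle=True):
    """Yield entries from a fresh loader, restarting at the end if cycle is True."""
    # Note: with cycle=True and an empty loader this would spin forever.
    while True:
        for item in make_loader():
            yield item
        if not cycle:
            return


def make_loader():
    # Hypothetical two-entry dataset in the ((x, y), new_episode?) form.
    yield ("x1", ["y1"]), True
    yield ("x2", ["y2"]), True


once = list(stream_entries(make_loader, cycle=False))
looped = list(islice(stream_entries(make_loader, cycle=True), 5))
```

With cycle=False the stream ends after one pass; with cycle=True the caller decides when to stop, here by taking only five entries.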
load_length()

Calculates the length of the dataset and caches it in a file. Note that this can take some time for large datasets. Episode and entry indexes cannot be specified during streaming.

get()

Returns the next entry from the stream in the current episode for this instance. When the episode is done, returns the first entry of the next episode.

reset()

Reset the datastream to its beginning.

class parlai.core.teachers.FbDialogTeacher(opt, shared=None)

This class provides access to data in the Facebook Dialog format.

Subclasses DialogTeacher for functionality and provides an implementation of setup_data() which iterates over datasets in the “fbdialog” format. If your data is in the format below, use this class to handle file parsing for you.

The way FB Dialog data is set up is as follows:

1 Sam went to the kitchen.
2 Pat gave Sam the milk.
3 Where is the milk?<TAB>kitchen<TAB>1<TAB>hallway|kitchen|bathroom
4 Sam went to the hallway.
5 Pat went to the bathroom.
6 Where is the milk?<TAB>hallway<TAB>1<TAB>hallway|kitchen|bathroom

Lines 1-6 represent a single episode, with two different examples: the first example is lines 1-3, and the second is lines 4-6.

Lines 1, 2, 4, and 5 represent contextual information.

Lines 3 and 6 contain a query, a label, a reward for getting the question correct, and three label candidates.

Since both of these examples are part of the same episode, the information provided in the first example is relevant to the query in the second example and therefore the agent must remember the first example in order to do well.

In general, dialog in this format can contain any speech, not just QA pairs:

1 Hi how's it going?<TAB>It's going great. What's new?
2 Well I'm working on a new project at work.<TAB>Oh me too!
3 Oh cool!<TAB>Tell me about yours.

etc.

Note that dialogs are interpreted as being one-way. For example, consider this dialog:

1 X1    Y1
2 X2    Y2
3 X3    Y3

A set of examples X1 => Y1, X2 => Y2, and X3 => Y3 will be generated. However, Y1 => X2 and Y2 => X3 are not created as separate examples by default. This makes sense for some data (we don’t need to train on the idea that “kitchen” should be followed by “Sam went to the hallway...” above), but for other datasets it may be helpful to add additional examples in the reverse direction (“Oh cool!” is a response to “Oh me too!” above).
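
The difference between the default one-way examples and the optional reverse-direction ones can be sketched as follows. make_examples and its add_reverse flag are hypothetical; they are not options of this class.

```python
def make_examples(turns, add_reverse=False):
    """Build (query, response) pairs from the (X, Y) turns of one dialog."""
    examples = [(x, y) for x, y in turns]
    if add_reverse:
        # Pair each response with the next turn's query: Y1 => X2, Y2 => X3.
        for (_, y), (next_x, _) in zip(turns, turns[1:]):
            examples.append((y, next_x))
    return examples


turns = [("X1", "Y1"), ("X2", "Y2"), ("X3", "Y3")]
one_way = make_examples(turns)
both_ways = make_examples(turns, add_reverse=True)
```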

load_cands(path)

Load a global fixed set of candidate labels that the teacher provides for every example (the true labels for a specific example are also added to this set, so that it’s possible to get the right answer).
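
The idea of adding the true labels to the fixed candidate set can be sketched as below. candidates_for_example is a hypothetical helper, and the order-preserving append is an illustrative choice rather than the behaviour of load_cands.

```python
def candidates_for_example(global_cands, labels):
    """Return the fixed candidates plus any true labels that are missing."""
    cands = list(global_cands)  # copy so the shared list is not modified
    for label in labels:
        if label not in cands:
            cands.append(label)
    return cands


shared = ["hallway", "kitchen"]
with_label = candidates_for_example(shared, ["bathroom"])
unchanged = candidates_for_example(shared, ["kitchen"])
```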

setup_data(path)

Reads data in the fbdialog format.

Returns ((x,y,r,c), new_episode?) tuples.

x represents a query, y represents the labels, r represents any reward, and c represents any label_candidates.

The example above will be translated into the following tuples:

x: 'Sam went to the kitchen\nPat gave Sam the milk\nWhere is the milk?'
y: ['kitchen']
r: '1'
c: ['hallway', 'kitchen', 'bathroom']
new_episode = True (this is the first example in the episode)
x: 'Sam went to the hallway\nPat went to the bathroom\nWhere is the milk?'
y: ['hallway']
r: '1'
c: ['hallway', 'kitchen', 'bathroom']
new_episode = False (this is the second example in the episode)
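
A simplified parser that produces tuples of this shape is sketched below. parse_fbdialog is a hypothetical function, not the setup_data implementation: it assumes a literal tab character separates the fields, treats a line number of 1 as the start of a new episode, and keeps the text verbatim (including trailing periods).

```python
def parse_fbdialog(lines):
    """Yield ((x, y, r, c), new_episode) tuples from fbdialog-style lines."""
    context = []
    new_episode = True
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The leading number is the line's index within the episode.
        number, _, rest = line.partition(" ")
        if int(number) == 1:
            # Numbering restarted, so a new episode begins here.
            context = []
            new_episode = True
        fields = rest.split("\t")
        if len(fields) == 1:
            # No label: treat the line as context for the next query.
            context.append(fields[0])
            continue
        x = "\n".join(context + [fields[0]])
        y = fields[1].split("|") if fields[1] else None
        r = fields[2] if len(fields) > 2 and fields[2] else None
        c = fields[3].split("|") if len(fields) > 3 and fields[3] else None
        yield (x, y, r, c), new_episode
        context = []
        new_episode = False


sample = [
    "1 Sam went to the kitchen.",
    "2 Pat gave Sam the milk.",
    "3 Where is the milk?\tkitchen\t1\thallway|kitchen|bathroom",
    "4 Sam went to the hallway.",
    "5 Pat went to the bathroom.",
    "6 Where is the milk?\thallway\t1\thallway|kitchen|bathroom",
]
examples = list(parse_fbdialog(sample))
```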
class parlai.core.teachers.ParlAIDialogTeacher(opt, shared=None)

This class provides access to data in the ParlAI Text Dialog format.

Subclasses FixedDialogTeacher for functionality and provides an implementation of setup_data() which iterates over datasets in the “ParlAI text” format. If your data is in the format below, use this class to handle file parsing for you.

The way the data is set up is as follows:

text:Sam went to the kitchen. Pat gave Sam the milk. Where is the milk?<TAB>labels:kitchen<TAB>reward:1<TAB>label_candidates:hallway|kitchen|bathroom
text:Sam went to the hallway. Pat went to the bathroom. Where is the milk?<TAB>labels:hallway<TAB>reward:1<TAB>label_candidates:hallway|kitchen|bathroom<TAB>episode_done:True

Lines 1-2 represent a single episode, with a different example on each line. Each line contains a query, a label, a reward for getting the question correct, and three label candidates.

Since both of these examples are part of the same episode, the information provided in the first example is relevant to the query in the second example and therefore the agent must remember the first example in order to do well.

In general, dialog in this format can contain any speech, not just QA pairs:

text:Hi how's it going?<TAB>labels:It's going great. What's new?
text:Well I'm working on a new project at work.<TAB>labels:Oh me too!
text:Oh cool!<TAB>labels:Tell me about yours.

etc.
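
Reading one line of this format can be sketched as follows. parse_parlai_line is a hypothetical function, not the parser this class uses: it assumes a literal tab character separates fields, splits labels and label_candidates on '|', and converts episode_done to a boolean.

```python
def parse_parlai_line(line):
    """Split one ParlAI-text line into a dictionary of fields."""
    message = {}
    for field in line.rstrip("\n").split("\t"):
        # Split on the first colon only, so values may contain colons.
        key, _, value = field.partition(":")
        if key in ("labels", "label_candidates"):
            message[key] = value.split("|")
        elif key == "episode_done":
            message[key] = value == "True"
        else:
            message[key] = value
    return message


message = parse_parlai_line(
    "text:Where is the milk?\tlabels:hallway\treward:1"
    "\tlabel_candidates:hallway|kitchen|bathroom\tepisode_done:True"
)
```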

Note that dialogs are interpreted as being one-way. For example, consider this dialog:

1 X1    Y1
2 X2    Y2
3 X3    Y3

A set of examples X1 => Y1, X2 => Y2, and X3 => Y3 will be generated. However, Y1 => X2 and Y2 => X3 are not created as separate examples by default. This makes sense for some data (we don’t need to train on the idea that “kitchen” should be followed by “Sam went to the hallway...” above), but for other datasets it may be helpful to add additional examples in the reverse direction (“Oh cool!” is a response to “Oh me too!” above).