This module provides a set of teachers that deal with dialog:
- FixedDialogTeacher(Teacher): Base class for teachers in tasks that have fixed dialog, i.e., dialog that is not dynamically generated but rather is pulled from set examples. However, the class can be extended to all tasks involving fixed data. Implements much of the basic functionality of these teachers, including resetting, observing, and computing the next episode index.
- DialogTeacher(FixedDialogTeacher): Base teacher class for doing dialog specifically with fixed chat logs.
- FbDialogTeacher(DialogTeacher): Teacher class that provides access to data in the Facebook Dialog format. See the class description for more details.
This module also includes:
- DataLoader, a threadpool data loader for FixedDialogTeacher, and
- DialogData and StreamDialogData, data structures for accessing textual dialog data, utilized by DialogTeacher.
DataLoader is a worker thread that provides a threadpool for data loading.
A teacher may submit a request to the loader, which will return the appropriate data.
To submit a request, a teacher should call request_load with the following arguments:
- receive_fn: a receive function (for receiving the data)
- load_fn: a load function (for loading the data)
- args: arguments for the load function, which can be either a dictionary of keyword arguments or a list of positional arguments
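The request protocol above can be illustrated with a minimal sketch. SketchDataLoader is a hypothetical stand-in, not the module's actual DataLoader: it assumes that the loader hands each request to a ThreadPoolExecutor and passes the resulting Future to receive_fn, and it dispatches args as keyword arguments when it is a dict and as positional arguments otherwise.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SketchDataLoader(threading.Thread):
    """Hypothetical stand-in for DataLoader: a worker thread that hands each
    load request to a threadpool and passes the resulting Future to receive_fn."""

    def __init__(self, num_workers=4):
        super().__init__(daemon=True)
        self.requests = queue.Queue()
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def request_load(self, receive_fn, load_fn, args):
        """Queue a request using the (receive_fn, load_fn, args) convention."""
        self.requests.put((receive_fn, load_fn, args))

    def run(self):
        while True:
            receive_fn, load_fn, args = self.requests.get()
            if isinstance(args, dict):
                future = self.pool.submit(load_fn, **args)  # keyword arguments
            else:
                future = self.pool.submit(load_fn, *args)  # positional arguments
            # Assumption of this sketch: the receiver gets the Future and
            # calls future.result() whenever it needs the loaded object.
            receive_fn(future)
```

Usage: start the thread once with loader.start(), then call loader.request_load(my_queue.put, my_load_fn, [path]); whoever polls my_queue calls .result() on each Future it receives.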
FixedDialogTeacher is a teacher agent for all teachers involved in tasks with fixed data.
This class provides the following functionality for its subclasses:
- Resets a teacher
- Provides an observe method
- Computes and retrieves the next episode index for a teacher
- Provides a threadpool option for loading data (especially useful for large data, e.g. images)
To use the DataLoader for threadpool loading, a teacher should implement the submit_load_request function to send a load request to the DataLoader by calling self.data_loader.request_load with the appropriate arguments (receive_fn, load_fn, args). The DataLoader then returns the data to the teacher's data_queue, which the teacher can poll in its act function.
The following is an example of DataLoader usage in the VQA-V1 teacher.
- In the teacher's __init__ function, the teacher calls its submit_load_request function to preload an image.
- The submit_load_request function gets the next episode_idx and computes the image path for the load request.
- At the end of submit_load_request, the teacher calls self.data_loader.request_load with three args:
  - self.receive_data: the function that the DataLoader calls to return the loaded object
  - self.image_loader.load: the function used to load the image from the image path
  - [img_path]: a list of arguments for the load function, which in this case is the path of the image
- In the teacher's act function, the teacher loads the data from its data queue.
- At the end of the act function, the teacher calls submit_load_request to preload an image for the next example.
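The preload cycle in the VQA-V1 walkthrough can be sketched as follows. This is not the VQA-V1 teacher's code: ThreadedLoader, FakeImageLoader, next_episode_idx, and PreloadingTeacher are hypothetical stand-ins, and the loader here simply starts one thread per request and passes the loaded object (rather than a Future) to receive_data. Only one request is outstanding at a time, so images arrive in episode order.

```python
import queue
import threading


class ThreadedLoader:
    """Hypothetical loader: runs load_fn on a new thread, then calls receive_fn."""

    def request_load(self, receive_fn, load_fn, args):
        def work():
            receive_fn(load_fn(*args))

        threading.Thread(target=work, daemon=True).start()


class FakeImageLoader:
    """Hypothetical stand-in for self.image_loader; 'loading' just tags the path."""

    def load(self, path):
        return "pixels of " + path


class PreloadingTeacher:
    """Sketch of the preload pattern: always keep the next image in flight."""

    def __init__(self, image_paths):
        self.image_paths = image_paths
        self.episode_idx = -1
        self.data_queue = queue.Queue()
        self.data_loader = ThreadedLoader()
        self.image_loader = FakeImageLoader()
        self.submit_load_request()  # preload the first image

    def next_episode_idx(self):
        # Sequential order for simplicity; wraps around at the end of the data.
        self.episode_idx = (self.episode_idx + 1) % len(self.image_paths)
        return self.episode_idx

    def submit_load_request(self):
        img_path = self.image_paths[self.next_episode_idx()]
        self.data_loader.request_load(
            self.receive_data, self.image_loader.load, [img_path]
        )

    def receive_data(self, loaded):
        self.data_queue.put(loaded)

    def act(self):
        image = self.data_queue.get()  # blocks until the preload finishes
        self.submit_load_request()  # preload the image for the next example
        return {"image": image}
```

The design choice is to overlap I/O with the student's work: while the student processes the current example, the next image is already being read from disk.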
reset: Reset the dialog so that it is at the start of the epoch, and all metrics are reset.
submit_load_request: An agent should implement this method to submit requests to the data loader. At the end of this method, the agent should call self.data_loader.request_load() with the appropriate args.
receive_data: Function for receiving data from the data loader.
num_episodes: Get the number of episodes in this dataset.
num_examples: Get the total number of examples in this dataset.
get(episode_idx, entry_idx=0): Get the specified episode and the specified entry in that episode. Many datasets have only single-entry episodes, so entry_idx defaults to zero. Children must override this method in order to inherit the next_example method.
observe: Process observation for metrics.
act: Send new dialog message.
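The contract between get and next_example can be shown with a hypothetical sketch. SketchFixedTeacher and ToyTeacher are illustrative names, not part of the module, and the real next_example also handles things this sketch omits (for example, random ordering). The point is that the base class only walks indices, while a child supplies get.

```python
class SketchFixedTeacher:
    """Hypothetical base: next_example only works once a child overrides get()."""

    def __init__(self):
        self.episode_idx = 0
        self.entry_idx = 0

    def num_episodes(self):
        raise NotImplementedError

    def get(self, episode_idx, entry_idx=0):
        raise NotImplementedError("children must override get()")

    def next_example(self):
        """Walk episodes sequentially, one entry at a time."""
        ex = self.get(self.episode_idx, self.entry_idx)
        if ex.get("episode_done", True):
            # Move to the first entry of the next episode (wrapping around).
            self.episode_idx = (self.episode_idx + 1) % self.num_episodes()
            self.entry_idx = 0
        else:
            self.entry_idx += 1
        return ex


class ToyTeacher(SketchFixedTeacher):
    """Hypothetical child holding episodes as lists of message texts."""

    def __init__(self, episodes):
        super().__init__()
        self.episodes = episodes

    def num_episodes(self):
        return len(self.episodes)

    def num_examples(self):
        return sum(len(ep) for ep in self.episodes)

    def get(self, episode_idx, entry_idx=0):
        episode = self.episodes[episode_idx]
        return {
            "text": episode[entry_idx],
            "episode_done": entry_idx == len(episode) - 1,
        }
```

For a single-entry dataset, every call uses entry_idx=0, which is why the default makes sense.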
DialogTeacher is a base teacher class for doing dialog with fixed chat logs.
This class provides a set of basic functionality:
- uses a data class to store and query text data
- generates action tables to send to the student agent from the data
- tracks metrics counting sent vs. correctly answered queries
If you have opt.numthreads > 1, this also activates a shared memory array for the data and lock-protected shared-memory metrics.
In order to subclass this class, you must implement setup_data() in your class (or subclass another class which does, like FbDialogTeacher), which reads your data file as an iterator.
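A setup_data implementation is a generator that yields one (fields, new_episode) pair per example. The sketch below is hypothetical: it is a free function over lines rather than a method taking a path, and it assumes a two-column "query<TAB>label" file in which every line is its own single-example episode.

```python
from typing import Iterable, Iterator, List, Tuple

Example = Tuple[Tuple[str, List[str]], bool]


def setup_data(lines: Iterable[str]) -> Iterator[Example]:
    """Hypothetical setup_data for a 'query<TAB>label' file where every
    line is its own single-example episode."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        query, label = line.split("\t", 1)
        # (x, y) fields, then new_episode=True because each episode has one entry
        yield (query, [label]), True
```

Because it is a generator, the teacher can consume the file lazily; in a real subclass you would open the file inside the method and iterate over it.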
label_candidates: Returns None by default, but override this in children (such as FbDialogTeacher) to load up candidate labels for every example.
DialogData(opt, data_loader=None, cands=None, shared=None, **kwargs)
Provides a data structure for accessing textual dialog data. This can be used whenever the dialog data is a fixed log of chats (i.e., not a simulator setting). The logs can include dialog text and possibly supervised labels, candidate labels, and rewards.
All these are stored in this internal data format, which is used by the DialogTeacher class.
data_loader is an iterable, with each call returning:
(x, ...), new_episode?
where
- x is a query and possibly context
- ... can contain additional fields, specifically
  - y is an iterable of label(s) for that query
  - r is the str reward for getting that query correct
  - c is an iterable of label candidates that the student can choose from
  - i is a str path to an image on disk, which will be loaded by the data class at request time. It should always point to the raw image file.
- new_episode? is a boolean value specifying whether that example is the start of a new episode. If you don't use episodes, set this to True for every example.
cands can be set to provide a list of candidate labels for every example in this dataset, which the agent can choose from (the correct answer should be in this set).
random tells the data class whether to visit episodes sequentially or randomly when returning examples to the caller.
num_episodes: Return the number of episodes in the dataset.
num_examples: Returns the total number of entries available. Each episode has at least one entry, but might have many more.
get: Returns a specific entry from the dataset.
build_table: Packs an entry into an action-observation dictionary.
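How a DialogData-style structure turns the data_loader stream into episodes can be sketched as below. SketchDialogData is hypothetical and simplified: it ignores images, candidate sets, threading, and random ordering, and the observation keys ("text", "labels", "reward", "label_candidates", "episode_done") are assumptions of this sketch.

```python
from typing import Any, Dict, Iterable, List, Tuple


class SketchDialogData:
    """Hypothetical, simplified DialogData: groups ((x, y, r, c), new_episode)
    items into episodes and packs entries into observation dicts."""

    FIELDS = ("text", "labels", "reward", "label_candidates")

    def __init__(self, data_loader: Iterable[Tuple[tuple, bool]]):
        self.data: List[List[tuple]] = []
        for entry, new_episode in data_loader:
            if new_episode or not self.data:
                self.data.append([])  # start a new episode
            self.data[-1].append(entry)

    def num_episodes(self) -> int:
        return len(self.data)

    def num_examples(self) -> int:
        return sum(len(episode) for episode in self.data)

    def get(self, episode_idx: int, entry_idx: int = 0) -> Dict[str, Any]:
        episode = self.data[episode_idx]
        table = self.build_table(episode[entry_idx])
        table["episode_done"] = entry_idx == len(episode) - 1
        return table

    def build_table(self, entry: tuple) -> Dict[str, Any]:
        # Pair each positional field with its key; missing trailing fields
        # (e.g. no reward or candidates) and None values are omitted.
        return {key: value for key, value in zip(self.FIELDS, entry) if value is not None}
```

Storing whole episodes as lists is what makes num_examples larger than num_episodes whenever an episode has more than one entry.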
StreamDialogData(opt, data_loader=None, cands=None, shared=None, **kwargs)
Provides a data structure for streaming textual dialog data. This can be used whenever the dialog data follows the format described in DialogData but cannot fit entirely into memory.
The additional keyword argument cycle defines whether the stream should restart from the beginning after an epoch is finished (defaults to True).
get: Returns the next entry from the stream in the current episode for this instance. When the episode is done, returns the first entry of the next episode.
reset: Resets the data stream to its beginning.
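The streaming behavior, including the cycle option, can be sketched as follows. SketchStreamData and make_stream are hypothetical names; the sketch assumes the data source can be reopened by calling a factory (for example, a function that reopens a file), so nothing beyond the current entry is held in memory.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Item = Tuple[tuple, bool]  # ((x, y, ...), new_episode)


class SketchStreamData:
    """Hypothetical streaming counterpart: entries are pulled lazily from a
    fresh iterator instead of being held in memory."""

    def __init__(self, make_stream: Callable[[], Iterable[Item]], cycle: bool = True):
        self.make_stream = make_stream  # e.g. a function that reopens a file
        self.cycle = cycle
        self.reset()

    def reset(self) -> None:
        """Restart the stream from its beginning."""
        self.stream: Iterator[Item] = iter(self.make_stream())

    def get(self) -> Optional[Item]:
        """Return the next entry, or None once a non-cycling stream is exhausted."""
        try:
            return next(self.stream)
        except StopIteration:
            if not self.cycle:
                return None
            self.stream = iter(self.make_stream())  # begin the next epoch
            return next(self.stream, None)
```

Entries come out in file order, so the first entry of the next episode naturally follows the last entry of the current one.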
This module provides access to data in the Facebook Dialog format.
FbDialogTeacher subclasses DialogTeacher for functionality and provides an implementation of setup_data(), which iterates over datasets in the "fbdialog" format.
The way FB Dialog data is set up is as follows:
1 Sam went to the kitchen.
2 Pat gave Sam the milk.
3 Where is the milk?<TAB>kitchen<TAB>1<TAB>hallway|kitchen|bathroom
4 Sam went to the hallway
5 Pat went to the bathroom
6 Where is the milk?<TAB>hallway<TAB>1<TAB>hallway|kitchen|bathroom
Lines 1-6 represent a single episode, with two different examples: the first example is lines 1-3, and the second is lines 4-6.
Lines 1, 2, 4, and 5 represent contextual information.
Lines 3 and 6 contain a query, a label, a reward for getting the question correct, and three label candidates.
Since both of these examples are part of the same episode, the information provided in the first example is relevant to the query in the second example and therefore the agent must remember the first example in order to do well.
In general, dialog in this format can be any speech, not just QA pairs:
1 Hi how's it going?<TAB>It's going great. What's new?
2 Well I'm working on a new project at work.<TAB>Oh me too!
3 Oh cool!<TAB>Tell me about yours.
load_cands: Loads a global fixed set of candidate labels that the teacher provides with every example (the true labels for a specific example are also added to this set, so that it's possible to get the right answer).
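The idea of a global candidate set that is topped up with each example's true labels can be sketched as below. Both functions are hypothetical: the sketch assumes a candidates file with one label per line, which may not match the file format the real load_cands expects.

```python
from typing import List, Optional, Sequence


def load_cands(lines: Sequence[str]) -> Optional[List[str]]:
    """Hypothetical reader: one candidate label per non-empty line."""
    cands = [line.strip() for line in lines if line.strip()]
    return cands or None


def candidates_for(global_cands: List[str], labels: Sequence[str]) -> List[str]:
    """Add the example's true labels so the right answer is always present."""
    merged = list(global_cands)  # copy so the global set is not mutated
    for label in labels:
        if label not in merged:
            merged.append(label)
    return merged
```

Copying the global list per example keeps one example's true labels from leaking into the candidates of later examples.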
setup_data: Reads data in the fbdialog format. Returns ((x, y, r, c), new_episode?) tuples, where
- x represents a query,
- y represents the labels,
- r represents any reward, and
- c represents any label_candidates.
The example above will be translated into the following tuples:
x: 'Sam went to the kitchen.\nPat gave Sam the milk.\nWhere is the milk?'
y: ['kitchen']
r: '1'
c: ['hallway', 'kitchen', 'bathroom']
new_episode = True (this is the first example in the episode)
x: 'Sam went to the hallway\nPat went to the bathroom\nWhere is the milk?'
y: ['hallway']
r: '1'
c: ['hallway', 'kitchen', 'bathroom']
new_episode = False (this is the second example in the episode)
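The translation from fbdialog lines to tuples can be sketched as a small parser. parse_fbdialog is a hypothetical function, not FbDialogTeacher's actual setup_data, and details may differ from the real reader. Its assumptions: a line numbered 1 starts a new episode, lines without a tab are context that is prepended to the next query, empty reward or candidate fields become None, and multiple labels are separated by "|" like candidates.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Entry = Tuple[str, List[str], Optional[str], Optional[List[str]]]


def parse_fbdialog(lines: Iterable[str]) -> Iterator[Tuple[Entry, bool]]:
    """Sketch of an fbdialog reader yielding ((x, y, r, c), new_episode)."""
    context: List[str] = []
    new_episode = True
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue
        num, _, rest = raw.partition(" ")
        if int(num) == 1:
            # Line number 1 starts a new episode: drop any leftover context.
            context = []
            new_episode = True
        fields = rest.split("\t")
        if len(fields) == 1:
            context.append(fields[0])  # context only, no label on this line
            continue
        x = "\n".join(context + [fields[0]])
        y = fields[1].split("|")
        r = fields[2] if len(fields) > 2 and fields[2] else None
        c = fields[3].split("|") if len(fields) > 3 and fields[3] else None
        yield (x, y, r, c), new_episode
        context = []
        new_episode = False
```

Run on the six-line episode above (with real tab characters in place of <TAB>), it yields the two tuples listed; on the chit-chat example, each line becomes its own example with no reward or candidates.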