Creating an Agent

Author: Alexander Holden Miller

In this tutorial, we’ll be setting up an agent which learns from the data it sees to produce the right answers.

For this agent, we’ll be implementing a simple GRU Seq2Seq agent based on Sequence to Sequence Learning with Neural Networks (Sutskever et al. 2014) and Sean Robertson’s Seq2Seq PyTorch tutorial.

Part 1: Naming Things

In order to make programmatic importing easier, we use a simple naming scheme for our models, so that on the command line we can just type "--model seq2seq" ("-m seq2seq") to load up the seq2seq model.

To this end, we create a folder under parlai/agents with the name seq2seq, and then put an empty __init__.py file there along with seq2seq.py. Then, we name our agent "Seq2seqAgent".

The ParlAI argparser automatically tries to translate "--model seq2seq" to "parlai.agents.seq2seq.seq2seq:Seq2seqAgent". Underscores in the name become capitals in the class name: "--model local_human" resides at "parlai.agents.local_human.local_human:LocalHumanAgent".

If you need to put a model at a different path, you can specify the full path on the command line in the format above (with a colon in front of the class name). For example, "--model parlai.agents.remote_agent.remote_agent:ParsedRemoteAgent".
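The name-to-path translation described above can be sketched in a few lines. This is an illustrative stand-in only; the function name model_name_to_spec is mine, not ParlAI's:

```python
def model_name_to_spec(name):
    """Translate a model shorthand like 'local_human' into an import spec."""
    # module lives at parlai.agents.<name>.<name>
    module_path = 'parlai.agents.{}.{}'.format(name, name)
    # underscores become capitals: local_human -> LocalHumanAgent
    class_name = ''.join(w.capitalize() for w in name.split('_')) + 'Agent'
    return module_path, class_name
```

For example, 'seq2seq' maps to the module parlai.agents.seq2seq.seq2seq and the class Seq2seqAgent.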

Part 2: Main Agent Methods

First off, generally we should inherit from the Agent class in parlai.core.agents. This provides us with some default implementations (often, pass) of some utility functions like “shutdown”.

First let’s focus on the primary functions to implement: __init__, observe, and act.

The standard initialization parameters for agents are a dict of command-line parameters opt and an optional dict of shared parameters called shared.

For our Seq2Seq model we’ll call our parent init method, which does a few basic operations like setting self.observation to None and creating a deep copy of the opt dict. Don’t forget to pass the shared parameter to the parent init as well.

Then, we do a check to see if the shared parameter is set. When it is not None, it contains the state this instance should initialize from, as this instance will be used for either batched or hogwild training (depending on your preference).

A loose version of that implementation is this:

class ExampleSeq2seqAgent(Agent):

    def __init__(self, opt, shared=None):
        # initialize defaults first
        super().__init__(opt, shared)

        # ... some setup for both shared and original instances

        if not shared:
            # set up model from scratch
            pass
        else:
            # ... copy initialized data from shared table
            pass

To see more detail about sharing, batching, and hogwild in general, check out Data Handling, Batching, and Hogwild.

We’ll take a quick digression to describe how it applies to this agent.

Batching Example

Let’s say we are training our seq2seq model on babi:task10k:1. What happens behind the scenes for a batch size of 4 is that we actually create four shared versions of the bAbI Task10k teacher, and four shared versions of the seq2seq agent. These shared versions are initialized from the originals: for the bAbI teachers, they inherit the data from their parent agent, but they each have their own local state such as the current example they’re showing or how far through a bAbI episode they are (bAbI task 1 has five examples per episode).

For the seq2seq agent, each shared agent keeps track of the previous examples it has seen in the same episode, since each observation does not repeat previously seen but related information; the agent has to remember it. Note that this only applies when the batch-sort command-line parameter is disabled (it's enabled by default), but since it can be useful to disable it sometimes, we'll go into more detail here.

For example, in the first entry in the episode the agent could get something like the following: “John is in the bathroom. Mary is in the kitchen. Where is Mary?” And in the second example in the episode, the agent could get: “Mary picked up the milk. Mary went to the hallway. Where is John?” Here, the answer is in the first example’s context, so the agent had to remember the previous text it saw within the same episode.

Observations are generated by calling the act function on each teacher, then passing those observations to each agent by calling the observe function of the shared agents. The agents are free to transform the previous observation (for example, prepending previously seen text from the same episode, if applicable). These transformed observations are packed into a list, which is then passed to the batch_act function our agent implements. We can implement batch_act differently from the simple act function to take advantage of the effects of batching over multiple examples when executing or updating our model.
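The flow just described can be sketched with stand-in objects. ToyTeacher, ToyAgent, and batch_step below are illustrative names, not real ParlAI classes; they only mirror the act/observe/batch_act handoff:

```python
class ToyTeacher:
    """Stand-in teacher that emits observations from a fixed list."""
    def __init__(self, data):
        self.data = iter(data)

    def act(self):
        return next(self.data)


class ToyAgent:
    """Stand-in agent: observe stores state, batch_act replies in bulk."""
    def observe(self, obs):
        self.observation = obs
        return obs

    def batch_act(self, observations):
        # one reply dict per observation in the batch
        return [{'text': 'reply to: ' + o.get('text', '')} for o in observations]


def batch_step(teachers, shared_agents, model_agent):
    # each teacher acts; its paired shared agent observes (and may transform)
    observations = [a.observe(t.act()) for t, a in zip(teachers, shared_agents)]
    # the whole batch of observations goes through batch_act at once
    return model_agent.batch_act(observations)
```

Each shared agent sees only its own stream of episodes, while the single batch_act call lets the model process all streams together.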

Thus, since our agent's shared instances will only be used to keep track of state particular to their sequence of examples in the batch, there is barely anything to do when setting them up; they won't be doing any intensive computation, just basic reading of the input.

The full initialization of the model is included further below, but it is specific to this implementation. Let's first talk about the primary agent functions we need to define.

Observing and Acting

Let’s take a look at the observe function. Here, we can modify the observation dict if necessary, and then return it to be queued for batching.

Check out the observations documentation again for more details about all of the fields contained in the observations.

In this version, we first make a deep copy of the observation. Then, if this is not the first entry in an episode (some datasets like SQuAD have only one entry for every episode, but others like bAbI have multiple), then we prepend the previous text to the current text. We use a newline to separate them in case the model wants to recognize the difference between different lines.

Then, we store whether this is the last entry in the episode so that we’ll be ready to reset next time if we need to.

A simple version of this is shown here:

def observe(self, observation):
    observation = copy.deepcopy(observation)
    if not self.episode_done:
        # if the last example wasn't the end of an episode, then we need to
        # recall what was said in that example
        prev_dialogue = self.observation['text']
        observation['text'] = prev_dialogue + '\n' + observation['text']
    self.observation = observation
    self.episode_done = observation['episode_done']
    return observation

In the current implementation we use a utility function for more complex processing, but this is a suitable first step.
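To see the prepending behavior in isolation, here is a self-contained version of the same logic using a stand-in class (EpisodeMemory is an illustrative name, not part of ParlAI):

```python
import copy


class EpisodeMemory:
    """Minimal stand-in demonstrating the observe() logic above."""
    def __init__(self):
        self.observation = None
        self.episode_done = True  # start fresh, as after a reset

    def observe(self, observation):
        observation = copy.deepcopy(observation)
        if not self.episode_done:
            # previous example was in the same episode: prepend its text
            observation['text'] = self.observation['text'] + '\n' + observation['text']
        self.observation = observation
        self.episode_done = observation['episode_done']
        return observation
```

The first entry of an episode passes through unchanged; later entries accumulate the earlier context, separated by newlines.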

Next up is the act function. Since we are going to implement a batched version, we’ll just call the batched version from our single-example act to reduce code duplication.

def act(self):
    # call batch_act with this batch of one
    return self.batch_act([self.observation])[0]

Now it’s time for the batch_act function. This function gets a list of length batchsize of observations and returns a list of the same length with this agent’s replies.

We’ll follow this loose format:

  1. Set up our list of dicts to send back as replies, with the agent’s ID set.
  2. Convert the incoming observations into tensors to feed into our model.
  3. Produce predictions on the input text using the model. If labels were available, update the model as well.
  4. Unpack the predictions into the reply dicts and return them.
def batch_act(self, observations):
    batchsize = len(observations)
    # initialize a table of replies with this agent's id
    batch_reply = [{'id': self.getID()} for _ in range(batchsize)]

    # convert the observations into batches of inputs and targets
    # `labels` stores the true labels returned in the `ys` vector
    # `valid_inds` tells us the indices of all valid examples
    # e.g. for input [{}, {'text': 'hello'}, {}, {}], valid_inds is [1]
    # since the other three elements had no 'text' field
    xs, ys, labels, valid_inds, is_training = self.vectorize(observations)

    if xs is None:
        # no valid examples, just return empty responses
        return batch_reply

    predictions = self.predict(xs, ys, is_training)

    # map the predictions back to the right `valid_inds`
    # in the example above, a prediction `world` should reply to `hello`
    PaddingUtils.map_predictions(
        predictions.cpu().data, valid_inds, batch_reply, observations,
        self.dict, self.END_IDX, labels=labels,
        answers=labels if ys is not None else None,
        report_freq=self.opt.get('report_freq', 0))

    return batch_reply
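The role of valid_inds in the code above can be illustrated with a toy helper. map_back is a hypothetical name for this sketch; in the real agent a ParlAI utility performs this mapping along with extra bookkeeping:

```python
def map_back(predictions, valid_inds, batch_reply):
    """Write each prediction into the reply dict at its original batch index."""
    for pred, idx in zip(predictions, valid_inds):
        batch_reply[idx]['text'] = pred
    return batch_reply
```

So if only position 1 of a batch of four had a 'text' field, the single prediction lands in batch_reply[1] and the other replies stay empty.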

Since the implementation of vectorize and predict are particular to our model, we’ll table those for now. Next up, we’ll cover some of the other methods in the Agent API.

Part 3: Extended Agent API

There are a few other useful methods you may want to define in your agent to take care of additional functionality one might want during training. Many of these functions will be automatically called if you use our example training function to train your model.


share()

Agents can use this method to share any information they might want between different instances during batching or hogwild training. For example, during hogwild training all models are trained independently in multiple processes, so you would want to share the model parameters between them. Teacher classes use this method to share their data and metrics with other shared instances.

If you define this method, it's usually a good idea to initialize the shared dict being returned by calling super().share() first. For example, the Teacher class in parlai.core.agents defines it this way:

def share(self):
    """In addition to default Agent shared parameters, share metrics."""
    shared = super().share()
    shared['metrics'] = self.metrics
    return shared

In our seq2seq model, we’ll share a bunch of basic initial states. Most of the implementation is shown here:

def share(self):
    """Share internal states between parent and child instances."""
    shared = super().share()
    shared['opt'] = self.opt
    shared['dict'] = self.dict

    if self.opt.get('numthreads', 1) > 1:
        # we're doing hogwild so share the model too
        shared['encoder'] = self.encoder
        shared['decoder'] = self.decoder

    return shared


shutdown()

This function allows your model to do any final wrapup, such as writing any last logging info, saving an end-state version of the model if desired, or closing any open connections.

The standard ParlAI seq2seq model saves the model parameters to opt['model_file'] + '.shutdown_state'. In contrast, the agents in parlai/agents/remote_agent use this to close their open TCP connection after sending a shutdown signal through.

Most models won’t need to do anything in particular here.
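As a sketch of the save-on-shutdown pattern mentioned above: the BaseAgent and SavingAgent classes below are stand-ins (not ParlAI's), and save() here is a stub rather than a real serialization routine.

```python
class BaseAgent:
    """Stand-in for a parent class whose shutdown() defaults to a no-op."""
    def shutdown(self):
        pass


class SavingAgent(BaseAgent):
    def __init__(self, opt):
        self.opt = opt
        self.saved_to = None

    def save(self, path):
        # stand-in for writing model parameters to disk
        self.saved_to = path

    def shutdown(self):
        # persist an end-state copy, following the '.shutdown_state' convention
        path = self.opt.get('model_file')
        if path:
            self.save(path + '.shutdown_state')
        super().shutdown()
```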

Part 4: Finishing the Seq2Seq model

Here we’ll see how to add commandline arguments to the command line parser, and then we’ll take a look at the full details of __init__, vectorize, predict, and more.


add_cmdline_args()

We use this static method to add command-line arguments to the program.

@staticmethod
def dictionary_class():
    return DictionaryAgent

@staticmethod
def add_cmdline_args(argparser):
    """Add command-line arguments specifically for this agent."""
    agent = argparser.add_argument_group('Seq2Seq Arguments')
    agent.add_argument('-hs', '--hiddensize', type=int, default=128,
                       help='size of the hidden layers')
    agent.add_argument('-esz', '--embeddingsize', type=int, default=128,
                       help='size of the token embeddings')
    agent.add_argument('-nl', '--numlayers', type=int, default=2,
                       help='number of hidden layers')
    agent.add_argument('-lr', '--learningrate', type=float, default=1,
                       help='learning rate')
    agent.add_argument('-dr', '--dropout', type=float, default=0.1,
                       help='dropout rate')
    agent.add_argument('--no-cuda', action='store_true', default=False,
                       help='disable GPUs even if available')
    agent.add_argument('--gpu', type=int, default=-1,
                       help='which GPU device to use')
    agent.add_argument('-rf', '--report-freq', type=float, default=0.001,
                       help='Report frequency of prediction during eval.')
    return agent

Full __init__()

Here’s full code to get an initialization of a model working. We recommend storing model modules in a separate class and importing them (and if you’re using torch, extending nn.Module). We’ll show a version which defines its modules in the same file, since it’s a simple model.

Note that we’re showing the simple version from the PyTorch tutorial below. The full seq2seq implementation in ParlAI adds a lot more bells and whistles.

from parlai.core.agents import Agent
from parlai.core.dict import DictionaryAgent
from parlai.core.utils import PaddingUtils
from parlai.core.thread_utils import SharedTable

import torch
from torch.autograd import Variable
from torch import optim
import torch.nn as nn
import torch.nn.functional as F

import copy

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size, numlayers):
        super().__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=numlayers,
                          batch_first=True)

    def forward(self, input, hidden):
        embedded = self.embedding(input)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

class DecoderRNN(nn.Module):
    def __init__(self, output_size, hidden_size, numlayers):
        super().__init__()
        self.hidden_size = hidden_size

        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, num_layers=numlayers,
                          batch_first=True)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=2)

    def forward(self, input, hidden):
        emb = self.embedding(input)
        rel = F.relu(emb)
        output, hidden = self.gru(rel, hidden)
        scores = self.softmax(self.out(output))
        return scores, hidden

class ExampleSeq2seqAgent(Agent):

    def __init__(self, opt, shared=None):
        # initialize defaults first
        super().__init__(opt, shared)

        # check for cuda
        self.use_cuda = not opt.get('no_cuda') and torch.cuda.is_available()
        if opt.get('numthreads', 1) > 1:
            torch.set_num_threads(1)
        self.id = 'Seq2Seq'

        if not shared:
            # set up model from scratch
            self.dict = DictionaryAgent(opt)
            hsz = opt['hiddensize']
            nl = opt['numlayers']

            # encoder captures the input text
            self.encoder = EncoderRNN(len(self.dict), hsz, nl)
            # decoder produces our output states
            self.decoder = DecoderRNN(len(self.dict), hsz, nl)

            if self.use_cuda:
                self.encoder.cuda()
                self.decoder.cuda()

            if opt.get('numthreads', 1) > 1:
                # hogwild needs the model parameters in shared memory
                self.encoder.share_memory()
                self.decoder.share_memory()
        else:
            # ... copy initialized data from shared table
            self.opt = shared['opt']
            self.dict = shared['dict']

            if 'encoder' in shared:
                # hogwild shares model as well
                self.encoder = shared['encoder']
                self.decoder = shared['decoder']

        if hasattr(self, 'encoder'):
            # we set up a model for original instance and multithreaded ones
            self.criterion = nn.NLLLoss()

            # set up optims for each module
            lr = opt['learningrate']
            self.optims = {
                'encoder': optim.SGD(self.encoder.parameters(), lr=lr),
                'decoder': optim.SGD(self.decoder.parameters(), lr=lr),
            }

            self.longest_label = 1
            self.hiddensize = opt['hiddensize']
            self.numlayers = opt['numlayers']
            # we use END markers to end our output
            self.END_IDX = self.dict[self.dict.end_token]
            # get index of null token from dictionary (probably 0)
            self.NULL_IDX = self.dict[self.dict.null_token]
            # we use START markers to start our output
            self.START_IDX = self.dict[self.dict.start_token]
            self.START = torch.LongTensor([self.START_IDX])
            if self.use_cuda:
                self.START = self.START.cuda()


    def reset(self):
        """Reset observation and episode_done."""
        self.observation = None
        self.episode_done = True


vectorize()

The vectorize function takes in a list of observations and turns them into input and target tensors to use with our model.

def vectorize(self, observations):
    """Convert a list of observations into input & target tensors."""
    is_training = any(('labels' in obs for obs in observations))
    # utility function for padding text and returning lists of indices
    # parsed using the provided dictionary
    xs, ys, labels, valid_inds, _, _ = PaddingUtils.pad_text(
        observations, self.dict, end_idx=self.END_IDX,
        null_idx=self.NULL_IDX, dq=False, eval_labels=True)
    if xs is None:
        return None, None, None, None, None

    # move lists of indices returned above into tensors
    xs = torch.LongTensor(xs)
    if self.use_cuda:
        xs = xs.cuda()
    xs = Variable(xs)

    if ys is not None:
        ys = torch.LongTensor(ys)
        if self.use_cuda:
            ys = ys.cuda()
        ys = Variable(ys)

    return xs, ys, labels, valid_inds, is_training
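The padding step hidden inside the utility above can be illustrated on plain Python lists. pad_batch is a toy stand-in for ParlAI's padding utility, not its real implementation:

```python
def pad_batch(token_lists, null_idx=0):
    """Right-pad lists of token indices to equal length with the null index,
    so they can be stacked into a rectangular tensor."""
    maxlen = max(len(t) for t in token_lists)
    return [t + [null_idx] * (maxlen - len(t)) for t in token_lists]
```

A padded batch can then be passed straight to torch.LongTensor, since every row has the same length.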


predict()

The predict function returns an output from our model. If the targets are provided, then it also updates the model. The predictions will be biased in this case, since we condition each token on the true label token, but we are okay with that; it just improves training F1 scores.

def predict(self, xs, ys=None, is_training=False):
    """Produce a prediction from our model.

    Update the model using the targets if available.
    """
    bsz = xs.size(0)
    zeros = Variable(torch.zeros(self.numlayers, bsz, self.hiddensize))
    if self.use_cuda:
        zeros = zeros.cuda()
    starts = Variable(self.START)
    starts = starts.expand(bsz, 1)  # expand to batch size

    if is_training:
        loss = 0
        target_length = ys.size(1)
        # save largest seen label for later
        self.longest_label = max(target_length, self.longest_label)

        encoder_output, encoder_hidden = self.encoder(xs, zeros)

        # Teacher forcing: feed the target as the next input
        y_in = ys.narrow(1, 0, ys.size(1) - 1)
        decoder_input = torch.cat([starts, y_in], 1)
        decoder_output, decoder_hidden = self.decoder(decoder_input,
                                                      encoder_hidden)

        scores = decoder_output.view(-1, decoder_output.size(-1))
        loss = self.criterion(scores, ys.view(-1))

        # zero gradients, backprop, and step both optimizers
        for optimizer in self.optims.values():
            optimizer.zero_grad()
        loss.backward()
        for optimizer in self.optims.values():
            optimizer.step()

        _max_score, idx = decoder_output.max(2)
        predictions = idx
    else:
        # just predict
        encoder_output, encoder_hidden = self.encoder(xs, zeros)
        decoder_hidden = encoder_hidden

        predictions = []
        scores = []
        done = [False for _ in range(bsz)]
        total_done = 0
        decoder_input = starts

        for _ in range(self.longest_label):
            # generate at most longest_label tokens
            decoder_output, decoder_hidden = self.decoder(decoder_input,
                                                          decoder_hidden)
            _max_score, idx = decoder_output.max(2)
            preds = idx
            predictions.append(preds)
            decoder_input = preds

            # check if we've produced the end token
            for b in range(bsz):
                if not done[b]:
                    # only add more tokens for examples that aren't done
                    if preds.data[b][0] == self.END_IDX:
                        # if we produced END, we're done
                        done[b] = True
                        total_done += 1
            if total_done == bsz:
                # no need to generate any more
                break
        predictions = torch.cat(predictions, 1)

    return predictions

For other utility functions like loading from file, or to see any new features that we may have added to the model such as attention over the input or ranking candidates, check out the source code at parlai/agents/seq2seq.

Full Implementation & running this model

You can see the full code for this here.

You can try this model now with a command like the following:

# batchsize 32, numthreads 1
python examples/train_model.py -t babi:task10k:1 --dict-file /tmp/dict_babi:task10k:1 -bs 32 -vtim 30 -vcut 0.95 -m example_seq2seq

# batchsize 1, numthreads 40, no cuda, lower learning rate
python examples/train_model.py -t babi:task10k:1 --dict-file /tmp/dict_babi:task10k:1 -bs 1 --numthreads 40 -vtim 30 -vcut 0.95 -m example_seq2seq --no-cuda -lr 0.01