parlai.core.torch_agent
Torch Agent implements much of the boilerplate necessary for creating
a neural dialogue agent, so you can focus on modeling. Torch Agent limits its
functionality to maintaining dialogue history, transforming text into vectors of
indices, and loading/saving models. The user is required to implement their own
logic in methods like train_step and eval_step.
Torch Ranker Agent and Torch Generator Agent have more specialized stub methods, and provide many rich features and benefits. Torch Ranker Agent assumes your model ranks possible responses from a set of possible candidates, and provides options around negative sampling, candidate sampling, and large-scale candidate prediction. Torch Generator Agent assumes your model generates utterances auto-regressively, and provides generic implementations of beam search.
Torch Agent
General utility code for building PyTorch-based agents in ParlAI.
Contains the following main utilities:
TorchAgent class which serves as a useful parent class for other model agents
Batch namedtuple which is the input type of the main abstract methods of the TorchAgent class
Output namedtuple which is the expected output type of the main abstract methods of the TorchAgent class
History class which handles tracking the dialogue state over the course of an episode.
See below for documentation on each specific tool.
- class parlai.core.torch_agent.Batch(text_vec=None, text_lengths=None, label_vec=None, label_lengths=None, labels=None, valid_indices=None, candidates=None, candidate_vecs=None, reward=None, image=None, is_training: Optional[bool] = None, _context_original_length: Optional[LongTensor] = None, _context_truncate_rate: Optional[LongTensor] = None, _context_truncated_length: Optional[LongTensor] = None, _label_original_length: Optional[LongTensor] = None, _label_truncate_rate: Optional[LongTensor] = None, _label_truncated_length: Optional[LongTensor] = None, **kwargs)[source]¶
Bases:
AttrDict
Batch is a namedtuple containing data being sent to an agent.
This is the input type of the train_step and eval_step functions. Agents can override the batchify function to return a Batch with additional fields if they would like, though we recommend calling the parent function to set up these fields as a base.
Batch objects contain some magic semantics when dealing with CUDA. Namely, Batch objects have a to() method that can be used to send all tensors to a particular device (GPU). This is undesirable in some instances, as some fields may be used only for accumulating metrics, or are only used on CPU. Prefixing a field with an underscore will prevent it from being transferred to GPU.
Note that in upcoming versions of ParlAI, we will enable features for getting speedups in training which work best when the number of non-Tensor objects in a batch is minimal.
- Parameters
text_vec – bsz x seqlen tensor containing the parsed text data.
label_vec – bsz x seqlen tensor containing the parsed label (one per batch row).
labels – list of length bsz containing the selected label for each batch row (some datasets have multiple labels per input example).
valid_indices – tensor of length bsz containing the original indices of each example in the batch. we use these to map predictions back to their proper row, since e.g. we may sort examples by their length or some examples may be invalid.
candidates – list of lists of text. outer list has size bsz, inner lists vary in size based on the number of candidates for each row in the batch.
candidate_vecs – list of lists of tensors. outer list has size bsz, inner lists vary in size based on the number of candidates for each row in the batch.
image – list of image features in the format specified by the --image-mode arg.
reward – Tensor containing the “reward” field of observations, if present
- __init__(text_vec=None, text_lengths=None, label_vec=None, label_lengths=None, labels=None, valid_indices=None, candidates=None, candidate_vecs=None, reward=None, image=None, is_training: Optional[bool] = None, _context_original_length: Optional[LongTensor] = None, _context_truncate_rate: Optional[LongTensor] = None, _context_truncated_length: Optional[LongTensor] = None, _label_original_length: Optional[LongTensor] = None, _label_truncate_rate: Optional[LongTensor] = None, _label_truncated_length: Optional[LongTensor] = None, **kwargs)[source]¶
Initialize AttrDict using input dict.
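The device-transfer rule described above for Batch can be sketched without ParlAI or PyTorch. The FakeTensor and SketchBatch classes below are hypothetical stand-ins, not ParlAI's implementation; they only illustrate the idea that fields whose names begin with an underscore are skipped when the batch is moved to a device.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor that records its device."""

    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def to(self, device):
        # Return a copy tagged with the new device.
        return FakeTensor(self.data, device)


class SketchBatch(dict):
    """Dict with attribute access, loosely modelled on an AttrDict."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def to(self, device):
        for key, value in self.items():
            # Skip underscore-prefixed fields and non-tensor values.
            if key.startswith("_") or not hasattr(value, "to"):
                continue
            self[key] = value.to(device)
        return self


batch = SketchBatch(
    text_vec=FakeTensor([[1, 2, 3]]),
    labels=["hello"],
    _context_original_length=FakeTensor([3]),
)
batch.to("cuda:0")
print(batch.text_vec.device)
print(batch._context_original_length.device)
```

In this sketch only text_vec changes device; the underscore-prefixed length field and the plain list of labels are left untouched.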
- class parlai.core.torch_agent.Output(text=None, text_candidates=None, **kwargs)[source]¶
Bases:
AttrDict
Output is an object containing agent predictions.
This is the expected return type of the train_step and eval_step functions, though agents can choose to return None if they do not want to answer.
- Parameters
text (List[str]) – list of strings of length bsz containing the predictions of the model
text_candidates (List[List[str]]) – list of lists of length bsz containing ranked predictions of the model. each sub-list is an ordered ranking of strings, of variable length.
- class parlai.core.torch_agent.History(opt, field='text', maxlen=None, size=-1, p1_token='__p1__', p2_token='__p2__', dict_agent=None)[source]¶
Bases:
object
History handles tracking the dialogue state over the course of an episode.
History may also be used to track the history of any field.
- Parameters
field – field in the observation to track over the course of the episode (defaults to ‘text’)
maxlen – sets the maximum number of turns
p1_token – token indicating ‘person 1’; opt must have ‘person_tokens’ set to True for this to be added
p2_token – token indicating ‘person 2’; opt must have ‘person_tokens’ set to True for this to be added
dict_agent – DictionaryAgent object for tokenizing the history
- __init__(opt, field='text', maxlen=None, size=-1, p1_token='__p1__', p2_token='__p2__', dict_agent=None)[source]¶
- update_history(obs: Message, temp_history: Optional[str] = None)[source]¶
Update the history with the given observation.
- Parameters
obs – Observation used to update the history.
temp_history – Optional temporary string. If it is not None, this string will be appended to the end of the history. It will not be in the history on the next dialogue turn. Set to None to stop adding to the history.
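As a rough illustration of how update_history and temp_history interact, here is a minimal sketch. SketchHistory, its max_turns argument, the newline delimiter, and the get_history_str helper are all assumptions made for this example; this is not ParlAI's History class.

```python
from collections import deque
from typing import Optional


class SketchHistory:
    """Hypothetical, simplified history tracker."""

    def __init__(self, field: str = "text", max_turns: int = -1, delimiter: str = "\n"):
        self.field = field
        self.delimiter = delimiter
        # max_turns <= 0 is treated here as "keep every turn".
        self.turns = deque(maxlen=max_turns if max_turns > 0 else None)
        self.temp_history: Optional[str] = None

    def update_history(self, obs: dict, temp_history: Optional[str] = None) -> None:
        if obs.get(self.field) is not None:
            self.turns.append(obs[self.field])
        # Replaced on every call, so it only survives a single turn.
        self.temp_history = temp_history

    def get_history_str(self) -> Optional[str]:
        if not self.turns:
            return None
        history = self.delimiter.join(self.turns)
        if self.temp_history is not None:
            # No delimiter is inserted before the temporary string.
            history += self.temp_history
        return history


history = SketchHistory(max_turns=2)
history.update_history({"text": "hi"})
history.update_history({"text": "how are you?"}, temp_history=" [knowledge]")
first = history.get_history_str()
history.update_history({"text": "fine"})
second = history.get_history_str()
```

The temporary string appears in first but not in second, and the oldest turn is dropped once max_turns is exceeded.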
- class parlai.core.torch_agent.TorchAgent(opt: Opt, shared=None)[source]¶
Bases:
ABC, Agent
A provided abstract base agent for any model that wants to use Torch.
Exists to make it easier to implement a new agent. Not necessary, but reduces duplicated code.
Many methods are intended to be either used as is when the default is acceptable, or to be overridden and called with super(), with the extra functionality added to the initial result. See the method comment for recommended behavior.
This agent serves as a common framework for all ParlAI models which want to use PyTorch.
- classmethod optim_opts()[source]¶
Fetch optimizer selection.
By default, collects everything in torch.optim, as well as qhm / qhmadam if installed from github.com/facebookresearch/qhoptim.
Override this (and probably call super()) to add your own optimizers.
- static dictionary_class()[source]¶
Return the dictionary class that this agent expects to use.
Can be overridden if a more complex dictionary is required.
- classmethod history_class()[source]¶
Return the history class that this agent expects to use.
Can be overridden if a more complex history is required.
- classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser [source]¶
Add the default commandline args we expect most agents to want.
- build_dictionary()[source]¶
Return the constructed dictionary, which will be set to self.dict.
If you need to add additional tokens to the dictionary, this is likely the right place to do it.
- init_optim(params, optim_states=None, saved_optim_type=None, is_finetune: bool = False) bool [source]¶
Initialize optimizer with model parameters.
- Parameters
params – parameters from the model
optim_states – optional argument providing states of optimizer to load
saved_optim_type – type of optimizer being loaded, if changed will skip loading optimizer states
is_finetune – bool indicating whether this training run is a fine-tune or not
- Returns
boolean indicating whether the optimizer failed to initialize with optim_states.
- build_lr_scheduler(states=None, hard_reset=False)[source]¶
Create the learning rate scheduler, and assign it to self.scheduler. This scheduler will be updated upon a call to receive_metrics. May also create self.warmup_scheduler, if appropriate.
- Parameters
states (state_dict) – Possible state_dict provided by model checkpoint, for restoring LR state
hard_reset (bool) – If true, the LR scheduler should ignore the state dictionary.
- record_local_metric(keyname: str, values: List[Metric])[source]¶
Record an example-level metric for all items in the batch.
Local metrics may be recorded anywhere within batch_act. They will automatically be collated and returned at the end of batch_act. The beginning of batch_act resets these, so you may not use them during observe.
Example local metrics include ppl, token_acc, any other agent-specific metrics.
Share fields from parent as well as useful objects in this class.
Subclasses will likely want to share their model as well.
- vectorize(obs, history, add_start=True, add_end=True, text_truncate=None, label_truncate=None)[source]¶
Make vectors out of observation fields and store in the observation.
In particular, the ‘text’ and ‘labels’/’eval_labels’ fields are processed and a new field is added to the observation with the suffix ‘_vec’.
If you want to use additional fields on your subclass, you can override this function, call super().vectorize(…) to process the text and labels, and then process the other fields in your subclass.
Additionally, if you want to override some of these default parameters, then we recommend using a pattern like:
def vectorize(self, *args, **kwargs):
    kwargs['add_start'] = False
    return super().vectorize(*args, **kwargs)
- Parameters
obs – Single observation from observe function.
add_start – default True, adds the start token to each label.
add_end – default True, adds the end token to each label.
text_truncate – default None, if set truncates text vectors to the specified length.
label_truncate – default None, if set truncates label vectors to the specified length.
- Returns
the input observation, with ‘text_vec’, ‘label_vec’, and ‘cands_vec’ fields added.
- batchify(obs_batch, sort=False)[source]¶
Create a batch of valid observations from an unchecked batch.
A valid observation is one that passes the lambda provided to the function, which defaults to checking if the preprocessed ‘text_vec’ field is present which would have been set by this agent’s ‘vectorize’ function.
Returns a namedtuple Batch. See original definition above for in-depth explanation of each field.
If you want to include additional fields in the batch, you can subclass this function and return your own “Batch” namedtuple: copy the Batch namedtuple at the top of this class, and then add whatever additional fields that you want to be able to access. You can then call super().batchify(…) to set up the original fields and then set up the additional fields in your subclass and return that batch instead.
- Parameters
obs_batch – List of vectorized observations
sort – Default False, orders the observations by length of vectors. Set to true when using torch.nn.utils.rnn.pack_padded_sequence. Uses the text vectors if available, otherwise uses the label vectors if available.
- match_batch(batch_reply, valid_inds, output=None)[source]¶
Match sub-batch of predictions to the original batch indices.
Batches may be only partially filled (e.g. when completing the remainder at the end of the validation or test set), or we may want to sort by e.g. the length of the input sequences if using pack_padded_sequence.
This matches rows back with their original row in the batch for calculating metrics like accuracy.
If output is None (model choosing not to provide any predictions), we will just return the batch of replies.
Otherwise, output should be a parlai.core.torch_agent.Output object. This is a namedtuple, which can provide text predictions and/or text_candidates predictions. If you would like to map additional fields into the batch_reply, you can override this method as well as providing your own namedtuple with additional fields.
- Parameters
batch_reply – Full-batchsize list of message dictionaries to put responses into.
valid_inds – Original indices of the predictions.
output – Output namedtuple which contains sub-batchsize list of text outputs from model. May be None (default) if model chooses not to answer. This method will check for text and text_candidates fields.
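The index matching described for match_batch can be sketched in plain Python. The function name match_batch_sketch and its texts argument are hypothetical simplifications (the real method takes an Output object); the sketch only shows how valid_inds routes each prediction back to its original row.

```python
from typing import Dict, List, Optional


def match_batch_sketch(batch_reply: List[Dict], valid_inds: List[int],
                       texts: Optional[List[str]] = None) -> List[Dict]:
    """Copy each prediction into the reply slot of the row it came from."""
    if texts is None:
        # The model chose not to answer: return the replies untouched.
        return batch_reply
    for original_index, text in zip(valid_inds, texts):
        batch_reply[original_index]["text"] = text
    return batch_reply


# Four observations arrived, but row 1 was invalid and the rest were
# sorted by length, so the model saw them in the order 3, 0, 2.
replies = [{} for _ in range(4)]
valid_inds = [3, 0, 2]
predictions = ["reply to row 3", "reply to row 0", "reply to row 2"]
matched = match_batch_sketch(replies, valid_inds, predictions)
```

Row 1 keeps its empty reply because no prediction was produced for it.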
- get_temp_history(observation) Optional[str] [source]¶
Return a string to temporarily insert into history for a single turn.
NOTE: This does NOT attempt to provide any sort of delimiter or spacing between the original history and the temporary history. If you require such delimiter or spacing, you should include it in the temp history.
Intentionally overridable so more complex models can insert temporary history strings, i.e. strings that are removed from the history after a single turn.
- observe(observation)[source]¶
Process incoming message in preparation for producing a response.
This includes remembering the past history of the conversation.
- self_observe(self_message: Message) None [source]¶
Observe one’s own utterance.
This is used so that the agent can incorporate its own response into the dialogue history after a batch_act. Failure to implement this will result in an agent that cannot hear itself speak.
- Parameters
self_message – The message corresponding to the output from batch_act.
- save_nonprimary(path=None)[source]¶
Save model parameters, when you are working on the non-primary worker.
For models or optimizers that shard parameters, this ensures we sync.
- save(path=None)[source]¶
Save model parameters to path (or default to model_file arg).
Please try to refrain from overriding this function, and instead override state_dict(self) for more specific saving.
- load_state_dict(state_dict)[source]¶
Load the state dict into model.
This is easily overridable to facilitate transfer of state dicts.
- load(path: str) Dict[str, Any] [source]¶
Return opt and model states.
Override this method for more specific loading.
- classmethod upgrade_opt(opt_from_disk: Opt)[source]¶
Upgrade legacy options when loading an opt file from disk.
This is primarily made available to provide a safe space to handle backwards-compatible behavior. For example, perhaps we introduce a new option today, which wasn’t previously available. We can have the argument have a new default, but fall back to the “legacy” compatibility behavior if the option doesn’t exist.
upgrade_opt provides an opportunity for such checks for backwards compatibility. It is called shortly after loading the opt file from disk, and is called before the Agent is initialized.
Other possible examples include:
Renaming an option,
Deprecating an old option,
Splitting coupled behavior, etc.
Implementations of upgrade_opt should conform to high standards, due to the risk of these methods becoming complicated and difficult to reason about. We recommend the following behaviors:
1. upgrade_opt should only be used to provide backwards compatibility. Other behavior should find a different location.
2. Children should always call the parent’s upgrade_opt first.
3. upgrade_opt should always warn when an option was overwritten.
4. Include comments annotating the date and purpose of each upgrade.
5. Add an integration test which ensures your old work behaves appropriately.
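The recommended behaviors for upgrade_opt might look roughly like the sketch below. BaseAgentSketch and MyAgentSketch are hypothetical classes that use a plain dict in place of Opt, and the option names hidden_size and embedding_size, as well as the date in the comment, are invented purely for illustration.

```python
import warnings


class BaseAgentSketch:
    """Hypothetical parent; stands in for an agent base class."""

    @classmethod
    def upgrade_opt(cls, opt_from_disk: dict) -> dict:
        return opt_from_disk


class MyAgentSketch(BaseAgentSketch):
    @classmethod
    def upgrade_opt(cls, opt_from_disk: dict) -> dict:
        # Call the parent's upgrade_opt first.
        opt = super().upgrade_opt(opt_from_disk)

        # 2020-01-01 (illustrative date): 'hidden_size' was renamed to
        # 'embedding_size'. Both option names are hypothetical.
        if "hidden_size" in opt and "embedding_size" not in opt:
            # Warn whenever an option is overwritten.
            warnings.warn("Renaming legacy option hidden_size to embedding_size.")
            opt["embedding_size"] = opt.pop("hidden_size")
        return opt


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    upgraded = MyAgentSketch.upgrade_opt({"hidden_size": 128})
```

An integration test for such an upgrade could load an old opt file and assert on the upgraded values, much as the final lines do here.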
- batch_act(observations)[source]¶
Process a batch of observations (batchsize list of message dicts).
These observations have been preprocessed by the observe method.
Subclasses can override this for special functionality, but if the default behaviors are fine then just override the train_step and eval_step methods instead. The former is called when labels are present in the observations batch; otherwise, the latter is called.
- backward(loss, **kwargs)[source]¶
Perform a backward pass.
It is recommended you use this instead of loss.backward(), for integration with distributed training and FP16 training.
Torch Generator Agent
Generic PyTorch-based Generator agent.
Implements quite a bit of boilerplate, including forced-decoding loss and a tree search.
Contains the following utilities:
TorchGeneratorAgent class, which serves as a useful parent for generative torch agents.
Beam class which provides some generic beam functionality for classes to use
- class parlai.core.torch_generator_agent.SearchBlocklist(dict_agent: DictionaryAgent)[source]¶
Bases:
object
Search block list facilitates blocking ngrams from being generated.
- __init__(dict_agent: DictionaryAgent) None [source]¶
- class parlai.core.torch_generator_agent.TorchGeneratorModel(padding_idx=0, start_idx=1, end_idx=2, unknown_idx=3, input_dropout=0, longest_label=1, **kwargs)[source]¶
Bases:
Module, ABC
Abstract TorchGeneratorModel.
This interface expects you to implement model with the following reqs:
- Attribute model.encoder
takes input returns tuple (enc_out, enc_hidden, attn_mask)
- Attribute model.decoder
takes decoder params and returns decoder outputs after attn
- Attribute model.output
takes decoder outputs and returns distr over dictionary
- __init__(padding_idx=0, start_idx=1, end_idx=2, unknown_idx=3, input_dropout=0, longest_label=1, **kwargs)[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- decode_forced(encoder_states, ys)[source]¶
Decode with a fixed, true sequence, computing loss.
Useful for training, or ranking fixed candidates.
- Parameters
ys (LongTensor[bsz, time]) – the prediction targets. Contains both the start and end tokens.
encoder_states (model specific) – Output of the encoder. Model specific types.
- Returns
pair (logits, choices) containing the logits and MLE predictions
- Return type
(FloatTensor[bsz, ys, vocab], LongTensor[bsz, ys])
- abstract reorder_encoder_states(encoder_states, indices)[source]¶
Reorder encoder states according to a new set of indices.
This is an abstract method, and must be implemented by the user.
Its purpose is to provide beam search with a model-agnostic interface. For example, this method is used to sort hypotheses, expand beams, etc.
For example, assume that encoder_states is a bsz x 1 tensor of values
indices = [0, 2, 2]
encoder_states = [[0.1]
                  [0.2]
                  [0.3]]
then the output will be
output = [[0.1]
          [0.3]
          [0.3]]
- Parameters
encoder_states (model specific) – output from encoder. type is model specific.
indices (list[int]) – the indices to select over. The user must support non-tensor inputs.
- Returns
The re-ordered encoder states. It should be of the same type as encoder states, and it must be a valid input to the decoder.
- Return type
model specific
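The worked example for reorder_encoder_states can be reproduced with nested Python lists standing in for tensors. This is only a sketch of the row selection; the function name is hypothetical, and a real model would apply the equivalent indexing (for instance an index-select along the batch dimension) to each tensor in its encoder state.

```python
from typing import List, Sequence


def reorder_encoder_states_sketch(encoder_states: List[List[float]],
                                  indices: Sequence[int]) -> List[List[float]]:
    """Select rows of encoder_states in the order given by indices."""
    # Accept any sequence of ints (list, tuple, range), since the
    # caller is not guaranteed to pass a tensor.
    return [encoder_states[i] for i in indices]


encoder_states = [[0.1], [0.2], [0.3]]
output = reorder_encoder_states_sketch(encoder_states, [0, 2, 2])
print(output)  # [[0.1], [0.3], [0.3]]
```

Repeating an index, as with row 2 here, is how one encoder state is shared by several beam hypotheses.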
- abstract reorder_decoder_incremental_state(incremental_state, inds)[source]¶
Reorder incremental state for the decoder.
Used to expand selected beams in beam search. Unlike reorder_encoder_states, implementing this method is optional. However, without incremental decoding, decoding a single beam becomes O(n^2) instead of O(n), which can make beam search impractically slow.
In order to fall back to non-incremental decoding, just return None from this method.
- Parameters
incremental_state (model specific) – second output of model.decoder
inds (LongTensor[n]) – indices to select and reorder over.
- Returns
The re-ordered decoder incremental states. It should be the same type as incremental_state, and usable as an input to the decoder. This method should return None if the model does not support incremental decoding.
- Return type
model specific
- forward(*xs, ys=None, prev_enc=None, maxlen=None, bsz=None)[source]¶
Get output predictions from the model.
- Parameters
xs (LongTensor[bsz, seqlen]) – input to the encoder
ys (LongTensor[bsz, outlen]) – Expected output from the decoder. Used for teacher forcing to calculate loss.
prev_enc – if you know you’ll pass in the same xs multiple times, you can pass in the encoder output from the last forward pass to skip recalculating the same encoder output.
maxlen – max number of tokens to decode. if not set, will use the length of the longest label this model has seen. ignored when ys is not None.
bsz – if ys is not provided, then you must specify the bsz for greedy decoding.
- Returns
(scores, candidate_scores, encoder_states) tuple
scores contains the model’s predicted token scores. (FloatTensor[bsz, seqlen, num_features])
candidate_scores are the score the model assigned to each candidate. (FloatTensor[bsz, num_cands])
encoder_states are the output of model.encoder. Model specific types. Feed this back in to skip encoding on the next call.
- class parlai.core.torch_generator_agent.PPLMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]¶
Bases:
AverageMetric
- class parlai.core.torch_generator_agent.TorchGeneratorAgent(opt: Opt, shared=None)[source]¶
Bases:
TorchAgent, ABC
Abstract Generator agent; only meant to be extended.
TorchGeneratorAgent aims to handle much of the bookkeeping and infrastructure work for any generative models, like seq2seq or transformer. It implements the train_step and eval_step. The only requirement is that your model must be implemented with the TorchGeneratorModel interface.
- classmethod upgrade_opt(opt_from_disk: Opt)[source]¶
Upgrade legacy options when loading an opt file from disk.
This is primarily made available to provide a safe space to handle backwards-compatible behavior. For example, perhaps we introduce a new option today, which wasn’t previously available. We can have the argument have a new default, but fall back to the “legacy” compatibility behavior if the option doesn’t exist.
upgrade_opt provides an opportunity for such checks for backwards compatibility. It is called shortly after loading the opt file from disk, and is called before the Agent is initialized.
Other possible examples include:
Renaming an option,
Deprecating an old option,
Splitting coupled behavior, etc.
Implementations of upgrade_opt should conform to high standards, due to the risk of these methods becoming complicated and difficult to reason about. We recommend the following behaviors:
1. upgrade_opt should only be used to provide backwards compatibility. Other behavior should find a different location.
2. Children should always call the parent’s upgrade_opt first.
3. upgrade_opt should always warn when an option was overwritten.
4. Include comments annotating the date and purpose of each upgrade.
5. Add an integration test which ensures your old work behaves appropriately.
- classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser [source]¶
Add command line arguments.
- build_criterion()[source]¶
Construct and return the loss function.
By default torch.nn.CrossEntropyLoss.
If overridden, this model should produce a sum that can be used for a per-token loss.
Share internal states between parent and child instances.
- batchify(obs_batch, sort=False)[source]¶
Create a batch of valid observations from an unchecked batch.
A valid observation is one that passes the lambda provided to the function, which defaults to checking if the preprocessed ‘text_vec’ field is present which would have been set by this agent’s ‘vectorize’ function.
Returns a namedtuple Batch. See original definition above for in-depth explanation of each field.
If you want to include additional fields in the batch, you can subclass this function and return your own “Batch” namedtuple: copy the Batch namedtuple at the top of this class, and then add whatever additional fields that you want to be able to access. You can then call super().batchify(…) to set up the original fields and then set up the additional fields in your subclass and return that batch instead.
- Parameters
obs_batch – List of vectorized observations
sort – Default False, orders the observations by length of vectors. Set to true when using torch.nn.utils.rnn.pack_padded_sequence. Uses the text vectors if available, otherwise uses the label vectors if available.
- record_per_token_metrics(batch, loss_per_token)[source]¶
Override this method for custom loss values that require loss_per_token.
- compute_loss(batch, return_output=False)[source]¶
Compute and return the loss for the given batch.
Easily overridable for customized loss functions.
If return_output is True, the full output from the call to self.model() is also returned, via a (loss, model_output) pair.
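To make the idea of a summed loss that feeds a per-token loss concrete, here is a small pure-Python sketch. It assumes a padding id of 0 and takes precomputed log-probabilities as nested lists; the function name and data layout are hypothetical, and the perplexity line simply exponentiates the average token loss.

```python
import math
from typing import List, Tuple

PAD_IDX = 0  # assumed padding id for this sketch


def summed_token_loss(log_probs: List[List[List[float]]],
                      labels: List[List[int]]) -> Tuple[float, int]:
    """Return (summed negative log-likelihood, number of non-pad tokens)."""
    total, num_tokens = 0.0, 0
    for row_log_probs, row_labels in zip(log_probs, labels):
        for step_log_probs, target in zip(row_log_probs, row_labels):
            if target == PAD_IDX:
                continue  # padding contributes no loss
            total -= step_log_probs[target]
            num_tokens += 1
    return total, num_tokens


# One example, vocabulary of 3, two real tokens followed by one pad.
uniform = [math.log(1 / 3)] * 3
loss_sum, n_tokens = summed_token_loss([[uniform, uniform, uniform]], [[1, 2, 0]])
loss_per_token = loss_sum / n_tokens
perplexity = math.exp(loss_per_token)
```

Keeping the sum and the token count separate lets the average be taken over a whole epoch rather than per batch.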
- class parlai.core.torch_generator_agent.TreeSearch(beam_size, block_ngram=-1, context_block_ngram=-1, padding_token=0, bos_token=1, eos_token=2, min_length=3, device='cpu', length_penalty=0.65, verbose=False, gpu_beam_blocking=False, dict=None)[source]¶
Bases:
object
Abstract Tree Search class.
It keeps information about beam_size concurrent, developing hypotheses. Concrete implementations make choices about which token to explore next at each point in the tree. Different choices result in different generation algorithms.
- __init__(beam_size, block_ngram=-1, context_block_ngram=-1, padding_token=0, bos_token=1, eos_token=2, min_length=3, device='cpu', length_penalty=0.65, verbose=False, gpu_beam_blocking=False, dict=None)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
- set_context(context: LongTensor) TSType [source]¶
Set the internal context representation and return self.
- Parameters
context – a LongTensor representing the input context; used for context ngram blocking, if supplied
- set_batch_context(batch_context_list: LongTensor, batch_idx: int, gpu_beam_blocking: bool) TSType [source]¶
Version of .set_context() that operates on a single element of a batch.
Set the internal context representation and return self.
- Parameters
batch_context_list – a list of lists, each one containing the context for one member of the batch
batch_idx – index of the batch
gpu_beam_blocking – whether we are using gpu kernel for beam blocking, if so return a tensor, else return a list.
- abstract select_paths(logprobs, prior_scores, current_length) _PathSelection [source]¶
Select the next vocabulary item in these beams.
- Parameters
logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.
prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.
current_length – the current length in tokens
- Returns
a {hypothesis_ids, token_ids, scores, token_details} , where:
hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.
token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.
scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.
token_details is a (beamsize) list of objects with metadata about each generated token.
- get_rescored_finished(n_best=None)[source]¶
Return finished hypotheses according to adjusted scores.
Score adjustment is done according to the Google NMT paper, which penalizes long utterances.
- Parameters
n_best – number of finalized hypotheses to return
- Returns
- list of (tokens, score, token_metadata) 3-tuples, in sorted order, where:
tokens is a tensor of token ids
score is the adjusted log probability of the entire utterance
- token_metadata dictionary:
token_logprobs -> a tensor of conditional log probabilities of tokens
token_ranks -> a tensor of ranks of tokens in the vocabulary, by probability, when sampled
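The score adjustment can be illustrated with the length penalty as it is written in the Google NMT paper, ((5 + |Y|) / 6) ** alpha. This is a sketch under that assumption only: the exact constants ParlAI uses may differ, and the function names and the simplified (tokens, score) pairs are hypothetical.

```python
from typing import List, Optional, Tuple


def gnmt_length_penalty(length: int, alpha: float = 0.65) -> float:
    """Length penalty from the Google NMT paper: ((5 + |Y|) / 6) ** alpha."""
    return ((5 + length) / 6) ** alpha


def rescore_finished(hyps: List[Tuple[List[int], float]], alpha: float = 0.65,
                     n_best: Optional[int] = None) -> List[Tuple[List[int], float]]:
    """Divide each log-probability by its length penalty and sort."""
    rescored = [(tokens, score / gnmt_length_penalty(len(tokens), alpha))
                for tokens, score in hyps]
    rescored.sort(key=lambda item: item[1], reverse=True)
    # Slicing with None keeps every hypothesis.
    return rescored[:n_best]


hyps = [([4, 5], -2.0), ([4, 5, 6, 7, 8, 9, 10], -2.5)]
ranked = rescore_finished(hyps)
```

With these numbers the longer hypothesis has the lower raw log-probability but ranks first after the adjustment, which shows how the adjusted ordering can differ from the raw one.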
- class parlai.core.torch_generator_agent.GreedySearch(*args, **kwargs)[source]¶
Bases:
TreeSearch
Greedy search.
Picks the highest-probability token at each step. Only works with --beam-size 1.
- __init__(*args, **kwargs)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
- select_paths(logprobs, prior_scores, current_length) _PathSelection [source]¶
Select the next vocabulary item in these beams.
- Parameters
logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.
prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.
current_length – the current length in tokens
- Returns
a {hypothesis_ids, token_ids, scores, token_details} , where:
hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.
token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.
scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.
token_details is a (beamsize) list of objects with metadata about each generated token.
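A greedy select_paths can be sketched on plain lists. The function below is a hypothetical simplification: it returns a tuple of three lists in place of a _PathSelection and omits token_details.

```python
from typing import List, Tuple


def greedy_select_paths(logprobs: List[List[float]],
                        prior_scores: List[float]) -> Tuple[List[int], List[int], List[float]]:
    """Pick the single best next token for a beam of size 1."""
    assert len(logprobs) == 1, "greedy search assumes a beam size of 1"
    row = logprobs[0]
    best_token = max(range(len(row)), key=row.__getitem__)
    hypothesis_ids = [0]  # the only hypothesis is the one extended
    token_ids = [best_token]
    # The new score is the running log-probability plus the token's.
    scores = [prior_scores[0] + row[best_token]]
    return hypothesis_ids, token_ids, scores


greedy_hyp_ids, greedy_tok_ids, greedy_scores = greedy_select_paths(
    [[-2.3, -0.2, -1.9]], [-1.0]
)
```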
- class parlai.core.torch_generator_agent.BeamSearch(beam_size, block_ngram=-1, context_block_ngram=-1, padding_token=0, bos_token=1, eos_token=2, min_length=3, device='cpu', length_penalty=0.65, verbose=False, gpu_beam_blocking=False, dict=None)[source]¶
Bases:
TreeSearch
Beam search.
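For comparison with the other search strategies, the core selection step of a standard beam search can be sketched as follows. This is a generic illustration on nested lists, not ParlAI's implementation; the function name and the tuple return value are assumptions, and n-gram blocking and minimum-length handling are left out.

```python
from typing import List, Tuple


def beam_select_paths(logprobs: List[List[float]], prior_scores: List[float],
                      beam_size: int) -> Tuple[List[int], List[int], List[float]]:
    """Keep the beam_size best (hypothesis, token) continuations."""
    vocab_size = len(logprobs[0])
    # Score every continuation and remember its flattened index.
    candidates = [
        (prior_scores[h] + logprobs[h][t], h * vocab_size + t)
        for h in range(len(logprobs))
        for t in range(vocab_size)
    ]
    best = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
    scores = [score for score, _ in best]
    # Recover which hypothesis and which token each winner refers to.
    hypothesis_ids = [flat // vocab_size for _, flat in best]
    token_ids = [flat % vocab_size for _, flat in best]
    return hypothesis_ids, token_ids, scores


beam_logprobs = [[-0.1, -3.0, -2.5], [-1.0, -0.5, -4.0]]
beam_prior = [-1.0, -1.2]
beam_hyp_ids, beam_tok_ids, beam_scores = beam_select_paths(
    beam_logprobs, beam_prior, beam_size=2
)
```

Because candidates are ranked across all hypotheses at once, one hypothesis may be extended several times while another is dropped.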
- class parlai.core.torch_generator_agent.DelayedBeamSearch(k, delay, *args, **kwargs)[source]¶
Bases:
TreeSearch
DelayedBeam: Top-K sampling followed by beam search (Massarelli et al., 2019).
Samples from a truncated distribution where only the most probable K words are considered at each time for the first N tokens, then switches to beam after N steps.
See https://arxiv.org/abs/1911.03587 for details.
- __init__(k, delay, *args, **kwargs)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
- select_paths(logprobs, prior_scores, current_length) _PathSelection [source]¶
Select the next vocabulary item in these beams.
- Parameters
logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.
prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.
current_length – the current length in tokens
- Returns
a {hypothesis_ids, token_ids, scores, token_details} , where:
hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.
token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.
scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.
token_details is a (beamsize) list of objects with metadata about each generated token.
- class parlai.core.torch_generator_agent.DelayedNucleusBeamSearch(p, delay, *args, **kwargs)[source]¶
Bases:
TreeSearch
- __init__(p, delay, *args, **kwargs)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
- select_paths(logprobs, prior_scores, current_length) _PathSelection [source]¶
Select the next vocabulary item in these beams.
- Parameters
logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.
prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.
current_length – the current length in tokens
- Returns
a {hypothesis_ids, token_ids, scores, token_details}, where:
hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.
token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.
scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.
token_details is a (beamsize) list of objects with metadata about each generated token.
- class parlai.core.torch_generator_agent.TopKSampling(k, *args, **kwargs)[source]¶
Bases:
TreeSearch
Top-K sampling (Fan et al., 2018).
Samples from a truncated distribution where only the most probable K words are considered at each time step.
Typical values of k are 2, 10, 50.
See https://arxiv.org/abs/1805.04833 for details.
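As a rough illustration of the truncation described above, here is a pure-Python sketch for a single hypothesis. It is not ParlAI's implementation: top_k_sample is a hypothetical helper, and a plain list of log-probabilities stands in for a tensor row.

```python
import math
import random
from typing import Optional, Sequence


def top_k_sample(
    logprobs: Sequence[float], k: int, rng: Optional[random.Random] = None
) -> int:
    """Sample a token id from the k most probable entries of logprobs."""
    rng = rng or random.Random()
    # Token ids of the k highest log-probabilities.
    top = sorted(range(len(logprobs)), key=lambda t: logprobs[t], reverse=True)[:k]
    # random.choices renormalizes the weights, so raw probabilities suffice.
    weights = [math.exp(logprobs[t]) for t in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

With k=1 this reduces to greedy decoding; larger k admits more of the tail of the distribution.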
- __init__(k, *args, **kwargs)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
- select_paths(logprobs, prior_scores, current_length) _PathSelection [source]¶
Select the next vocabulary item in these beams.
- Parameters
logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.
prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.
current_length – the current length in tokens
- Returns
a {hypothesis_ids, token_ids, scores, token_details}, where:
hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.
token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.
scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.
token_details is a (beamsize) list of objects with metadata about each generated token.
- class parlai.core.torch_generator_agent.NucleusSampling(p, *args, **kwargs)[source]¶
Bases:
TreeSearch
Nucleus, aka top-p sampling (Holtzman et al., 2019).
Samples from a truncated distribution which covers a fixed CDF proportion of the original distribution.
Typical values of p are 0.3 and 0.9.
See https://arxiv.org/abs/1904.09751 for details.
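The "fixed CDF proportion" can be sketched in pure Python as follows. This is an illustration under assumptions rather than ParlAI's code: nucleus_mask and nucleus_sample are hypothetical helpers on plain lists, and the mask marks a token for removal once the probability mass ranked before it already reaches p, so the top token is always kept.

```python
import math
import random
from typing import List, Optional, Sequence


def nucleus_mask(sorted_probs: Sequence[float], p: float) -> List[bool]:
    """True marks tokens to drop; sorted_probs must be in descending order."""
    mask, mass_before = [], 0.0
    for prob in sorted_probs:
        mask.append(mass_before >= p)
        mass_before += prob
    return mask


def nucleus_sample(
    logprobs: Sequence[float], p: float, rng: Optional[random.Random] = None
) -> int:
    """Sample a token id from the smallest top set covering mass p."""
    rng = rng or random.Random()
    order = sorted(range(len(logprobs)), key=lambda t: logprobs[t], reverse=True)
    sorted_probs = [math.exp(logprobs[t]) for t in order]
    mask = nucleus_mask(sorted_probs, p)
    kept = [(t, pr) for t, pr, drop in zip(order, sorted_probs, mask) if not drop]
    tokens = [t for t, _ in kept]
    weights = [pr for _, pr in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Unlike top-k, the number of surviving tokens adapts to the shape of the distribution: a peaked distribution keeps few tokens, a flat one keeps many.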
- __init__(p, *args, **kwargs)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
- get_mask(sorted_probs: Tensor) Tensor [source]¶
Get probability mask.
- Parameters
sorted_probs – sorted probabilities
- Return mask
mask out tokens below the p value when sampling.
- select_paths(logprobs, prior_scores, current_length) _PathSelection [source]¶
Select the next vocabulary item in these beams.
- Parameters
logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.
prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.
current_length – the current length in tokens
- Returns
a {hypothesis_ids, token_ids, scores, token_details}, where:
hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.
token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.
scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.
token_details is a (beamsize) list of objects with metadata about each generated token.
- class parlai.core.torch_generator_agent.FactualNucleusSampling(p, lambda_decay, omega_bound, p_reset, beam_size, *args, **kwargs)[source]¶
Bases:
NucleusSampling
Factual Nucleus Sampling.
See https://arxiv.org/pdf/2206.04624.pdf for more information.
- __init__(p, lambda_decay, omega_bound, p_reset, beam_size, *args, **kwargs)[source]¶
Instantiate Beam object.
- Parameters
beam_size – number of hypotheses in the beam
block_ngram – size of ngrams to block.
context_block_ngram – size of context ngrams to block
padding_token – padding token ID
bos_token – beginning of sentence token ID
eos_token – end of sentence token ID
min_length – minimum length of the predicted sequence
device – What device to use for computations
dict – dictionary, if necessary
Torch Ranker Agent¶
Torch Ranker Agents provide functionality for building ranking models.
See the TorchRankerAgent tutorial for examples.
- class parlai.core.torch_ranker_agent.TorchRankerAgent(opt: Opt, shared=None)[source]¶
Bases:
TorchAgent
Abstract TorchRankerAgent class; only meant to be extended.
TorchRankerAgents aim to provide convenient functionality for building ranking models. This includes:
Training/evaluating on candidates from a variety of sources.
Computing hits@1, hits@5, mean reciprocal rank (MRR), and other metrics.
Caching representations for fast runtime when deploying models to production.
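For readers unfamiliar with the ranking metrics named above, the following sketch shows their usual definitions for a single example. The helper names are hypothetical and this is not the code ParlAI uses to report them; averaging the per-example values over a dataset gives hits@k and MRR.

```python
from typing import Sequence


def hits_at_k(ranked: Sequence[str], label: str, k: int) -> float:
    """1.0 if the label is among the top-k ranked candidates, else 0.0."""
    return float(label in ranked[:k])


def reciprocal_rank(ranked: Sequence[str], label: str) -> float:
    """1 / (1-based rank of the label), or 0.0 if the label was not ranked."""
    for position, cand in enumerate(ranked, start=1):
        if cand == label:
            return 1.0 / position
    return 0.0


def mean(values: Sequence[float]) -> float:
    """Average per-example scores into a dataset-level metric."""
    return sum(values) / len(values) if values else 0.0
```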
- classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser [source]¶
Add CLI args.
- build_criterion()[source]¶
Construct and return the loss function.
By default torch.nn.CrossEntropyLoss.
- set_interactive_mode(mode, shared=False)[source]¶
Set interactive mode defaults.
In interactive mode, we set ignore_bad_candidates to True. Additionally, we change the eval_candidates to the option specified in --interactive-candidates, which defaults to False.
Interactive mode possibly changes the fixed candidates path if it does not exist, automatically creating a candidates file from the specified task.
- abstract score_candidates(batch, cand_vecs, cand_encs=None)[source]¶
Given a batch and candidate set, return scores (for ranking).
- Parameters
batch (Batch) – a Batch object (defined in torch_agent.py)
cand_vecs (LongTensor) – padded and tokenized candidates
cand_encs (FloatTensor) – encoded candidates. If these are passed in (as when the candidate encodings are cached), you do not need to call self.model on cand_vecs.
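One common way to score candidates is a dot product between a context encoding and each candidate encoding. The sketch below illustrates that idea only: dot_product_scores and rank_candidates are hypothetical helpers on plain lists, and nothing here is the signature or behavior a real score_candidates implementation must have.

```python
from typing import List, Sequence


def dot_product_scores(
    context_encs: Sequence[Sequence[float]],
    cand_encs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return a [batchsize][num_cands] grid of context/candidate dot products."""
    return [
        [sum(c * d for c, d in zip(context, cand)) for cand in cand_encs]
        for context in context_encs
    ]


def rank_candidates(scores_row: Sequence[float], cands: Sequence[str]) -> List[str]:
    """Order candidate strings from highest to lowest score."""
    order = sorted(range(len(cands)), key=lambda i: scores_row[i], reverse=True)
    return [cands[i] for i in order]
```

If candidate encodings are cached, only the context needs encoding at inference time, which is the saving the cand_encs argument is meant to allow.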
- is_valid(obs)[source]¶
Override from TorchAgent.
Check to see if label candidates contain the label.
- share()[source]¶
Share model parameters.
- set_vocab_candidates(shared)[source]¶
Load the tokens from the vocab as candidates.
self.vocab_candidates will contain a [num_cands] list of strings.
self.vocab_candidate_vecs will contain a [num_cands, 1] LongTensor.
- set_fixed_candidates(shared)[source]¶
Load a set of fixed candidates and their vectors (or vectorize them here).
self.fixed_candidates will contain a [num_cands] list of strings.
self.fixed_candidate_vecs will contain a [num_cands, seq_len] LongTensor.
See the note on the --fixed-candidate-vecs flag for an explanation of the ‘reuse’, ‘replace’, or path options.
Note: TorchRankerAgent by default converts candidates to vectors by vectorizing in the common sense (i.e., replacing each token with its index in the dictionary). If a child model wants to additionally perform encoding, it can override the vectorize_fixed_candidates() method to produce encoded vectors instead of just vectorized ones.
- encode_candidates(padded_cands)[source]¶
Convert the given candidates to vectors.
This is an abstract method that must be implemented by the user.
- Parameters
padded_cands – The padded candidates.
- vectorize_fixed_candidates(cands_batch, add_start=False, add_end=False)[source]¶
Convert a batch of candidates from text to vectors.
- Parameters
cands_batch – a [batchsize] list of candidates (strings)
- Returns
a [num_cands] list of candidate vectors
By default, candidates are simply vectorized (tokens replaced by token ids). A child class may choose to override this method to perform vectorization as well as encoding if so desired.
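"Tokens replaced by token ids" can be pictured with a toy dictionary. In this sketch everything is an assumption made for illustration: the whitespace tokenizer, the tok2ind mapping, the special ids (1 for start, 2 for end, 3 for unknown), and the function name; the real method relies on the agent's own dictionary and returns tensors.

```python
from typing import Dict, List, Sequence

START_ID, END_ID, UNK_ID = 1, 2, 3  # assumed special-token ids for this toy example


def vectorize_candidates(
    cands_batch: Sequence[str],
    tok2ind: Dict[str, int],
    add_start: bool = False,
    add_end: bool = False,
) -> List[List[int]]:
    """Map each candidate string to a list of token ids."""
    vecs = []
    for cand in cands_batch:
        # Whitespace tokenization; unknown tokens fall back to UNK_ID.
        ids = [tok2ind.get(tok, UNK_ID) for tok in cand.split()]
        if add_start:
            ids = [START_ID] + ids
        if add_end:
            ids = ids + [END_ID]
        vecs.append(ids)
    return vecs
```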
Torch Classifier Agent¶
Torch Classifier Agents classify text into a fixed set of labels.
- class parlai.core.torch_classifier_agent.ConfusionMatrixMetric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]¶
Bases:
Metric
Class that keeps count of the confusion matrix for classification.
Also provides helper methods that compute precision, recall, f1, and weighted_f1 for classification.
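The quantities derived from the confusion matrix follow the standard formulas, sketched here on raw counts. The function names are hypothetical, zero denominators are mapped to 0.0 as a simplifying choice, and weighted F1 is assumed to weight each class by its number of true examples (true positives plus false negatives).

```python
from typing import Dict, Tuple


def precision(tp: float, fp: float) -> float:
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp > 0 else 0.0


def recall(tp: float, fn: float) -> float:
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if tp + fn > 0 else 0.0


def f1(tp: float, fp: float, fn: float) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def weighted_f1(counts: Dict[str, Tuple[float, float, float]]) -> float:
    """Average per-class F1, weighted by each class's support (tp + fn).

    counts maps a class name to its (tp, fp, fn) triple.
    """
    total = sum(tp + fn for tp, _, fn in counts.values())
    if total == 0:
        return 0.0
    return sum(f1(tp, fp, fn) * (tp + fn) / total for tp, fp, fn in counts.values())
```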
- property macro_average: bool¶
Indicates whether this metric should be macro-averaged when globally reported.
- class parlai.core.torch_classifier_agent.PrecisionMetric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]¶
Bases:
ConfusionMatrixMetric
Class that takes in a ConfusionMatrixMetric and computes precision for classifier.
- class parlai.core.torch_classifier_agent.RecallMetric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]¶
Bases:
ConfusionMatrixMetric
Class that takes in a ConfusionMatrixMetric and computes recall for classifier.
- class parlai.core.torch_classifier_agent.ClassificationF1Metric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]¶
Bases:
ConfusionMatrixMetric
Class that takes in a ConfusionMatrixMetric and computes f1 for classifier.
- class parlai.core.torch_classifier_agent.AUCMetrics(class_name: Union[int, str], max_bucket_dec_places: int = 3, pos_dict: Optional[Counter[float]] = None, neg_dict: Optional[Counter[float]] = None)[source]¶
Bases:
Metric
Computes Area Under ROC Curve (AUC) metrics.
Does so by keeping track of positives’ and negatives’ probability score counts in Counters or dictionaries. Note the introduction of max_bucket_dec_places; this integer determines the number of digits to save for the probability scores. A higher max_bucket_dec_places will give a more accurate estimate of the exact AUC metric, but may also use more space.
NOTE: currently only used for classifiers in the eval_model script; to use, add the argument -auc <max_bucket_dec_places> when calling the eval_model script.
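The bucketing idea can be sketched as below. This is an illustration under assumptions, not the class's actual code: update_buckets and bucketed_auc are hypothetical helpers, AUC is computed as the chance that a random positive outscores a random negative (ties counting one half), and 0.0 is returned when either class is empty.

```python
from collections import Counter
from typing import Sequence, Union


def update_buckets(
    pos: Counter,
    neg: Counter,
    true_labels: Sequence[Union[int, str]],
    pos_probs: Sequence[float],
    class_name: Union[int, str],
    max_bucket_dec_places: int = 3,
) -> None:
    """Count rounded positive-class probabilities per bucket, split by true label."""
    for label, prob in zip(true_labels, pos_probs):
        key = round(prob, max_bucket_dec_places)
        if label == class_name:
            pos[key] += 1
        else:
            neg[key] += 1


def bucketed_auc(pos: Counter, neg: Counter) -> float:
    """Estimate ROC AUC from the bucket counts."""
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        return 0.0
    wins, neg_below = 0.0, 0
    for key in sorted(set(pos) | set(neg)):
        # Positives here beat every lower negative and tie with same-bucket ones.
        wins += pos[key] * (neg_below + 0.5 * neg[key])
        neg_below += neg[key]
    return wins / (n_pos * n_neg)
```

Fewer decimal places merge more scores into the same bucket, which saves memory but turns near-ties into exact ties and coarsens the estimate.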
- property macro_average: bool¶
Indicates whether this metric should be macro-averaged when globally reported.
- __init__(class_name: Union[int, str], max_bucket_dec_places: int = 3, pos_dict: Optional[Counter[float]] = None, neg_dict: Optional[Counter[float]] = None)[source]¶
- update_raw(true_labels: List[Union[int, str]], pos_probs: List[float], class_name)[source]¶
Given the true (gold) labels and the probabilities of the positive class, update the bucket dictionaries of positives and negatives (based on class_name). max_bucket_dec_places is also used here to round the probabilities.
- class parlai.core.torch_classifier_agent.WeightedF1Metric(metrics: Dict[str, ClassificationF1Metric])[source]¶
Bases:
Metric
Class that represents the weighted f1 from ClassificationF1Metric.
- property macro_average: bool¶
Indicates whether this metric should be macro-averaged when globally reported.
- __init__(metrics: Dict[str, ClassificationF1Metric]) None [source]¶
- class parlai.core.torch_classifier_agent.TorchClassifierAgent(opt: Opt, shared=None)[source]¶
Bases:
TorchAgent
Abstract Classifier agent. Only meant to be extended.
TorchClassifierAgent aims to handle much of the bookkeeping for any classification model.
- classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser [source]¶
Add CLI args.
- share()[source]¶
Share model parameters.
Torch Image Agent¶
Subclass of TorchAgent used for handling image features.
- class parlai.core.torch_image_agent.TorchImageAgent(opt, shared=None)[source]¶
Bases:
TorchAgent
Subclass of TorchAgent that allows for encoding image features.
Provides flags and utility methods.
- classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser [source]¶
Add command-line arguments specifically for this agent.