File for miscellaneous utility functions and constants.

class parlai.core.utils.AttrDict(*args, **kwargs)

Bases: dict

Helper class to have a dict-like object with dot access.

For example, instead of d = {'key': 'value'}, use d = AttrDict(key='value'). To access keys, use d.key instead of d['key'].

While this places some limitations on the possible keys (for example, setting the key 'items' will shadow the items() method), it can make some code clearer.
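
A minimal sketch of such a class (the actual implementation may differ slightly):

```python
class AttrDict(dict):
    """Dict subclass whose keys are also readable and writable
    as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Point attribute storage at the dict itself, so d.key and
        # d['key'] resolve to the same entry.
        self.__dict__ = self


d = AttrDict(key='value')
print(d.key)       # value
d.other = 1
print(d['other'])  # 1
```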

__init__(*args, **kwargs)

Initialize AttrDict using input dict.

class parlai.core.utils.NoLock

Bases: object

Empty lock. Does nothing when you enter or exit.

class parlai.core.utils.OffensiveLanguageDetector

Bases: object

Tries to detect offensive language in text.

Detects offensive language using a list of offensive words and phrases.


Get data from external sources and build data representation.


Add a single phrase to the filter.


Add list of custom phrases to the filter.


Determine if text contains any offensive words in the filter.

str_segment(text, dict_agent, max_length)

Segment a string without spaces into the most probable phrase with spaces.

  • text (string) – string to segment
  • dict_agent (DictionaryAgent) – Dictionary we use to look at word frequencies
  • max_length (int) – maximum length of the string to segment

the segmented string

Return type:

str

Example Usage:

dict_agent = DictionaryAgent built from the Wiki Toxic Comments data
old = OffensiveLanguageDetector()

split_str = old.str_segment('fucku2', dict_agent, 20)
split_str is 'fuck u 2'

We can then run old.contains_offensive_language(split_str), which yields the offensive word 'fuck'.
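
The segmentation itself can be sketched as a unigram dynamic program over word frequencies. The helper below is illustrative only: the function name, toy frequency table, and smoothing constant are assumptions, not the real str_segment.

```python
import math


def segment(text, freq, total):
    """Split a string without spaces into the most probable word
    sequence under a unigram model (dynamic programming).

    freq maps word -> count; total is the corpus size. Unknown
    substrings get a tiny smoothing probability.
    """
    n = len(text)
    # best[i] = (log-probability, split point) for the prefix text[:i]
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for i in range(1, n + 1):
        # Cap candidate words at 20 characters to bound the search.
        for j in range(max(0, i - 20), i):
            word = text[j:i]
            count = freq.get(word, 0)
            p = count / total if count else 1e-9
            score = best[j][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, j)
    # Walk the split points backwards to recover the words.
    words, i = [], n
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return ' '.join(reversed(words))


freq = {'hello': 50, 'world': 40, 'hell': 5, 'o': 1}
print(segment('helloworld', freq, total=100))  # hello world
```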

class parlai.core.utils.PaddingUtils

Bases: object

Helps with padding input and target tensors.


classmethod map_predictions(predictions, valid_inds, batch_reply, observations, dictionary, end_idx, report_freq=0.1, labels=None, answers=None, ys=None)

Match predictions to original index in the batch.

Predictions are mapped back to appropriate indices in the batch_reply using valid_inds.

  • report_freq – how often we report predictions


classmethod pad_text(observations, dictionary, end_idx=None, null_idx=0, dq=False, eval_labels=True, truncate=None)

Pad observations to max width.

We check that examples are valid, pad with zeros, and sort by length so that we can use the pack_padded function. The list valid_inds keeps track of which indices are valid and the order in which we sort the examples.

  • dq – whether to use a deque instead of a list
  • eval_labels – whether or not we want to consider eval labels
  • truncate – truncate input and output lengths


class parlai.core.utils.Predictor(args=None, **kwargs)

Bases: object

Wrapper to set up running version of model and request predictions.

Note that this maintains no World state (does not use a World), merely providing the observation directly to the model and getting a response.

This is limiting when it comes to certain use cases, but allows for quick model deployment.

__init__(args=None, **kwargs)

Initialize the predictor, setting up opt automatically if needed.

Args is expected to be in the same format as sys.argv, e.g. a list in the form ['--model', 'seq2seq', '-hs', 128, '-lr', 0.5].

kwargs is interpreted by prepending '--' and replacing underscores with hyphens, so dict_file='/tmp/dict.tsv' would be interpreted as '--dict-file /tmp/dict.tsv'.
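
That translation can be sketched as follows (the helper name is illustrative, not part of the real API):

```python
def kwargs_to_argv(**kwargs):
    """Translate keyword arguments into sys.argv-style flags:
    prepend '--' and replace underscores with hyphens."""
    argv = []
    for key, value in kwargs.items():
        argv.append('--' + key.replace('_', '-'))
        argv.append(str(value))
    return argv


print(kwargs_to_argv(dict_file='/tmp/dict.tsv', batchsize=32))
# ['--dict-file', '/tmp/dict.tsv', '--batchsize', '32']
```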


From a ParlAI-standard message dict, get model prediction.

class parlai.core.utils.TimeLogger

Bases: object

Class for logging time progress against a goal.


Set up timer.

log(done, total, report=None)

Log report, time elapsed, and percentage progress towards goal.

  • done – number of examples completed so far
  • total – total number of elements to be completed. If total > 0, the time remaining and percentage complete are calculated.
  • report – dict of pairs to log

Returns a (log string, log dict) tuple. The log string contains the time elapsed and a string representation of the log dict. The log dict contains all items to log, including the percentage complete and the projected time left if total > 0.
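
The progress math can be sketched like this (class and field names here are illustrative, not the real implementation):

```python
import time


class SimpleTimeLogger:
    """Illustrative progress logger: tracks elapsed time, percent
    complete, and a linear projection of the time remaining."""

    def __init__(self):
        self.start = time.time()

    def log(self, done, total, report=None):
        elapsed = time.time() - self.start
        log = dict(report or {})
        log['exs'] = done
        if total > 0:
            progress = done / total
            log['%done'] = round(progress * 100, 2)
            if progress > 0:
                # Project remaining time from the rate observed so far.
                log['time_left'] = round(elapsed * (1 - progress) / progress, 2)
        text = '{}s elapsed: {}'.format(round(elapsed, 2), log)
        return text, log


text, log = SimpleTimeLogger().log(50, 100, report={'loss': 1.0})
```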


Return current timer time.


Return time elapsed at last log call.

class parlai.core.utils.Timer

Bases: object

Computes elapsed time.


Initialize timer.


Reset timer to zero.


Resume timer.


Pause timer.


Get current timer time.

parlai.core.utils.argsort(keys, *lists, descending=False)

Reorder each list in lists by the (descending) sorted order of keys.

  • keys (iter) – Keys to order by.
  • lists (list[list]) – Lists to be reordered by keys' order. Correctly handles lists and 1-D tensors.
  • descending (bool) – Use descending order if true.

The reordered items.
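
A plain-list sketch of this helper (the real version also handles 1-D torch tensors):

```python
def argsort(keys, *lists, descending=False):
    """Reorder each list in `lists` by the sorted order of `keys`."""
    ind_sorted = sorted(range(len(keys)), key=lambda i: keys[i],
                        reverse=descending)
    return [[lst[i] for i in ind_sorted] for lst in lists]


lengths = [3, 1, 2]
words = ['ccc', 'a', 'bb']
ids = [0, 1, 2]
print(argsort(lengths, words, ids, descending=True))
# [['ccc', 'bb', 'a'], [0, 2, 1]]
```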

parlai.core.utils.clip_text(text, max_len)

Clip text to max length, adding ellipses.
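
A sketch of the documented behavior (the exact truncation rule in the real helper may differ):

```python
def clip_text(text, max_len):
    """Truncate text to at most max_len characters, appending an
    ellipsis when anything was cut."""
    if len(text) > max_len:
        return text[:max_len - 3] + '...'
    return text


print(clip_text('hello world', 8))  # hello...
print(clip_text('hi', 10))          # hi
```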

parlai.core.utils.display_messages(msgs, prettify=False, ignore_fields='', max_len=1000)

Return a string describing the set of messages provided.

If prettify is true, candidates are displayed using prettytable. ignore_fields provides a list of fields in the msgs which should not be displayed.

parlai.core.utils.flatten(teacher, context_length=-1, include_labels=True)

DEPRECATED: If you would like to make use of batch sorting, please use the PytorchDataTeacher instead

Return a flattened version of a teacher’s data.

All episodes will have length 1 but contain the desired amount of context.

If context_length is not -1, only that many past utterances are used. The default is -1 (full history); setting it to 1 uses only the input text.

If include_labels is True, will include a random label in past utterances. Default is True.

parlai.core.utils.load_cands(path, lines_have_ids=False, cands_are_replies=False)

Load global fixed set of candidate labels that the teacher provides.

Every example will include these as candidates. The true labels for a specific example are also added to this set, so that it’s possible to get the right answer.

parlai.core.utils.maintain_dialog_history(history, observation, reply='', historyLength=1, useReplies='label_else_model', dict=None, useStartEndIndices=True, splitSentences=False)

Keep track of dialog history, up to a truncation length.

Replies can be included from the labels, from the model, or not at all, according to the useReplies parameter.


parlai.core.utils.make_batches(data, bsz)

DEPRECATED: If you would like to make use of batch sorting, please use the PytorchDataTeacher instead.

Return a list of lists of size bsz given a list of examples.

parlai.core.utils.msg_to_str(msg, ignore_fields='')

Convert ParlAI message dict to string.

  • msg – dict to convert into a string.
  • ignore_fields – (default '') comma-separated field names to not include in the string even if they're in the msg dict.
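
The tab-separated message format can be sketched with a simplified round trip (this ignores escaping and the ignore_fields handling, and parses all values as strings):

```python
def msg_to_str(msg):
    """Serialize a message dict as tab-separated 'field:contents'
    pairs (a simplified sketch of the documented format)."""
    return '\t'.join('{}:{}'.format(k, v) for k, v in msg.items())


def str_to_msg(txt):
    """Parse the tab-separated format back into a dict."""
    msg = {}
    for field in txt.split('\t'):
        name, _, contents = field.partition(':')
        msg[name] = contents
    return msg


s = msg_to_str({'text': 'hello', 'labels': 'hi'})
print(repr(s))        # 'text:hello\tlabels:hi'
print(str_to_msg(s))  # {'text': 'hello', 'labels': 'hi'}
```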

Build a nolock for other classes to use for no-op locking.

parlai.core.utils.padded_3d(tensors, pad_idx=0, use_cuda=0)

Make 3D padded tensor for list of lists of 1D tensors or lists.

  • tensors – list of lists of 1D tensors (or lists)
  • pad_idx – padding to fill tensor with
  • use_cuda – whether to call cuda() before returning

3D tensor with the maximum dimensions of the inputs
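
A sketch using plain nested lists in place of torch tensors (the real helper returns a tensor and can move it to CUDA):

```python
def padded_3d(tensors, pad_idx=0):
    """Pad a list of lists of 1-D int sequences into a dense
    3-D nested list with the maximum dimensions of the inputs."""
    a = len(tensors)
    b = max(len(row) for row in tensors)
    c = max((len(item) for row in tensors for item in row), default=1)
    # Start from an all-padding block, then copy each item in.
    out = [[[pad_idx] * c for _ in range(b)] for _ in range(a)]
    for i, row in enumerate(tensors):
        for j, item in enumerate(row):
            out[i][j][:len(item)] = item
    return out


print(padded_3d([[[1, 2], [3]], [[4, 5, 6]]]))
# [[[1, 2, 0], [3, 0, 0]], [[4, 5, 6], [0, 0, 0]]]
```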

parlai.core.utils.padded_tensor(items, pad_idx=0, use_cuda=False, left_padded=False, max_len=None)

Create a right-padded matrix from an uneven list of lists.

Returns (padded, lengths), where padded is the padded matrix, and lengths is a list containing the lengths of each row.

Matrix is right-padded (filled to the right) by default, but can be left padded if the flag is set to True.

Matrix can also be placed on cuda automatically.

  • items (list[iter[int]]) – List of items
  • sort (bool) – If True, orders by the length
  • pad_idx (int) – the value to use for padding
  • use_cuda (bool) – if true, places padded on GPU
  • left_padded (bool) – if true, pad on the left instead of the right
  • max_len (int) – if None, the max length is the maximum item length

(padded, lengths) tuple

Return type:

(Tensor[int64], list[int])
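
A plain-list sketch of the padding behavior (the real helper returns a torch LongTensor and supports CUDA placement; sorting is omitted here):

```python
def padded_tensor(items, pad_idx=0, left_padded=False, max_len=None):
    """Pad an uneven list of int lists into a rectangular matrix;
    returns (padded, lengths)."""
    lengths = [len(item) for item in items]
    width = max_len if max_len is not None else max(lengths)
    padded = []
    for item, n in zip(items, lengths):
        fill = [pad_idx] * (width - n)
        # Right-pad by default; left-pad when the flag is set.
        padded.append(fill + list(item) if left_padded else list(item) + fill)
    return padded, lengths


print(padded_tensor([[1, 2, 3], [4]]))
# ([[1, 2, 3], [4, 0, 0]], [3, 1])
```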

parlai.core.utils.round_sigfigs(x, sigfigs=4)

Round value to specified significant figures.

  • x – input number
  • sigfigs – number of significant figures to return

float number rounded to specified sigfigs
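
A common way to implement this (the exact ParlAI implementation may differ, e.g. in its handling of special values):

```python
import math


def round_sigfigs(x, sigfigs=4):
    """Round x to the given number of significant figures;
    zero is returned unchanged."""
    if x == 0:
        return 0.0
    # Order of magnitude of x decides how many decimals to keep.
    magnitude = math.floor(math.log10(abs(x)))
    return round(x, -magnitude + sigfigs - 1)


print(round_sigfigs(0.00123456))  # 0.001235
print(round_sigfigs(123456, 2))   # 120000
```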

parlai.core.utils.set_namedtuple_defaults(namedtuple, default=None)

Set all of the fields for a given namedtuple to a single value.

Additionally removes the default docstring for each field. Modifies the tuple in place, but returns it anyway.

  • namedtuple – A constructed collections.namedtuple
  • default – The default value to set.

the modified namedtuple
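
A sketch of the core trick, using the namedtuple constructor's __defaults__ (the docstring removal mentioned above is omitted):

```python
from collections import namedtuple


def set_namedtuple_defaults(nt, default=None):
    """Give every field of a namedtuple class the same default
    value; modifies the class in place and returns it."""
    nt.__new__.__defaults__ = (default,) * len(nt._fields)
    return nt


Point = namedtuple('Point', ['x', 'y', 'z'])
set_namedtuple_defaults(Point, default=0)
print(Point())      # Point(x=0, y=0, z=0)
print(Point(x=5))   # Point(x=5, y=0, z=0)
```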

parlai.core.utils.sort_data(data, key='text_label', method='spaces')

DEPRECATED: If you would like to make use of batch sorting, please use the PytorchDataTeacher instead.

Given a list of data, sort it according to the method and key.

Currently the only supported method is counting the number of spaces. This appeared to be reliable enough and much faster than tokenizing. It performs much better than just using the length of the string.

Currently the only supported key is sorting first by the text and then by the label. Sorting by the source (text) gives a good improvement in speed over random batching and is robust to different types of optimization. Breaking ties by label length gives a further speed improvement but can reduce robustness with some optimization schemes.

parlai.core.utils.str_to_msg(txt, ignore_fields='')

Convert formatted string to ParlAI message dict.

  • txt – formatted string to convert. String format is tab-separated fields, with colon separating field name and contents.
  • ignore_fields – (default '') comma-separated field names to not include in the msg dict even if they're in the string.

parlai.core.utils.warn_once(msg, warningtype=None)

Raise a warning, but only once.

  • msg (str) – Message to display
  • warningtype (Warning) – Type of warning, e.g. DeprecationWarning