core.utils

File for miscellaneous utility functions and constants.

parlai.core.utils.maintain_dialog_history(history, observation, reply='', historyLength=1, useReplies='label_else_model', dict=None, useStartEndIndices=True, splitSentences=False)

Keep track of dialog history, up to a truncation length.

Replies are taken from the labels, from the model, or not included at all, as selected by the useReplies parameter.

DEPRECATED. USE PARLAI.CORE.TORCH_AGENT INSTEAD.

parlai.core.utils.load_cands(path, lines_have_ids=False, cands_are_replies=False)

Load global fixed set of candidate labels that the teacher provides.

Every example will include these as candidates. The true labels for a specific example are also added to this set, so that it’s possible to get the right answer.

class parlai.core.utils.Predictor(args=None, **kwargs)

Wrapper to set up running version of model and request predictions.

Note that this maintains no World state (does not use a World), merely providing the observation directly to the model and getting a response.

This is limiting when it comes to certain use cases, but allows for quick model deployment.

__init__(args=None, **kwargs)

Initialize the predictor, setting up opt automatically if needed.

args is expected to be in the same format as sys.argv, e.g. a list of the form ['--model', 'seq2seq', '-hs', 128, '-lr', 0.5].

kwargs is interpreted by prepending '--' to each key and replacing underscores with hyphens, so dict_file='/tmp/dict.tsv' would be interpreted as '--dict-file /tmp/dict.tsv'.
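The keyword-argument rule described above can be illustrated with a minimal sketch. The helper name kwargs_to_args is hypothetical and is not part of ParlAI; the sketch only mirrors the documented renaming rule and assumes every value can be passed as a string.

```python
def kwargs_to_args(**kwargs):
    """Turn keyword arguments into a flat argv-style list (hypothetical helper)."""
    args = []
    for name, value in kwargs.items():
        # 'dict_file' becomes '--dict-file', followed by its value as a string.
        args.append('--' + name.replace('_', '-'))
        args.append(str(value))
    return args


print(kwargs_to_args(dict_file='/tmp/dict.tsv'))
# ['--dict-file', '/tmp/dict.tsv']
```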

predict(observation)

From a ParlAI-standard message dict, get model prediction.

class parlai.core.utils.Timer

Computes elapsed time.

__init__()

Initialize timer.

reset()

Reset timer to zero.

resume()

Resume timer.

stop()

Pause timer.

time()

Get current timer time.
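The reset/resume/stop/time behaviour listed above could be sketched as follows. SimpleTimer is a hypothetical stand-in, not ParlAI's Timer; it assumes the timer starts running on construction and uses time.monotonic() as its clock.

```python
import time


class SimpleTimer:
    """Pausable stopwatch (illustrative sketch, not the ParlAI class)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Zero the accumulated time and start running again.
        self.running = True
        self.total = 0.0
        self.start = time.monotonic()
        return self

    def resume(self):
        # Only restart the clock if the timer is currently paused.
        if not self.running:
            self.running = True
            self.start = time.monotonic()
        return self

    def stop(self):
        # Pause: fold the running interval into the accumulated total.
        if self.running:
            self.running = False
            self.total += time.monotonic() - self.start
        return self

    def time(self):
        # Accumulated time plus the live interval, if still running.
        if self.running:
            return self.total + time.monotonic() - self.start
        return self.total


timer = SimpleTimer()
timer.stop()
paused_at = timer.time()   # stays constant while the timer is paused
timer.resume()
```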

class parlai.core.utils.TimeLogger

Class for logging time progress against a goal.

__init__()

Set up timer.

total_time()

Return time elapsed at last log call.

time()

Return current timer time.

log(done, total, report=None)

Log report, time elapsed, and percentage progress towards goal.

Parameters:
  • done – number of examples completed so far
  • total – total number of elements to be completed. If total > 0, the time remaining and percentage complete are calculated.
  • report – dict of pairs to log
Returns:

tuple (log string, log dict). The log string contains the time elapsed and a string representation of the log dict. The log dict contains pairs of all items to log, including the percentage complete and projected time left if total > 0.

class parlai.core.utils.AttrDict(*args, **kwargs)

Helper class to have a dict-like object with dot access.

For example, instead of d = {'key': 'value'} use d = AttrDict(key='value'). To access keys, instead of doing d['key'] use d.key.

While this has some limitations on the possible keys (for example, do not set the key items or you will lose access to the items() method), this can make some code more clear.

__init__(*args, **kwargs)

Initialize AttrDict using input dict.
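One common way to get dot access on a dict is to point the instance's attribute storage at the mapping itself. The sketch below uses a hypothetical class name, DotDict, and may differ from how AttrDict is actually implemented; it also demonstrates the caveat about the key items.

```python
class DotDict(dict):
    """dict subclass whose keys are also readable as attributes (illustrative)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Share storage: the instance attribute dict *is* the mapping itself.
        self.__dict__ = self


d = DotDict(key='value')
print(d.key)       # value
print(d['key'])    # value

# A key named 'items' shadows the dict.items() method on this instance.
shadowed = DotDict(items=1)
print(shadowed.items)  # 1
```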

parlai.core.utils.round_sigfigs(x, sigfigs=4)

Round value to specified significant figures.

Parameters:
  • x – input number
  • sigfigs – number of significant figures to return
Returns:

float number rounded to specified sigfigs
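Rounding to significant figures can be done by locating the leading digit with a base-10 logarithm. The function below, round_to_sigfigs, is a hypothetical sketch of that idea and is not necessarily how round_sigfigs is implemented; returning 0.0 for a zero input is an assumption.

```python
import math


def round_to_sigfigs(x, sigfigs=4):
    """Round x to the given number of significant figures (illustrative)."""
    if x == 0:
        return 0.0
    # Position of the leading digit: 5 for 123456, -3 for 0.0012345.
    magnitude = math.floor(math.log10(abs(x)))
    return float(round(x, sigfigs - 1 - magnitude))


print(round_to_sigfigs(123456, 2))     # 120000.0
print(round_to_sigfigs(0.0012345, 3))  # 0.00123
```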

parlai.core.utils.flatten(teacher, context_length=-1, include_labels=True)

DEPRECATED: If you would like to make use of batch sorting, please use the PytorchDataTeacher instead.

Return a flattened version of a teacher’s data.

All episodes will have length 1 but contain the desired amount of context.

If context_length is not -1, will use only that many past utterances. Default is -1, full past. Setting it to one only uses the input text.

If include_labels is True, will include a random label in past utterances. Default is True.

parlai.core.utils.sort_data(data, key='text_label', method='spaces')

DEPRECATED: If you would like to make use of batch sorting, please use the PytorchDataTeacher instead.

Given a list of data, sort it according to the method and key.

Currently the only supported method is counting the number of spaces. This appeared to be reliable enough and much faster than tokenizing. It performs much better than just using the length of the string.

Currently the only supported key is sorting by first the text, then the label. See https://arxiv.org/abs/1706.05765 for an evaluation of alternative approaches for machine translation. Sorting by the source (text) gives a good improvement in speed over random batching and is robust to different types of optimization. Breaking ties by sorting by label length gives a further improvement in speed but can reduce robustness with some optimization schemes.
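The space-counting heuristic can be sketched in a few lines. The example assumes, for illustration only, that each item is a (text, label) pair of strings; the actual data format expected by sort_data is not specified here, and sort_by_spaces is a hypothetical name.

```python
def sort_by_spaces(examples):
    """Sort (text, label) pairs by spaces in the text, then in the label."""

    def key(example):
        text, label = example
        # Counting spaces approximates token count without tokenizing.
        return (text.count(' '), label.count(' '))

    return sorted(examples, key=key)


data = [
    ('how are you today', 'fine'),
    ('hi', 'hello there'),
    ('hi', 'hey'),
]
for text, label in sort_by_spaces(data):
    print(text, '->', label)
```

Ties on the text are broken by the label, matching the "first the text, then the label" ordering described above.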

parlai.core.utils.make_batches(data, bsz)

DEPRECATED: If you would like to make use of batch sorting, please use the PytorchDataTeacher instead.

Return a list of lists of size bsz given a list of examples.
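Splitting a list into batches is a simple slicing operation. The sketch below uses a hypothetical function name, chunk, and assumes that the final batch may be shorter than bsz when the data does not divide evenly.

```python
def chunk(data, bsz):
    """Split data into consecutive lists of at most bsz items (illustrative)."""
    return [data[i:i + bsz] for i in range(0, len(data), bsz)]


print(chunk([1, 2, 3, 4, 5], 2))
# [[1, 2], [3, 4], [5]]
```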

class parlai.core.utils.NoLock

Empty lock. Does nothing when you enter or exit.

__enter__()

No-op.

__exit__(exc_type, exc_value, exc_traceback)

No-op.

parlai.core.utils.no_lock()

Build a nolock for other classes to use for no-op locking.
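A no-op lock is useful when the same code path should run with or without real synchronization. The sketch below defines a hypothetical DummyLock with the same two methods and shows it being used interchangeably with threading.Lock; it is not the ParlAI class itself.

```python
import threading


class DummyLock:
    """Context manager that does nothing on enter or exit (illustrative)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # Returning None lets any exception propagate unchanged.
        pass


def increment(counter, lock):
    # The caller decides whether real locking is needed.
    with lock:
        counter['n'] += 1
    return counter['n']


print(increment({'n': 0}, DummyLock()))       # 1
print(increment({'n': 0}, threading.Lock()))  # 1
```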

class parlai.core.utils.ProgressLogger(throttle=1, should_humanize=True)

Throttles and displays progress in human-readable form.

__init__(throttle=1, should_humanize=True)

Initialize Progress logger.

Parameters:
  • throttle – default 1, number in seconds to use as throttle rate
  • should_humanize – default True, whether to humanize data units

humanize(num, suffix='B')

Convert units to more human-readable format.

log(curr, total, width=40, force=False)

Display a bar showing the current progress.

class parlai.core.utils.PaddingUtils

Helps with padding input and target tensors.

DEPRECATED. USE PARLAI.CORE.TORCH_AGENT INSTEAD.

classmethod pad_text(observations, dictionary, end_idx=None, null_idx=0, dq=False, eval_labels=True, truncate=None)

Pad observations to max width.

We check that examples are valid, pad with zeros, and sort by length so that we can use the pack_padded function. The list valid_inds keeps track of which indices are valid and the order in which we sort the examples.

Parameters:
  • dq – whether we should use deque or list
  • eval_labels – whether or not we want to consider eval labels
  • truncate – truncate input and output lengths

DEPRECATED. USE PARLAI.CORE.TORCH_AGENT INSTEAD.

classmethod map_predictions(predictions, valid_inds, batch_reply, observations, dictionary, end_idx, report_freq=0.1, labels=None, answers=None, ys=None)

Match predictions to original index in the batch.

Predictions are mapped back to appropriate indices in the batch_reply using valid_inds.

report_freq – how often we report predictions

DEPRECATED. USE PARLAI.CORE.TORCH_AGENT INSTEAD.

class parlai.core.utils.OffensiveLanguageDetector

Tries to detect offensive language in text.

Detects offensive language using a list of offensive words and phrases from https://github.com/LDNOOBW.

__init__()

Get data from external sources and build data representation.

add_phrase(phrase)

Add a single phrase to the filter.

add_words(phrase_list)

Add list of custom phrases to the filter.

contains_offensive_language(text)

Determine if text contains any offensive words in the filter.

__contains__(key)

Determine if text contains any offensive words in the filter.

parlai.core.utils.clip_text(text, max_len)

Clip text to max length, adding ellipses.
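A minimal sketch of clipping with an ellipsis is shown below. The name clip is hypothetical, and whether the ellipsis counts towards max_len in the real function is not stated here; this version assumes it does not.

```python
def clip(text, max_len):
    """Cut text to max_len characters and mark the cut (illustrative)."""
    if len(text) > max_len:
        return text[:max_len] + ' ...'
    return text


print(clip('hello world', 5))  # hello ...
print(clip('hi', 5))           # hi
```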

parlai.core.utils.display_messages(msgs, prettify=False, ignore_fields='', max_len=1000)

Return a string describing the set of messages provided.

If prettify is true, candidates are displayed using prettytable. ignore_fields provides a list of fields in the msgs which should not be displayed.

parlai.core.utils.str_to_msg(txt, ignore_fields='')

Convert formatted string to ParlAI message dict.

Parameters:
  • txt – formatted string to convert. String format is tab-separated fields, with colon separating field name and contents.
  • ignore_fields – (default ‘’) comma-separated field names to not include in the msg dict even if they’re in the string.
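The string format described above (tab-separated fields, with a colon between field name and contents) and its inverse can be sketched as a round trip. The function names below are hypothetical, every value is treated as a plain string, and no escaping of tabs or colons is attempted, so this is a simplification of whatever the library does.

```python
def message_to_string(msg, ignore_fields=''):
    """Serialize a dict as tab-separated 'field:value' pairs (illustrative)."""
    ignored = set(ignore_fields.split(','))
    parts = []
    for field, value in msg.items():
        if field in ignored:
            continue
        parts.append('{}:{}'.format(field, value))
    return '\t'.join(parts)


def string_to_message(txt, ignore_fields=''):
    """Parse tab-separated 'field:value' pairs back into a dict (illustrative)."""
    ignored = set(ignore_fields.split(','))
    msg = {}
    for part in txt.split('\t'):
        # Split on the first colon only, so values may contain colons.
        field, _, value = part.partition(':')
        if field not in ignored:
            msg[field] = value
    return msg


line = message_to_string({'text': 'hello', 'labels': 'hi', 'id': 'x'},
                         ignore_fields='id')
print(repr(line))                # 'text:hello\tlabels:hi'
print(string_to_message(line))   # {'text': 'hello', 'labels': 'hi'}
```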

parlai.core.utils.msg_to_str(msg, ignore_fields='')

Convert ParlAI message dict to string.

Parameters:
  • msg – dict to convert into a string.
  • ignore_fields – (default ‘’) comma-separated field names to not include in the string even if they’re in the msg dict.

parlai.core.utils.set_namedtuple_defaults(namedtuple, default=None)

Set all of the fields for a given namedtuple to a single default value.

Modifies the tuple in place, but returns it anyway.

More info: https://stackoverflow.com/a/18348004

Parameters:
  • namedtuple – A constructed collections.namedtuple
  • default – The default value to set.
Returns:

the modified namedtuple
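The technique from the linked Stack Overflow answer sets __defaults__ on the namedtuple class's __new__. The sketch below follows that approach with a hypothetical helper name, with_defaults, and a hypothetical Point type; it may not match the library's code line for line.

```python
from collections import namedtuple


def with_defaults(nt_class, default=None):
    """Give every field of a namedtuple class the same default (illustrative)."""
    # One default per field, so any field may be omitted at construction.
    nt_class.__new__.__defaults__ = (default,) * len(nt_class._fields)
    return nt_class


Point = with_defaults(namedtuple('Point', ['x', 'y']))
p = Point(x=1)
print(p)  # Point(x=1, y=None)
```

The class is modified in place and also returned, which allows the one-line definition shown for Point.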

parlai.core.utils.padded_tensor(items, pad_idx=0, use_cuda=False, left_padded=False, max_len=None)

Create a right-padded matrix from an uneven list of lists.

Returns (padded, lengths), where padded is the padded matrix, and lengths is a list containing the lengths of each row.

Matrix is right-padded (filled to the right) by default, but can be left padded if the flag is set to True.

Matrix can also be placed on cuda automatically.

Parameters:
  • items (list[iter[int]]) – List of items
  • sort (bool) – If True, orders by the length
  • pad_idx (int) – the value to use for padding
  • use_cuda (bool) – if true, places padded on GPU
  • left_padded (bool) – if True, pads on the left instead of the right
  • max_len (int) – if None, the max length is the maximum item length
Returns:

(padded, lengths) tuple

Return type:

(Tensor[int64], list[int])
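The padding logic can be illustrated without tensors. The sketch below works on plain Python lists, ignores use_cuda, and assumes max_len, when given, is at least as long as the longest row; pad_rows is a hypothetical name and returns nested lists rather than a Tensor.

```python
def pad_rows(items, pad_idx=0, left_padded=False, max_len=None):
    """Pad uneven rows to a common width, returning (padded, lengths)."""
    rows = [list(item) for item in items]
    lengths = [len(row) for row in rows]
    # Use the longest row as the width unless max_len is supplied.
    width = max(lengths, default=0) if max_len is None else max_len
    padded = []
    for row in rows:
        fill = [pad_idx] * (width - len(row))
        padded.append(fill + row if left_padded else row + fill)
    return padded, lengths


print(pad_rows([[1, 2, 3], [4]]))
# ([[1, 2, 3], [4, 0, 0]], [3, 1])
print(pad_rows([[1, 2, 3], [4]], left_padded=True))
# ([[1, 2, 3], [0, 0, 4]], [3, 1])
```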

parlai.core.utils.padded_3d(tensors, pad_idx=0, use_cuda=0)

Make 3D padded tensor for list of lists of 1D tensors or lists.

Parameters:
  • tensors – list of lists of 1D tensors (or lists)
  • pad_idx – padding to fill tensor with
  • use_cuda – whether to call cuda() before returning
Returns:

3D tensor with the maximum dimensions of the inputs

parlai.core.utils.argsort(keys, *lists, descending=False)

Reorder each list in lists by the (descending) sorted order of keys.

Parameters:
  • keys (iter) – Keys to order by.
  • lists (list[list]) – Lists to reorder according to the sorted order of keys. Correctly handles lists and 1-D tensors.
  • descending (bool) – Use descending order if true.
Returns:

The reordered items.
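Reordering several lists by a shared set of keys amounts to computing one index permutation and applying it to each list. The sketch below handles plain lists only (not 1-D tensors) and uses a hypothetical function name, reorder_by_keys.

```python
def reorder_by_keys(keys, *lists, descending=False):
    """Reorder each list by the sorted order of keys (illustrative)."""
    keys = list(keys)
    # Indices of keys in sorted order, e.g. [1, 2, 0] for keys [3, 1, 2].
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=descending)
    return [[lst[i] for i in order] for lst in lists]


names, scores = reorder_by_keys([3, 1, 2], ['c', 'a', 'b'], [30, 10, 20])
print(names)   # ['a', 'b', 'c']
print(scores)  # [10, 20, 30]
```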

parlai.core.utils.warn_once(msg, warningtype=None)

Raise a warning, but only once.

Parameters:
  • msg (str) – Message to display
  • warningtype (Warning) – Type of warning, e.g. DeprecationWarning
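Emitting a warning only once can be done by remembering which messages have already been shown. The sketch below keys on the message text, which is an assumption about how duplicates are detected, and uses hypothetical names (warn_once_sketch, _seen_messages) around the standard warnings.warn call.

```python
import warnings

_seen_messages = set()


def warn_once_sketch(msg, warningtype=None):
    """Issue msg as a warning the first time it is seen (illustrative)."""
    if msg not in _seen_messages:
        _seen_messages.add(msg)
        # With category None, warnings.warn falls back to UserWarning.
        warnings.warn(msg, warningtype, stacklevel=2)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    warn_once_sketch('this option is deprecated', DeprecationWarning)
    warn_once_sketch('this option is deprecated', DeprecationWarning)

print(len(caught))  # 1
```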