parlai.utils¶
ParlAI has many utilities, roughly organized by function.
parlai.utils.bpe¶
Byte pair encoding (BPE).
BPE tokenization helpers for ParlAI.
- parlai.utils.bpe.bpe_factory(opt: Opt, shared: TShared) BPEHelper [source]¶
BPE Helper Factory.
Returns the appropriate BPE helper given the opt as well as available libraries.
- Parameters
opt – options
shared – shared dict
- Return BPEHelper
returns the appropriate BPEHelper object
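For illustration, a hedged sketch of how the factory is typically used (the dict_tokenizer values noted below are the usual ones, but the exact set of option keys each helper requires is not shown and depends on the chosen tokenizer):
>>> from parlai.utils.bpe import bpe_factory
>>> # illustrative mapping: 'bpe' -> SubwordBPEHelper, 'gpt2' -> Gpt2BpeHelper,
>>> # 'bytelevelbpe' -> HuggingFaceBpeHelper (or SlowBytelevelBPE as a fallback)
>>> helper = bpe_factory(opt, shared={})  # opt built elsewhere, e.g. via ParlaiParser
>>> tokens = helper.encode('hello world')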
- class parlai.utils.bpe.BPEHelper(opt: Opt, shared: Optional[TShared] = None)[source]¶
Bases:
ABC
Abstract BPE Helper.
BPE Helper subclasses must implement appropriate abstractmethods.
- __init__(opt: Opt, shared: Optional[TShared] = None)[source]¶
Subclasses _should_ override __init__ to initialize other things.
- encode(text: str) List[str] [source]¶
Tokenize text.
Checks for add_prefix_space and handles it accordingly.
NOTE: DO NOT OVERRIDE
- Parameters
text – text to tokenize
- Return tokens
A list of tokens
- abstract helper_encode(text: str) List[str] [source]¶
Tokenize text.
Subclasses should override this method for encoding.
- Parameters
text – text to tokenize
- Return tokens
A list of tokens
- decode(tokens: List[str], token_ids: List[int], delimiter: str = ' ') str [source]¶
Decode list of tokens into a text string.
NOTE: DO NOT OVERRIDE
- Parameters
tokens – list of tokens
token_ids – list of token ids
delimiter – string delimiter for tokens
- Return text
decoded text
- abstract helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str [source]¶
Decode list of tokens into text string.
Subclasses should override this method for decoding.
- Parameters
tokens – list of tokens
token_ids – list of token ids
delimiter – string delimiter for tokens
- Return text
decoded text
- abstract sync_with_dict(dict_agent)[source]¶
Sync BPE Helper dictionary with dict_agent dict.
- Parameters
dict_agent – agent with which we are syncing the dictionary
- add_special_tokens(dict_agent, special_tokens: List[str])[source]¶
Add special tokens to the tokenizer.
These tokens are never split, and prioritized over the BPE tokenization.
- finalize(frequencies: Dict[str, int], num_symbols: int, minfreq: int) bool [source]¶
Build the codecs.
Default helpers are pre-trained and thus do not build their own codecs
- Parameters
frequencies – dictionary of (token: frequency) pairs
num_symbols – Number of BPE symbols. Recommend 30000-40000. If <= 0, default 30000 will be used.
minfreq – Minimum frequency of a token before forced BPE decomposition. If <= 0 will use subword-nmt default of 2.
- Return did_finalize
return whether codecs are finalized this call.
- class parlai.utils.bpe.SubwordBPEHelper(opt: Opt, shared: Optional[TShared] = None)[source]¶
Bases:
BPEHelper
Helper class for performing BPE subword tokenization.
For technical details, please refer to https://arxiv.org/abs/1508.07909. This class just wraps around the official subword-nmt repository.
This API expects the user to call tokenize() (encode) onto the training data, then call finalize() to learn the encodings, and then iterate over the data in a second pass, calling tokenize() again to get processed output.
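A hedged sketch of that two-pass flow (assumes subword-nmt is installed and that helper is a SubwordBPEHelper built via bpe_factory; the tiny corpus is illustrative):
>>> corpus = ['hello world', 'hello there']
>>> freqs = {}
>>> for line in corpus:                       # first pass: pre-BPE tokens
...     for tok in helper.encode(line):
...         freqs[tok] = freqs.get(tok, 0) + 1
>>> helper.finalize(freqs, num_symbols=30000, minfreq=2)    # learn the codecs
>>> bpe_corpus = [helper.encode(line) for line in corpus]   # second pass: BPE tokens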
- __init__(opt: Opt, shared: Optional[TShared] = None)[source]¶
Initialize the BPE module.
- Parameters
opt – options
shared – shared dictionary
- add_special_tokens(dict_agent, special_tokens: List[str])[source]¶
Add special tokens to the tokenizer.
These tokens are never split, and prioritized over the BPE tokenization.
- helper_encode(text: str) List[str] [source]¶
Tokenize the text with bpe if codecs are already finalized.
Otherwise, returns the regularly split tokens that will train the bpe.
- Parameters
text – Raw text to tokenize.
- Returns
a list of tokens. Will use BPE once finalized.
- helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str [source]¶
Decode list of tokens into text string.
- Parameters
tokens – list of tokens
token_ids – list of token ids
delimiter – string delimiter for tokens
- Return text
decoded text
- finalize(frequencies: Dict[str, int], num_symbols: int = 30000, minfreq: int = 2) bool [source]¶
Build the codecs.
- Parameters
frequencies – dictionary of (token: frequency) pairs
num_symbols – Number of BPE symbols. Recommend 30000-40000. If <= 0, default 30000 will be used.
minfreq – Minimum frequency of a token before forced BPE decomposition. If <= 0 will use subword-nmt default of 2.
- Return did_finalize
return whether codecs are finalized this call.
- class parlai.utils.bpe.Gpt2BpeHelper(opt: Opt, shared: Optional[TShared] = None)[source]¶
Bases:
BPEHelper
BPE Helper for GPT2 Models.
- Original source:
Original license: MIT
- This is a modified implementation from that of fairseq:
https://github.com/pytorch/fairseq/blob/main/fairseq/data/encoders/gpt2_bpe_utils.py
Fairseq license: MIT
- bytes_to_unicode() Dict[int, str] ¶
Return a mapping from utf-8 bytes to corresponding unicode strings.
The reversible bpe codes work on unicode strings. This means you need a large number of unicode characters in your vocab if you want to avoid UNKs. When you’re at something like a 10B token dataset you end up needing around 5K for decent coverage. This is a significant percentage of your normal, say, 32K bpe vocab. To avoid that, we want lookup tables between utf-8 bytes and unicode strings. This also avoids mapping to whitespace/control characters that the bpe code barfs on.
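The standard GPT-2 byte-to-unicode table that this method mirrors can be sketched as follows (a re-derivation for illustration, not ParlAI's exact code):
>>> def bytes_to_unicode_sketch():
...     # printable latin-1 bytes map to themselves
...     bs = (list(range(ord('!'), ord('~') + 1))
...           + list(range(ord('\xa1'), ord('\xac') + 1))
...           + list(range(ord('\xae'), ord('\xff') + 1)))
...     cs = bs[:]
...     n = 0
...     for b in range(2 ** 8):
...         if b not in bs:            # whitespace/control bytes get remapped
...             bs.append(b)
...             cs.append(2 ** 8 + n)  # to code points beyond the byte range
...             n += 1
...     return dict(zip(bs, [chr(c) for c in cs]))
>>> table = bytes_to_unicode_sketch()
>>> len(table)
256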
- get_pairs(word: Tuple[str, ...]) Set[Tuple[str, str]] [source]¶
Return set of symbol pairs in a word.
Word is represented as tuple of symbols (symbols being variable-length strings).
- Parameters
word – word to symbolize
- Return pairs
set of tuples of symbols
- bpe(token: str) str [source]¶
Convert token to BPE.
- Parameters
token – token to convert
- Return bpe_encoding
string bpe encoding
- helper_encode(text: str) List[str] [source]¶
Tokenize text.
- Parameters
text – text to tokenize
- Return tokens
A list of tokens
- helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str [source]¶
Decode list of tokens into text string.
- Parameters
tokens – list of tokens
token_ids – list of token ids
delimiter – string delimiter for tokens
- Return text
decoded text
- class parlai.utils.bpe.HuggingFaceBpeHelper(opt: Opt, shared: Optional[TShared] = None)[source]¶
Bases:
BPEHelper
HuggingFace’s ByteLevelBPE Tokenizer.
Fast because Rust.
- __init__(opt: Opt, shared: Optional[TShared] = None)[source]¶
Subclasses _should_ override __init__ to initialize other things.
- helper_encode(text: str) List[str] [source]¶
Tokenize text.
- Parameters
text – text to tokenize
- Return tokens
A list of tokens
- helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str [source]¶
Decode list of tokens into text string.
- Parameters
tokens – list of tokens
token_ids – list of token ids
delimiter – string delimiter for tokens
- Return text
decoded text
- add_special_tokens(dict_agent, special_tokens: List[str])[source]¶
Add special tokens to the tokenizer and dict_agent.
- class parlai.utils.bpe.SlowBytelevelBPE(opt: Opt, shared: Optional[TShared] = None)[source]¶
Bases:
Gpt2BpeHelper
Stand-in for HuggingFace if we do not have access to tokenizers.
Only EVER used for a model used in interactive mode that was previously trained with HF BPE.
parlai.utils.conversations¶
Utility methods for conversations format.
- class parlai.utils.conversations.Metadata(datapath)[source]¶
Bases:
object
Utility class for conversation metadata.
Metadata should be saved at <datapath>.metadata.
- class parlai.utils.conversations.Turn(id=None, text=None, **kwargs)[source]¶
Bases:
AttrDict
Utility class for a dialog turn.
- class parlai.utils.conversations.Conversation(episode)[source]¶
Bases:
object
Utility class for iterating through a single episode.
Used in the context of the Conversations class.
- class parlai.utils.conversations.Conversations(datapath)[source]¶
Bases:
object
Utility class for reading and writing from ParlAI Conversations format.
Conversations should be saved in JSONL format, where each line is a JSON object describing a single conversation.
WARNING: each conversation must be kept on ONE LINE of the file or it will not load!
- classmethod save_conversations(act_list, datapath, opt, save_keys='all', context_ids='context', self_chat=False, **kwargs)[source]¶
Write Conversations to file from an act list.
Conversations assumes the act list is of the following form: a list of episodes, each of which is a list of act pairs (i.e., a list of dictionaries returned from one parley).
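A hedged sketch of saving and re-reading data in this format (the act fields and the path handling are illustrative; real acts carry whatever fields the agents produced, and the exact file suffix handling may differ):
>>> from parlai.utils.conversations import Conversations
>>> # one episode containing one parley, i.e. one pair of acts
>>> acts = [[[{'id': 'teacher', 'text': 'hello'}, {'id': 'model', 'text': 'hi!'}]]]
>>> Conversations.save_conversations(acts, 'convos', opt)  # opt built elsewhere
>>> for convo in Conversations('convos.jsonl'):
...     for turn in convo:
...         print(turn.id, turn.text)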
parlai.utils.data¶
Utilities related to handling data.
- class parlai.utils.data.DatatypeHelper[source]¶
Bases:
object
Helper class to determine properties from datatype strings.
- classmethod fold(datatype: str) str [source]¶
Extract the fold part of the datatype.
- Parameters
datatype – parlai datatype
- Returns
the fold
>>> DatatypeHelper.fold("train:ordered")
'train'
- classmethod strip_stream(datatype: str) str [source]¶
Remove :stream from the datatype.
Used by ChunkTeacher where behavior does not change based on streaming.
- Parameters
datatype – parlai datatype
- Returns
a non-streaming version of the datatype.
>>> DatatypeHelper.strip_stream("train:stream")
'train'
>>> DatatypeHelper.strip_stream("train")
'train'
- classmethod should_cycle(datatype: str) bool [source]¶
Return whether we should cycle data based on the datatype.
- Parameters
datatype – parlai datatype
- Return should_cycle
given datatype, return whether we should cycle
- classmethod should_shuffle(datatype: str) bool [source]¶
Return whether we should shuffle data based on the datatype.
- Parameters
datatype – parlai datatype
- Return should_shuffle
given datatype, return whether we should shuffle
- classmethod is_training(datatype: str) bool [source]¶
Return whether we should return eval_labels or labels.
- Parameters
datatype – parlai datatype
- Return is_training
bool indicating whether should return eval_labels or labels
- classmethod is_streaming(datatype: str) bool [source]¶
Return whether this is streaming.
- Parameters
datatype – parlai datatype
- Returns
bool indicating whether we are streaming
- classmethod split_data_by_fold(fold: str, data: List, train_frac: float, valid_frac: float, test_frac: float, seed: int = 42)[source]¶
Splits a list of data into train/valid/test folds. The members of these folds are randomized (in a consistent manner) by a seed. This is a convenience function for datasets that do not have a canonical split.
- Parameters
fold – parlai fold/datatype
data – List of data examples to be split
train_frac – Fraction of data to be used for the “train” fold. train_frac, valid_frac, and test_frac should sum to 1.
valid_frac – Fraction of data to be used for the “valid” fold. train_frac, valid_frac, and test_frac should sum to 1.
test_frac – Fraction of data to be used for the “test” fold. train_frac, valid_frac, and test_frac should sum to 1.
seed – Seed for shuffling
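For example, splitting an example list according to the current datatype:
>>> from parlai.utils.data import DatatypeHelper
>>> examples = list(range(10))                  # stand-in for real examples
>>> fold = DatatypeHelper.fold('train:stream')  # -> 'train'
>>> DatatypeHelper.split_data_by_fold(
...     fold, examples, train_frac=0.8, valid_frac=0.1, test_frac=0.1, seed=42)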
- classmethod split_subset_data_by_fold(fold: str, subsets: List[List], train_frac: float, valid_frac: float, test_frac: float, seed: int = 42)[source]¶
Splits a list of subsets of data, where we want equal samples from each subset, into train/valid/test folds, ensuring that samples from a given subset are not changed to another fold as more subsets are added.
For example, say a dataset has domains A and B. Let’s say we have an experiment where we train and validate a model on domain A, then on domains A + B. If we naively concatenate the subsets of data from A + B, randomize the result, and split it into train, valid, and test folds, there is no guarantee that valid or test examples from A-only will not end up in the train fold of the A + B split.
The members of these folds are randomized (but in a fixed manner) by a seed.
- Parameters
fold – parlai fold/datatype
subsets – List of subsets of data examples to be split
train_frac – Fraction of data to be used for the “train” fold. train_frac, valid_frac, and test_frac should sum to 1.
valid_frac – Fraction of data to be used for the “valid” fold. train_frac, valid_frac, and test_frac should sum to 1.
test_frac – Fraction of data to be used for the “test” fold. train_frac, valid_frac, and test_frac should sum to 1.
seed – Seed for shuffling
parlai.utils.distributed¶
Useful utilities for training in distributed mode.
Many of these functions act as wrappers which perform no-ops if code is running in non-distributed mode.
- parlai.utils.distributed.is_primary_worker()[source]¶
Determine if we are the primary (rank 0) worker.
Returns False if we are a secondary worker. Returns True if we are either (1) not in distributed mode, or (2) the primary (rank 0) worker.
- parlai.utils.distributed.get_rank()[source]¶
Returns the rank of the current worker.
Returns 0 if not in distributed mode.
- parlai.utils.distributed.override_print(suppress=False, prefix=None)[source]¶
Context manager to override the print to suppress or modify output.
Recommended usage is to call this with suppress=True for all non-primary workers, or call with a prefix of rank on all workers.
>>> with override_print(prefix="rank{}".format(rank)):
...     my_computation()
- Parameters
suppress (bool) – if True, all future print statements are no-ops.
prefix (str) – if not None, this string is prefixed to all future print statements.
- parlai.utils.distributed.all_gather_list(data)[source]¶
Gather arbitrary data from all nodes into a list.
Similar to torch.distributed.all_gather, but for arbitrary Python data. Note that data must be picklable.
- Parameters
data – data from the local worker to be gathered on other workers
- Returns
a list containing [data1, data2, …] of all workers
- parlai.utils.distributed.sync_object(data)[source]¶
Sync an object among all workers.
All workers will return the same value for data when returning from this method, always using the primary worker’s version. Useful for ensuring control flow decisions are made the same.
- Parameters
data (object) – The object to synchronize. Must be pickleable.
- Returns
the synchronized data
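A common pattern (sketch) is to make a decision on the primary worker and broadcast it to the others:
>>> from parlai.utils.distributed import is_primary_worker, sync_object
>>> decision = None
>>> if is_primary_worker():
...     decision = compute_stop_decision()   # hypothetical helper
>>> decision = sync_object(decision)         # all workers now hold the primary's value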
- parlai.utils.distributed.sync_parameters(model: Module) bool [source]¶
Sync all parameters across all workers, verifying they are the same.
Always returns True, or raises an AssertionError if there was a failure.
- Parameters
model – A pytorch model.
- Returns
always True
- parlai.utils.distributed.distributed_context(rank, opt, rank_offset=0, gpu=None, init_method='tcp://localhost:61337')[source]¶
A context which wraps initialization of a distributed/multiprocessing run.
Every process in the distributed run should launch with this. In a true distributed setting you may wish to use slurm_distributed_context instead.
- Parameters
rank (int) – This process’s rank, less rank_offset.
rank_offset (int) – Used as an offset of rank. Used between multiprocessing vs true distributed, and a hack around torch.multiprocessing.spawn being only used for the non-primary workers.
opt – command line options.
gpu (int) – Which GPU to use. Defaults to using rank and local devices, but must be manually specified when using many-hosts.
init_method (str) – Init method, such as tcp://localhost:61337. See torch.distributed docs.
- parlai.utils.distributed.get_dist_group()[source]¶
Find the default pytorch distributed group.
Used within FSDP to mark which workers are participating. Important to manually call this because FSDP will cache old groups, but our test suite will instantiate new groups per test.
- parlai.utils.distributed.slurm_distributed_context(opt)[source]¶
Initialize a distributed context, using the SLURM environment.
Does some work to read the environment to find a list of participating nodes and the main node.
- Parameters
opt – Command line options.
- parlai.utils.distributed.find_free_port() int [source]¶
Find a free port we can bind to locally.
Credit: https://stackoverflow.com/questions/1365265/on-localhost-how-do-i-pick-a-free-port-number
parlai.utils.fp16¶
Utility methods for mixed precision training.
- class parlai.utils.fp16.FP16SafeCrossEntropy(weight: Optional[Tensor] = None, ignore_index: int = -100, reduction: str = 'none')[source]¶
Bases:
Module
FP16-safe cross entropy loss.
This avoids overflow in the softmax by doing the operation in FP32.
- __init__(weight: Optional[Tensor] = None, ignore_index: int = -100, reduction: str = 'none')[source]¶
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(scores, targets)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
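A brief usage sketch of this loss (shapes and values are illustrative; in real mixed-precision training the scores would typically be half-precision logits on a GPU):
>>> import torch
>>> from parlai.utils.fp16 import FP16SafeCrossEntropy
>>> criterion = FP16SafeCrossEntropy(ignore_index=-100, reduction='none')
>>> scores = torch.randn(4, 100)              # (batch, vocab) unnormalized logits
>>> targets = torch.randint(0, 100, (4,))
>>> loss = criterion(scores, targets)         # softmax/log-softmax done in FP32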
- parlai.utils.fp16.clip_grad_norm(params, max_norm: float = 0, sync: bool = False)[source]¶
Clips grad norms.
During combination with FSDP, will also ensure that grad norms are aggregated across all workers, since each worker only stores their shard of the gradients.
- Parameters
params – Parameters whose gradients we wish to clip
max_norm – Maximum norm we wish the gradients to have. If non-positive, then we will not perform clipping.
sync – Boolean indicating whether we should aggregate across the distributed group. Used only in combination with FSDP.
- Returns
The gradient norm across all parameters, before clipping.
- class parlai.utils.fp16.SafeFP16Optimizer(optimizer, aggregate_gnorms=False)[source]¶
Bases:
Optimizer
- load_state_dict(state_dict)[source]¶
Load an optimizer state dict.
In general we should prefer the configuration of the existing optimizer instance (e.g., learning rate) over that found in the state_dict. This allows us to resume training from a checkpoint using a new set of optimizer args.
- backward(loss, update_main_grads=False, retain_graph=False)[source]¶
Computes the sum of gradients of the given tensor w.r.t. graph leaves.
Compared to fairseq.optim.FairseqOptimizer.backward(), this function additionally dynamically scales the loss to avoid gradient underflow.
- property loss_scale¶
Convenience function which TorchAgent calls to get current scale value.
- class parlai.utils.fp16.DynamicLossScaler(init_scale: float = 32768.0, scale_factor: float = 2.0, scale_window: int = 2000, tolerance: float = 0.0, threshold: Optional[float] = None)[source]¶
Bases:
object
Dynamically adjusts the loss scaling factor.
Dynamic loss scalers are important in mixed-precision training. They help us avoid underflows and overflows in low-precision gradients.
See here for information: <https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#lossscaling>
Shamelessly stolen and adapted from Fairseq. <https://github.com/pytorch/fairseq/blob/main/fairseq/optim/fp16_optimizer.py>
- __init__(init_scale: float = 32768.0, scale_factor: float = 2.0, scale_window: int = 2000, tolerance: float = 0.0, threshold: Optional[float] = None)[source]¶
- Parameters
init_scale – Initial loss scale.
scale_factor – Factor by which to increase or decrease loss scale.
scale_window – If we do not experience overflow in scale_window iterations, loss scale will increase by scale_factor.
tolerance – Percentage of iterations that have overflowed, after which we must decrease the loss scale
threshold – If not None, the loss scale will not decrease below this threshold
- class parlai.utils.fp16.MemoryEfficientFP16Optimizer(init_optimizer: Optimizer, aggregate_gnorms: bool = False, loss_initial_scale: float = 131072.0, min_loss_scale: float = 0.0001)[source]¶
Bases:
Optimizer
Wrap an optimizer to perform memory-efficient mixed precision training.
This class wraps an optimizer to perform FP16 training. This implementation is heavily based on the Fairseq implementation of MemoryEfficientFP16Optimizer, which can be found here: <https://github.com/pytorch/fairseq/blob/main/fairseq/optim/fp16_optimizer.py#L382>
This allows you to train bigger models on a single GPU, but can be unstable. Prefer the SafeFP16 implementation if you do not have concerns about memory.
- Parameters
params – Model parameters
optimizer – Any torch optimizer
loss_initial_scale (float) – Initial loss scaling. Default chosen empirically, but models with very low or high loss values may need this adjusted. Stick with powers of 2
min_loss_scale (float) – Throws an error if your loss scale goes below this threshold
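A rough training-step sketch (the surrounding model and loss code are placeholders; step() and zero_grad() come from the underlying torch Optimizer interface):
>>> from parlai.utils.fp16 import MemoryEfficientFP16Adam, MemoryEfficientFP16Optimizer
>>> model = model.half()                         # assumes a CUDA/fp16-capable setup
>>> base = MemoryEfficientFP16Adam(model.parameters(), lr=1e-4)
>>> optimizer = MemoryEfficientFP16Optimizer(base)
>>> loss = compute_loss(model, batch)            # hypothetical helper
>>> optimizer.backward(loss)                     # dynamically scales the loss
>>> if optimizer.clip_main_grads(0.1) != -1:     # -1 signals gradient overflow
...     optimizer.step()
>>> optimizer.zero_grad()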
- __init__(init_optimizer: Optimizer, aggregate_gnorms: bool = False, loss_initial_scale: float = 131072.0, min_loss_scale: float = 0.0001)[source]¶
- property params¶
Return an iterable of the parameters held by the optimizer.
- add_param_group(param_group)[source]¶
Add a param group to the Optimizer’s param_groups.
This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.
- Parameters
param_group (dict) – Specifies what Tensors should be optimized along with group-specific optimization options.
- clip_main_grads(gradient_clip)[source]¶
Clips gradient norm and updates dynamic loss scaler.
Returns -1 if the most recently computed gradients overflowed.
- backward(loss, update_main_grads=False)[source]¶
Computes the sum of gradients of the given tensor w.r.t. graph leaves.
Compared to a regular backward call, this function dynamically scales the loss to avoid gradient underflow.
- load_state_dict(state_dict)[source]¶
Load an optimizer state dict.
Override from PyTorch implementation to avoid casting to FP32.
- property loss_scale¶
Convenience function which TorchAgent calls to get current scale value.
- class parlai.utils.fp16.MemoryEfficientFP16Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False, *, foreach: Optional[bool] = None, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: Optional[bool] = None)[source]¶
Bases:
Adam
Override from Pytorch implementation to ensure aggregations done in FP32.
- class parlai.utils.fp16.Adafactor(params, lr=None, eps=(1e-30, 0.001), clip_threshold=1.0, decay_rate=-0.8, beta1=None, weight_decay=0.0, warmup_init=False)[source]¶
Bases:
Optimizer
Implements Adafactor algorithm.
This implementation is based on: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (see https://arxiv.org/abs/1804.04235)
Taken from the fairseq implementation, which can be found here: <https://github.com/pytorch/fairseq/blob/main/fairseq/optim/adafactor.py>.
- Parameters
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – external learning rate (default: None)
eps (tuple[float, float]) – regularization constants for square gradient and parameter scale respectively (default: (1e-30, 1e-3))
clip_threshold (float) – threshold of root mean square of final gradient update (default: 1.0)
decay_rate (float) – coefficient used to compute running averages of square gradient (default: -0.8)
beta1 (float) – coefficient used for computing running averages of gradient (default: None)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
scale_parameter (bool) – if True, learning rate is scaled by root mean square of parameter (default: True)
relative_step (bool) – if True, time-dependent learning rate is computed instead of external learning rate (default: True)
warmup_init (bool) – time-dependent learning rate computation depends on whether warm-up initialization is being used (default: False)
parlai.utils.logging¶
parlai.utils.misc¶
File for miscellaneous utility functions and constants.
- parlai.utils.misc.maintain_dialog_history(history, observation, reply='', historyLength=1, useReplies='label_else_model', dict=None, useStartEndIndices=True, splitSentences=False)[source]¶
Keep track of dialog history, up to a truncation length.
Either includes replies from the labels, the model, or not at all, using the param ‘replies’.
DEPRECATED. USE PARLAI.CORE.TORCH_AGENT INSTEAD.
- parlai.utils.misc.load_cands(path, lines_have_ids=False, cands_are_replies=False)[source]¶
Load global fixed set of candidate labels that the teacher provides.
Every example will include these as candidates. The true labels for a specific example are also added to this set, so that it’s possible to get the right answer.
- class parlai.utils.misc.TimeLogger[source]¶
Bases:
object
Class for logging time progress against a goal.
- log(done, total, report=None)[source]¶
Log report, time elapsed, and percentage progress towards goal.
- Parameters
done – number of examples completed so far
total – total number of elements to be completed. if total > 0, calculates the time remaining and percentage complete.
report – dict of pairs to log
- Returns
a tuple (log string, log dict). The log string contains the time elapsed and a string representation of the log dict; the log dict contains pairs of all items to log, including percentage complete and projected time left if total > 0.
- class parlai.utils.misc.AttrDict(*args, **kwargs)[source]¶
Bases:
dict
Helper class to have a dict-like object with dot access.
For example, instead of d = {'key': 'value'} use d = AttrDict(key='value'). To access keys, instead of doing d['key'] use d.key.
While this has some limitations on the possible keys (for example, do not set the key items or you will lose access to the items() method), this can make some code more clear.
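For example:
>>> from parlai.utils.misc import AttrDict
>>> d = AttrDict(key='value', count=2)
>>> d.key
'value'
>>> d['count']   # normal dict access still works
2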
- parlai.utils.misc.float_formatter(f: Union[float, int]) str [source]¶
Format a float as a pretty string.
- parlai.utils.misc.nice_report(report) str [source]¶
Render an agent Report as a beautiful string.
If pandas is installed, we will use it to render as a table. Multitask metrics will be shown per row.
If pandas is not available, we will use a dict with like-metrics placed next to each other.
- parlai.utils.misc.round_sigfigs(x: Union[float, Tensor], sigfigs=4) float [source]¶
Round value to specified significant figures.
- Parameters
x – input number
sigfigs – number of significant figures to return
- Returns
float number rounded to specified sigfigs
- parlai.utils.misc.display_messages(msgs: List[Dict[str, Any]], prettify: bool = False, ignore_agent_reply: bool = False, add_fields: str = '', max_len: int = 1000, verbose: bool = False) Optional[str] [source]¶
Return a string describing the set of messages provided.
If prettify is true, candidates are displayed using prettytable. add_fields provides a list of fields in the msgs which should be displayed if verbose is off.
- parlai.utils.misc.str_to_msg(txt, ignore_fields='')[source]¶
Convert formatted string to ParlAI message dict.
- Parameters
txt – formatted string to convert. String format is tab-separated fields, with colon separating field name and contents.
ignore_fields – (default ‘’) comma-separated field names to not include in the msg dict even if they’re in the string.
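A small round-trip example of the format described above (tab-separated fields, with a colon between field name and contents):
>>> from parlai.utils.misc import str_to_msg, msg_to_str
>>> msg = str_to_msg('text:hello world\tlabels:hi there')
>>> msg['text']
'hello world'
>>> msg_to_str(msg)   # back to the tab-separated string form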
- parlai.utils.misc.msg_to_str(msg, ignore_fields='')[source]¶
Convert ParlAI message dict to string.
- Parameters
msg – dict to convert into a string.
ignore_fields – (default ‘’) comma-separated field names to not include in the string even if they’re in the msg dict.
- parlai.utils.misc.set_namedtuple_defaults(namedtuple, default=None)[source]¶
Set all of the fields for a given namedtuple to a singular value.
Additionally removes the default docstring for each field. Modifies the tuple in place, but returns it anyway.
More info: https://stackoverflow.com/a/18348004
- Parameters
namedtuple – A constructed collections.namedtuple
default – The default value to set.
- Returns
the modified namedtuple
- parlai.utils.misc.warn_once(msg: str) None [source]¶
Log a warning, but only once.
- Parameters
msg (str) – Message to display
parlai.utils.pickle¶
ParlAI’s custom unpickler.
As modules move around or are renamed, old torch model files become invalid, since they look for modules in all the wrong places. Furthermore, we occasionally use APEX for performance reasons, but we don’t want to outright die if the user has not installed it.
This module is to handle both of these issues. It is used like this:
>>> import parlai.utils.pickle
>>> state_dict = torch.load(filename, pickle_module=parlai.utils.pickle)
parlai.utils.safety¶
Utility functions and classes for detecting offensive language.
- class parlai.utils.safety.OffensiveLanguageClassifier(shared: Optional[TShared] = None, custom_model_file='zoo:dialogue_safety/single_turn/model')[source]¶
Bases:
object
Load model trained to detect offensive language in the context of single-turn dialogue utterances.
This model was trained to be robust to adversarial examples created by humans. See <http://parl.ai/projects/dialogue_safety/> for more information.
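A hedged usage sketch (the first call downloads the zoo model; the `in` membership check is how recent ParlAI versions expose the prediction, but treat it as an assumption):
>>> from parlai.utils.safety import OffensiveLanguageClassifier
>>> clf = OffensiveLanguageClassifier()   # zoo:dialogue_safety/single_turn/model
>>> flagged = 'have a nice day' in clf    # True if the classifier flags the text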
- class parlai.utils.safety.OffensiveStringMatcher(datapath: Optional[str] = None)[source]¶
Bases:
object
Detects offensive language using a list of offensive language and phrases from https://github.com/LDNOOBW.
- __init__(datapath: Optional[str] = None)[source]¶
Get data from external sources and build data representation.
If datapath ends in ‘.txt’ it is assumed a custom model file is already given.
parlai.utils.strings¶
Utility functions and classes for handling text strings.
parlai.utils.testing¶
General utilities for helping writing ParlAI unit and integration tests.
- parlai.utils.testing.skipUnlessTorch(testfn, reason='pytorch is not installed')[source]¶
Decorate a test to skip if torch is not installed.
- parlai.utils.testing.skipIfGPU(testfn, reason='Test is CPU-only')[source]¶
Decorate a test to skip if a GPU is available.
Useful for disabling hogwild tests.
- parlai.utils.testing.skipUnlessGPU(testfn, reason='Test requires a GPU')[source]¶
Decorate a test to skip if no GPU is available.
- parlai.utils.testing.skipUnlessBPE(testfn, reason='Test requires subword NMT')[source]¶
Decorate a test to skip if BPE is not installed.
- parlai.utils.testing.skipIfCircleCI(testfn, reason='Test disabled in CircleCI')[source]¶
Decorate a test to skip if running on CircleCI.
- parlai.utils.testing.skipUnlessVision(testfn, reason='torchvision not installed')[source]¶
Decorate a test to skip unless torchvision is installed.
- parlai.utils.testing.skipUnlessFairseq(testfn, reason='fairseq not installed')[source]¶
Decorate a test to skip unless fairseq is installed.
- parlai.utils.testing.skipUnlessMephisto(testfn, reason='mephisto not installed')[source]¶
Decorate a test to skip unless mephisto is installed.
- parlai.utils.testing.skipUnlessClearML(testfn, reason='clearml not installed')[source]¶
Decorate a test to skip unless clearml is installed.
- class parlai.utils.testing.retry(ntries=3, log_retry=False)[source]¶
Bases:
object
Decorator for flaky tests. Test is run up to ntries times, retrying on failure.
- Parameters
ntries – the number of tries to attempt
log_retry – if True, prints to stdout on retry to avoid being seen as “hanging”
On the last time, the test will simply fail.
>>> @retry(ntries=10)
... def test_flaky(self):
...     import random
...     self.assertLess(0.5, random.random())
- parlai.utils.testing.git_ls_files(root=None, skip_nonexisting=True)[source]¶
List all files tracked by git.
- parlai.utils.testing.git_changed_files(skip_nonexisting=True)[source]¶
List all the changed files in the git repository.
- Parameters
skip_nonexisting (bool) – If true, ignore files that don’t exist on disk. This is useful for disregarding files created in main but that don’t exist in HEAD.
- parlai.utils.testing.git_commit_messages()[source]¶
Output each commit message between here and main.
- parlai.utils.testing.is_new_task_filename(filename)[source]¶
Check if a given filename counts as a new task.
Used in tests and test triggers, and only here to avoid redundancy.
- parlai.utils.testing.capture_output()[source]¶
Suppress all logging output into a single buffer.
Use as a context manager.
>>> with capture_output() as output:
...     print('hello')
>>> output.getvalue()
'hello'
- parlai.utils.testing.tempdir()[source]¶
Create a temporary directory.
Use as a context manager so the directory is automatically cleaned up.
>>> with tempdir() as tmpdir:
...     print(tmpdir)  # prints a folder like /tmp/randomname
- parlai.utils.testing.timeout(time: int = 30)[source]¶
Raise a timeout if a function does not return within time seconds.
Use as a context manager, so that the signal module can reset its alarm for SIGALRM.
- Parameters
time (int) – Time in seconds to wait for timeout. Default is 30 seconds.
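For example:
>>> import parlai.utils.testing as testing_utils
>>> with testing_utils.timeout(time=5):
...     possibly_hanging_call()   # hypothetical; raises if it runs longer than 5 seconds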
- parlai.utils.testing.train_model(opt: Opt) Tuple[Dict[str, Any], Dict[str, Any]] [source]¶
Run through a TrainLoop.
If model_file is not in opt, then this helper will create a temporary directory to store the model, dict, etc.
- Returns
(valid_results, test_results)
- Return type
(dict, dict)
- parlai.utils.testing.eval_model(opt, skip_valid=False, skip_test=False, valid_datatype='valid', test_datatype='test')[source]¶
Run through an evaluation loop.
- Parameters
opt – Any non-default options you wish to set.
skip_valid (bool) – If true skips the valid evaluation, and the first return value will be None.
skip_test (bool) – If true skips the test evaluation, and the second return value will be None.
valid_datatype (str) – If custom datatype required for valid, e.g. train:evalmode, specify here
- Returns
(valid_results, test_results)
- Return type
(dict, dict)
If model_file is not in opt, then this helper will create a temporary directory to store the model files, and clean up afterwards. You can keep the directory by disabling autocleanup.
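A minimal test sketch using these helpers (all option values below are illustrative):
>>> import parlai.utils.testing as testing_utils
>>> with testing_utils.tempdir() as tmpdir:
...     valid, test = testing_utils.train_model(dict(
...         task='integration_tests:overfit',
...         model='transformer/generator',
...         num_epochs=1,
...         batchsize=4,
...         model_file=tmpdir + '/model',
...     ))
...     print(valid['ppl'])   # reports are dicts of metrics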
parlai.utils.torch¶
Utility methods for dealing with torch code.
- parlai.utils.torch.neginf(dtype: dtype) float [source]¶
Return a representable finite number near -inf for a dtype.
- parlai.utils.torch.atomic_save(state_dict: Any, path: str) None [source]¶
Like torch.save, but atomic.
Useful for preventing trouble coming from being pre-empted or killed while writing to disk. Works by writing to a temporary file, and then renaming the file to the final name.
- parlai.utils.torch.padded_tensor(items: List[Union[List[int], LongTensor]], pad_idx: int = 0, left_padded: bool = False, max_len: Optional[int] = None, fp16friendly: bool = False) Tuple[LongTensor, List[int]] [source]¶
Create a padded matrix from an uneven list of lists.
Returns (padded, lengths), where padded is the padded matrix, and lengths is a list containing the lengths of each row.
Matrix is right-padded (filled to the right) by default, but can be left padded if the flag is set to True.
Matrix can also be placed on cuda automatically.
- Parameters
items (list[iter[int]]) – List of items
sort (bool) – If True, orders by the length
pad_idx (int) – the value to use for padding
left_padded (bool) –
max_len (int) – if None, the max length is the maximum item length
fp16friendly (bool) – if True, pads the time dimension to be a multiple of 4.
- Returns
(padded, lengths) tuple
- Return type
(Tensor[int64], list[int])
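For example:
>>> from parlai.utils.torch import padded_tensor
>>> padded, lengths = padded_tensor([[1, 2, 3], [4, 5]], pad_idx=0)
>>> padded
tensor([[1, 2, 3],
        [4, 5, 0]])
>>> lengths
[3, 2]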
- parlai.utils.torch.padded_3d(tensors: List[LongTensor], pad_idx: int = 0, dtype: Optional[dtype] = torch.int64, fp16friendly: bool = False)[source]¶
Make 3D padded tensor for list of lists of 1D tensors or lists.
Will keep items on the same device as originally.
- Parameters
tensors – list of lists of 1D tensors (or lists)
pad_idx – padding to fill tensor with
fp16friendly (bool) – if True, pads the final dimension to be a multiple of 8.
- Returns
3D tensor with the maximum dimensions of the inputs
- parlai.utils.torch.concat_without_padding(text_idx, cand_idx, use_cuda, null_idx=0)[source]¶
Concatenate two right padded tensors and move padding to the right.
For example, if text_idx = [[1, 2, 3, 4, 0, 0]] and cand_idx = [[5, 6, 7, 8, 0, 0]], then the result is (tokens, segments), where:
tokens = [[1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0]]
segments = [[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
- parlai.utils.torch.argsort(keys: List[Any], *lists: List[List[Any]], descending: bool = False)[source]¶
Reorder each list in lists by the (descending) sorted order of keys.
- Parameters
keys (iter) – Keys to order by.
lists (list[list]) – Lists to be reordered by keys’ order. Correctly handles lists and 1-D tensors.
descending (bool) – Use descending order if true.
- Returns
The reordered items.
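For example:
>>> from parlai.utils.torch import argsort
>>> keys = [3, 1, 2]
>>> argsort(keys, ['a', 'b', 'c'], [10, 20, 30])
[['b', 'c', 'a'], [20, 30, 10]]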
- parlai.utils.torch.compute_grad_norm(parameters, norm_type=2.0)[source]¶
Compute norm over gradients of model parameters.
- Parameters
parameters – the model parameters for gradient norm calculation. Iterable of Tensors or single Tensor
norm_type – type of p-norm to use
- Returns
the computed gradient norm
- class parlai.utils.torch.IdentityLayer(*args, **kwargs)[source]¶
Bases:
Module
Identity layer module.
Useful for decoder-only Torch Generator agents.
- parlai.utils.torch.total_parameters(model: Module) int [source]¶
Count the total number of parameters in the model.
- Parameters
model – the model whose parameters we wish to count.
- Returns
total number of parameters in the model.
- parlai.utils.torch.trainable_parameters(model: Module) int [source]¶
Count the total number of trainable parameters in the model.
- Parameters
model – the model whose parameters we wish to count.
- Returns
total number of trainable parameters in the model.
- class parlai.utils.torch.PipelineWorkItem(chunk_idx, layer_nos, next_device)¶
Bases:
tuple
- chunk_idx¶
Alias for field number 0
- layer_nos¶
Alias for field number 1
- next_device¶
Alias for field number 2
- class parlai.utils.torch.PipelineHelper[source]¶
Bases:
object
PipelineHelper assists with implementing pipelining in model parallelism.
For a tutorial on model parallelism, as it’s implemented in parts of ParlAI, see https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html.
Usage:
>>> my_model = PipelineHelper().make_parallel(my_model)
Note that you will need to manually implement logic which handles the moved layers.
- check_compatibility(opt)[source]¶
Check compatibility for opts.
Really just used to raise an error message if the user mixes multiprocessing and model parallelism.
- make_parallel(model: Module) Module [source]¶
Allocate specific layers in a model to be ModelParallel.
Limited to only ModuleLists within the model. Uses some heuristics to attempt to evenly distribute layers across GPUs, in order to balance memory usage. They are:
Assume the 0th GPU will host the optimizer, word embeddings, etc.
Assume activation memory is linear with the number of parameters.
All layers are approximately equal in size.
- static guess_split_size(item: Chunk, num_gpus: Optional[int] = None, dim=0) int [source]¶
Estimate the number of chunks we should split the batch into via heuristics.
- static split(item: Chunk, split_size: Optional[int] = None, dim=0) List[Chunk] [source]¶
Split a tensor or group of tensors into smaller chunks of the same type.
- Parameters
item – The item being split. May be a Tensor, a tuple of Tensors, or a dictionary mapping str -> Tensor.
split_size – The maximum size of each output chunk. If None, we will guess using heuristics
dim – The dimension to split along.
- static join(items: List[Chunk], dim=0) Chunk [source]¶
Join chunks back together, the inverse of split.
- Parameters
items – All the output chunks. Each chunk may be a tensor or a group of tensors.
dim – The dimension to join along.
- static chunk_to(chunk: Chunk, device: str) Chunk [source]¶
Move the chunk to the device.
Handles chunks which are groups of tensors.
- static schedule_work_items(layers: ModuleList, chunks: List[Chunk])[source]¶
Iterate through chunks and layers that should be pipelined.
Each iteration of this generator yields the following properties:
layer_nos: a list of indices of layers for you to forward through
chunk_idx: the index of the chunk we are manipulating. Use this if you need to update chunk representations.
next_device: where the chunk should be moved to AFTER the layer computation is done.
parlai.utils.typing¶
Definitions of general ParlAI types.
- parlai.utils.typing.TScalar¶
ParlAI type to represent an object that is theoretically expressible as a scalar value. Ints and floats are clearly scalars, and torch.Tensors can be represented by a scalar if Tensor.numel() == 1. Used as input type for classes derived from Metric.
alias of Union[int, float, Tensor]