Output Reranker¶

This agent provides an abstract implementation of a “re-ranker” object, as well as an abstract implementation of a TransformerGeneratorAgent that uses the re-ranker. A re-ranker re-ranks candidate outputs according to the prediction score of another model. The steps below outline how to build a re-ranker for your task.

How to build your own re-ranker.¶

1. Train a classifier or ranker model.¶

The first step is to train a model – e.g. transformer/biencoder, transformer/polyencoder, or transformer/classifier – on a desired classification or ranking task.

2. Subclass `AbstractReranker`¶

To create your task-specific re-ranker, you can subclass the AbstractReranker in reranker.py, and implement the following methods:

`get_class_to_rerank_for(observation: Message, full_context: str) -> str`: returns the target class that the re-ranker should aim to maximize. In a contradiction setting, this might be `entails`.
`is_context(utt: str) -> bool`: returns whether a given utterance is an element of the “context” given to a model. This varies between tasks; for example, in ConvAI2, this may return True for an utterance with “your persona: …”; for LIGHT, this would return True for any utterance describing the setting or the characters.
`get_predictor_label_candidates(observation: Message, context: str) -> List[str]`: returns the candidates the re-ranker must rank or classify, given an incoming context and observation.
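The three methods above can be illustrated with a small, self-contained sketch. It does not import ParlAI: `RerankerInterface` is a hypothetical stand-in that only mirrors the method signatures listed above, `Message` is assumed to be a plain dict, and the label set and persona prefix are illustrative choices for a contradiction-style task.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Stand-in for ParlAI's Message type; a plain dict is assumed here.
Message = Dict[str, str]


class RerankerInterface(ABC):
    """Hypothetical stand-in mirroring the three methods to implement."""

    @abstractmethod
    def get_class_to_rerank_for(self, observation: Message, full_context: str) -> str:
        ...

    @abstractmethod
    def is_context(self, utt: str) -> bool:
        ...

    @abstractmethod
    def get_predictor_label_candidates(self, observation: Message, context: str) -> List[str]:
        ...


class PersonaEntailmentReranker(RerankerInterface):
    """Toy re-ranker for a contradiction-style task with ConvAI2-like personas."""

    LABELS = ["entails", "neutral", "contradicts"]  # assumed label set

    def get_class_to_rerank_for(self, observation: Message, full_context: str) -> str:
        # Always favor responses that the classifier judges to be entailed.
        return "entails"

    def is_context(self, utt: str) -> bool:
        # Persona lines count as context; ordinary dialogue turns do not.
        return utt.startswith("your persona:")

    def get_predictor_label_candidates(self, observation: Message, context: str) -> List[str]:
        # The classifier chooses among a fixed set of labels.
        return list(self.LABELS)


reranker = PersonaEntailmentReranker()
obs = {"text": "your persona: i love dogs.\nhi, do you have any pets?"}
context_lines = [line for line in obs["text"].split("\n") if reranker.is_context(line)]
```

In a real project the subclass would derive from the `AbstractReranker` in reranker.py rather than from this stand-in.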

3. Subclass `AbstractGeneratorRerankAgent`¶

Finally, subclass the AbstractGeneratorRerankAgent in reranker.py, and implement one method:

`get_reranker_class()`: returns the class of the re-ranker.
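As a rough illustration of this step, the sketch below shows an agent base class that asks its subclass which re-ranker class to build. Both `GeneratorRerankAgentInterface` and `StubReranker` are hypothetical stand-ins, and treating `get_reranker_class` as a classmethod that the agent calls at construction time is an assumption, not a statement about the actual ParlAI implementation.

```python
class StubReranker:
    """Hypothetical placeholder for a task-specific re-ranker (step 2)."""


class GeneratorRerankAgentInterface:
    """Hypothetical stand-in for the abstract generator re-rank agent."""

    @classmethod
    def get_reranker_class(cls):
        raise NotImplementedError("subclasses must return a re-ranker class")

    def __init__(self):
        # Assumed behavior: the agent instantiates whatever class is returned.
        self.reranker = self.get_reranker_class()()


class MyRerankAgent(GeneratorRerankAgentInterface):
    @classmethod
    def get_reranker_class(cls):
        # The only method the subclass needs to provide.
        return StubReranker


agent = MyRerankAgent()
```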

Case study: Classifier Re-Ranking.¶

You can also use a standard classifier for re-ranking: the classifier scores each candidate output, and the candidate that maximizes the probability of a given target class is chosen.
This is already implemented in classifier_reranker.py in this directory, and can be used via the flags `-m reranker/classifier_reranker --target-label positive_class_name`.
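The selection rule described here can be sketched in a few lines. This is a minimal illustration, not the code in classifier_reranker.py: the function name `rerank_by_target_class` and the example probabilities are made up, and the classifier is assumed to return one label-to-probability mapping per candidate.

```python
from typing import Dict, List


def rerank_by_target_class(
    candidates: List[str],
    class_probs: List[Dict[str, float]],
    target_label: str,
) -> List[str]:
    """Order candidates by the classifier probability of the target label."""
    scored = sorted(
        zip(candidates, class_probs),
        key=lambda pair: pair[1].get(target_label, 0.0),
        reverse=True,
    )
    return [cand for cand, _ in scored]


candidates = ["i hate this.", "that sounds lovely!", "ok."]
# Hypothetical classifier outputs, one mapping per candidate.
class_probs = [
    {"positive_class_name": 0.05, "other": 0.95},
    {"positive_class_name": 0.90, "other": 0.10},
    {"positive_class_name": 0.40, "other": 0.60},
]
ranked = rerank_by_target_class(candidates, class_probs, "positive_class_name")
best = ranked[0]
```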

Case study: LIGHT RPA Re-Ranking.¶

1. Train a classifier or ranker model.¶

For the LIGHT RPA re-ranking task, the goal is to train a classifier that can predict which character is speaking in a conversation. To do so, we train a poly-encoder on the RPA task:

parlai train_model \
-m transformer/polyencoder --init-model zoo:pretrained_transformers/poly_model_huge_reddit/model \
-t projects.light_whoami.task.agents.WhoIsSpeakingLeftToRightTeacher ...

2. Subclass `AbstractReranker`.¶

In this file, we implement the RPAReranker object, subclassing AbstractReranker.

`get_class_to_rerank_for`: extracts the self character from the context.
`is_context`: returns True for any line starting with `_` (indicating LIGHT context).
`get_predictor_label_candidates`: extracts the character names from the conversation and returns them as a list.
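A simplified sketch of these three behaviors follows. It is not the `RPAReranker` code: the `_self_name` and `_partner_name` prefixes are assumptions about how LIGHT context lines are keyed, the helper names are hypothetical, and character names are read only from those context lines.

```python
from typing import List

SELF_NAME_PREFIX = "_self_name "        # assumed LIGHT context key
PARTNER_NAME_PREFIX = "_partner_name "  # assumed LIGHT context key


def is_context(utt: str) -> bool:
    """LIGHT context lines are taken to start with an underscore."""
    return utt.startswith("_")


def get_self_character(full_context: str) -> str:
    """Return the name on the self-name line, or an empty string if absent."""
    for line in full_context.split("\n"):
        if line.startswith(SELF_NAME_PREFIX):
            return line[len(SELF_NAME_PREFIX):].strip()
    return ""


def get_character_candidates(full_context: str) -> List[str]:
    """Collect the distinct character names, in order of appearance."""
    names: List[str] = []
    for line in full_context.split("\n"):
        for prefix in (SELF_NAME_PREFIX, PARTNER_NAME_PREFIX):
            if line.startswith(prefix):
                name = line[len(prefix):].strip()
                if name not in names:
                    names.append(name)
    return names


context = "\n".join(
    [
        "_setting_name Castle kitchen",
        "_partner_name knight",
        "_self_name cook",
        "Good morrow! What is for supper?",
    ]
)
target = get_self_character(context)
candidates = get_character_candidates(context)
```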

3. Subclass `AbstractGeneratorRerankAgent`¶

In the same file as above, we implement `RPARerankAgent`, which implements only `get_reranker_class`, returning the `RPAReranker` built in step 2.

AbstractGeneratorRerankAgent Options¶

optional arguments

Argument	Description
`--gpu-beam-blocking`	Set to use CUDA kernel for beam search ngram blocking Default: `False`.
`--verbose-topk`	Return the topk logits in the act message, if verbose mode is set. Default: `-1`.

Transformer Arguments

Argument	Description
`--embedding-size`, `--esz`	Size of all embedding layers. Must be a multiple of `--n-heads`. Default: `300`.
`--n-layers`, `--nl`	Number of transformer layers. Default: `2`.
`--ffn-size`, `--hid`	Hidden size of the FFN layers. Default: `300`.
`--dropout`	Dropout used around embeddings and before layer normalizations. This is used in Vaswani 2017 and works well on large datasets. Default: `0.0`.
`--attention-dropout`	Dropout used after attention softmax. This is not used in Vaswani 2017. Default: `0.0`.
`--relu-dropout`	Dropout used after the ReLU in the FFN. Not used in Vaswani 2017, but used in Tensor2Tensor. Default: `0.0`.
`--n-heads`	Number of multihead attention heads Default: `2`.
`--learn-positional-embeddings`	If off, sinusoidal embeddings are used. If on, position embeddings are learned from scratch. Default: `False`.
`--embeddings-scale`	Default: `True`.
`--n-segments`	The number of segments that support the model. If zero no segment and no langs_embedding. Default: `0`.
`--variant`	Chooses locations of layer norms, etc. prelayernorm is used to match some fairseq models Choices: `xlm`, `prelayernorm`, `bart`, `aiayn`. Default: `aiayn`. Recommended: `xlm`.
`--activation`	Nonlinear activation to use. AIAYN uses relu, but more recent papers prefer gelu. Choices: `gelu`, `relu`. Default: `relu`. Recommended: `gelu`.
`--output-scaling`	Scale the output of every transformer by this quantity. Default: `1.0`.
`--share-word-embeddings`	Share word embeddings table for candidate and context in the memory network. Default: `True`.
`--n-encoder-layers`, `--nel`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--n-decoder-layers`, `--ndl`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--model-parallel`	Shard the layers across multiple GPUs. Default: `False`.
`--checkpoint-activations`	Recompute activations on backward pass to conserve memory. Default: `False`.

Torch Generator Agent

Argument	Description
`--beam-size`	Beam size, if 1 then greedy search Default: `1`.
`--beam-min-length`	Minimum length of prediction to be generated by the beam search Default: `1`.
`--beam-context-block-ngram`	Size n-grams to block in beam search from the context. val <= 0 implies no blocking Default: `-1`.
`--beam-block-ngram`	Size n-grams to block in beam search. val <= 0 implies no blocking Default: `-1`.
`--beam-block-full-context`	Block n-grams from the full history context. Specify False to block up to m tokens in the past, where m is truncation parameter for agent Default: `True`.
`--beam-length-penalty`	Applies a length penalty. Set to 0 for no penalty. Default: `0.65`.
`--inference`	Generation algorithm Choices: `beam`, `nucleus`, `delayedbeam`, `greedy`, `delayednucleusbeam`, `topk`, `factual_nucleus`. Default: `greedy`.
`--topk`	K used in Top K sampling Default: `10`.
`--topp`	P used in nucleus sampling Default: `0.9`.
`--beam-delay`	Used in delayedbeam search Default: `30`.
`--lambda-decay`	Decay factor in factual nucleus sampling Default: `0.9`.
`--omega-bound`	Lower bound in factual nucleus sampling Default: `0.3`.
`--p-reset`	Whether to reset p value in factual nucleus at full stops Default: `True`.
`--beam-block-list-filename`	Load a text file of hard blocks for beam search to never say.
`--temperature`	Temperature to add during decoding Default: `1.0`.
`--compute-tokenized-bleu`	If true, compute tokenized bleu scores Default: `False`.
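
To make the `--topk` and `--topp` options above more concrete, here is a minimal sketch of top-k and nucleus filtering over a toy next-token distribution. It illustrates the general techniques only; the function names and the dictionary-based representation are assumptions and do not reflect how the generator agent implements them on tensors.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def nucleus_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest high-probability set whose mass reaches p, then renormalize."""
    kept = []
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}


probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
top2 = top_k_filter(probs, k=2)          # keeps "the" and "a"
nucleus = nucleus_filter(probs, p=0.9)   # keeps "the", "a" and "cat"
```

Sampling would then draw the next token from the filtered distribution instead of the full one.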

TorchAgent Arguments

Argument	Description
`--interactive-mode`, `--i`	Whether in full interactive mode or not, which means generating text or retrieving from a full set of candidates, which is necessary to actually do full dialogue. However, during training or quick validation (e.g. PPL for generation or ranking a few candidates for ranking models) you might want these set to off. Typically, scripts can set their preferred default behavior at the start, e.g. eval scripts. Default: `False`.
`--embedding-type`, `--emb`	Choose between different strategies for initializing word embeddings. Default is random, but can also preinitialize from Glove or Fasttext. Preinitialized embeddings can also be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`.
`--embedding-projection`, `--embp`	If pretrained embeddings have a different dimensionality than your embedding size, strategy for projecting to the correct size. If the dimensions are the same, this is ignored unless you append “-force” to your choice. Default: `random`.
`--fp16`	Use fp16 computations. Default: `False`.
`--fp16-impl`	Implementation of FP16 to use Choices: `safe`, `mem_efficient`. Default: `safe`.
`--rank-candidates`, `--rc`	Whether the model should parse candidates for ranking. Default: `False`.
`--truncate`, `--tr`	Truncate input lengths to increase speed / use less memory. Default: `-1`.
`--text-truncate`	Text input truncation length: if not specified, this will default to `truncate`
`--label-truncate`	Label truncation length: if not specified, this will default to `truncate`
`--history-reversed`	Reverse the history Default: `False`.
`--history-size`, `--histsz`	Number of past dialog utterances to remember. Default: `-1`.
`--person-tokens`, `--pt`	Add person tokens to history. adds p1 in front of input text and p2 in front of past labels when available or past utterances generated by the model. these are added to the dictionary during initialization. Default: `False`.
`--split-lines`	Split the dialogue history on newlines and save in separate vectors Default: `False`.
`--delimiter`	Join history lines with this token, defaults to newline Default: `\n`.
`--special-tok-lst`	Comma separated list of special tokens. In case of ambiguous parses from special tokens, the ordering provided in this arg sets precedence.
`-gpu`, `--gpu`	Which GPU to use Default: `-1`.
`--no-cuda`	Disable GPUs even if available. otherwise, will use GPUs if available on the device. Default: `False`.

Optimizer Arguments

Argument	Description
`--optimizer`, `--opt`	Optimizer choice. Choices: `adadelta`, `adagrad`, `adam`, `adamw`, `sparseadam`, `adamax`, `asgd`, `sgd`, `radam`, `rprop`, `rmsprop`, `optimizer`, `nadam`, `lbfgs`, `mem_eff_adam`, `adafactor`. Default: `sgd`.
`--learningrate`, `--lr`	Learning rate Default: `1`.
`--gradient-clip`, `--clip`	Gradient clipping using l2 norm Default: `0.1`.
`--adafactor-eps`	Epsilon values for adafactor optimizer: regularization constants for square gradient and parameter scale respectively Default: `1e-30,1e-3`. Recommended: `1e-30,1e-3`.
`--momentum`, `--mom`	If applicable, momentum value for optimizer. Default: `0`.
`--nesterov`	If applicable, whether to use nesterov momentum. Default: `True`.
`--nus`, `--nu`	If applicable, nu value(s) for optimizer. can use a single value like 0.7 or a comma-separated tuple like 0.7,1.0 Default: `0.7`.
`--betas`, `--beta`	If applicable, beta value(s) for optimizer. can use a single value like 0.9 or a comma-separated tuple like 0.9,0.999 Default: `0.9,0.999`.
`--weight-decay`, `--wdecay`	Weight decay on the weights.

BPEHelper Arguments

Argument	Description
`--bpe-vocab`	Path to pre-trained tokenizer vocab
`--bpe-merge`	Path to pre-trained tokenizer merge
`--bpe-dropout`	Use BPE dropout during training.

Learning Rate Scheduler

Argument	Description
`--lr-scheduler`	Learning rate scheduler. Choices: `reduceonplateau`, `none`, `fixed`, `invsqrt`, `cosine`, `linear`. Default: `reduceonplateau`.
`--lr-scheduler-patience`	LR scheduler patience, in number of validation runs. If using the fixed scheduler, LR is decayed every `patience` validations. Default: `3`.
`--lr-scheduler-decay`	Decay factor for LR scheduler, or how much LR is multiplied by when it is lowered. Default: `0.5`.
`--invsqrt-lr-decay-gamma`	Constant used only to find the LR multiplier for the invsqrt scheduler. Must be set for `--lr-scheduler invsqrt`. Default: `-1`.
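
The interaction between patience and decay in a reduce-on-plateau schedule can be sketched as follows. This is an illustrative toy, not ParlAI's scheduler: the class name `PlateauDecay` is hypothetical, validation loss (lower is better) is assumed as the tracked metric, and the rule "decay once the number of non-improving validations exceeds the patience" is an assumption borrowed from common plateau schedulers.

```python
class PlateauDecay:
    """Toy plateau scheduler: multiply the LR by `decay` after a stall."""

    def __init__(self, lr: float, patience: int = 3, decay: float = 0.5):
        self.lr = lr
        self.patience = patience
        self.decay = decay
        self.best = float("inf")
        self.bad_validations = 0

    def step(self, valid_loss: float) -> float:
        """Record one validation result and return the (possibly lowered) LR."""
        if valid_loss < self.best:
            self.best = valid_loss
            self.bad_validations = 0
        else:
            self.bad_validations += 1
            if self.bad_validations > self.patience:
                self.lr *= self.decay
                self.bad_validations = 0
        return self.lr


scheduler = PlateauDecay(lr=1.0, patience=3, decay=0.5)
losses = [1.0, 0.9, 0.95, 0.95, 0.95, 0.95]
history = [scheduler.step(loss) for loss in losses]
```

With these inputs the learning rate stays at its initial value until the fourth consecutive validation without improvement, at which point it is halved.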

Generator Rerank Agent

Argument	Description
`--inference-strategies`	Comma-separated list of inference strategies. if specified, re-rank over several inference strategies
`--debug-mode`	Specify to enable certain debugging procedures. Default: `False`.
`--inference-opt-key`	Specify inference opt key for dialogue response model Default: `inference`.

AbstractReranker Args

Argument	Description
`--normalize-candidates`	Remove spaces and add capitalization as per ParlAI normalize_reply() function Default: `False`.
`--predictor-model-file`	Path to model whose prediction score will be used to rerank, usually a classifier or ranker
`--reranker-strategy`	Which strategy to use when re-ranking response candidates. Choices: `sum_scores`, `hard_choice`, `reranker_score`, `none`. Default: `reranker_score`.
`--reranker-delimiter`	Delimiter for the reranker

AbstractGpt2RerankAgent Options¶

optional arguments

Argument	Description
`--gpu-beam-blocking`	Set to use CUDA kernel for beam search ngram blocking Default: `False`.
`--verbose-topk`	Return the topk logits in the act message, if verbose mode is set. Default: `-1`.

Gpt2 Args

Argument	Description
`--model-name`	Any GPT-2 model names.
`--gpt2-size`	Which size model to initialize. Choices: `small`, `medium`, `large`, `xl`, `distilgpt2`. Default: `small`.
`--add-special-tokens`	Add special tokens (like PAD, etc.). If False, can only be used with batch size 1. Default: `True`.
`--add-start-token`	Add start tokens when finetuning. Default: `False`.

Torch Generator Agent

Argument	Description
`--beam-size`	Beam size, if 1 then greedy search Default: `1`.
`--beam-min-length`	Minimum length of prediction to be generated by the beam search Default: `1`.
`--beam-context-block-ngram`	Size n-grams to block in beam search from the context. val <= 0 implies no blocking Default: `-1`.
`--beam-block-ngram`	Size n-grams to block in beam search. val <= 0 implies no blocking Default: `-1`.
`--beam-block-full-context`	Block n-grams from the full history context. Specify False to block up to m tokens in the past, where m is truncation parameter for agent Default: `True`.
`--beam-length-penalty`	Applies a length penalty. Set to 0 for no penalty. Default: `0.65`.
`--inference`	Generation algorithm Choices: `beam`, `nucleus`, `delayedbeam`, `greedy`, `delayednucleusbeam`, `topk`, `factual_nucleus`. Default: `greedy`.
`--topk`	K used in Top K sampling Default: `10`.
`--topp`	P used in nucleus sampling Default: `0.9`.
`--beam-delay`	Used in delayedbeam search Default: `30`.
`--lambda-decay`	Decay factor in factual nucleus sampling Default: `0.9`.
`--omega-bound`	Lower bound in factual nucleus sampling Default: `0.3`.
`--p-reset`	Whether to reset p value in factual nucleus at full stops Default: `True`.
`--beam-block-list-filename`	Load a text file of hard blocks for beam search to never say.
`--temperature`	Temperature to add during decoding Default: `1.0`.
`--compute-tokenized-bleu`	If true, compute tokenized bleu scores Default: `False`.

TorchAgent Arguments

Argument	Description
`--interactive-mode`, `--i`	Whether in full interactive mode or not, which means generating text or retrieving from a full set of candidates, which is necessary to actually do full dialogue. However, during training or quick validation (e.g. PPL for generation or ranking a few candidates for ranking models) you might want these set to off. Typically, scripts can set their preferred default behavior at the start, e.g. eval scripts. Default: `False`.
`--embedding-type`, `--emb`	Choose between different strategies for initializing word embeddings. Default is random, but can also preinitialize from Glove or Fasttext. Preinitialized embeddings can also be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`.
`--embedding-projection`, `--embp`	If pretrained embeddings have a different dimensionality than your embedding size, strategy for projecting to the correct size. If the dimensions are the same, this is ignored unless you append “-force” to your choice. Default: `random`.
`--fp16`	Use fp16 computations. Default: `False`.
`--fp16-impl`	Implementation of FP16 to use Choices: `safe`, `mem_efficient`. Default: `safe`.
`--rank-candidates`, `--rc`	Whether the model should parse candidates for ranking. Default: `False`.
`--truncate`, `--tr`	Truncate input lengths to increase speed / use less memory. Default: `-1`.
`--text-truncate`	Text input truncation length: if not specified, this will default to `truncate` Default: `768`.
`--label-truncate`	Label truncation length: if not specified, this will default to `truncate` Default: `256`.
`--history-reversed`	Reverse the history Default: `False`.
`--history-size`, `--histsz`	Number of past dialog utterances to remember. Default: `-1`.
`--person-tokens`, `--pt`	Add person tokens to history. adds p1 in front of input text and p2 in front of past labels when available or past utterances generated by the model. these are added to the dictionary during initialization. Default: `False`.
`--split-lines`	Split the dialogue history on newlines and save in separate vectors Default: `False`.
`--delimiter`	Join history lines with this token, defaults to newline Default: `\n`.
`--special-tok-lst`	Comma separated list of special tokens. In case of ambiguous parses from special tokens, the ordering provided in this arg sets precedence.
`-gpu`, `--gpu`	Which GPU to use Default: `-1`.
`--no-cuda`	Disable GPUs even if available. otherwise, will use GPUs if available on the device. Default: `False`.

Optimizer Arguments

Argument	Description
`--optimizer`, `--opt`	Optimizer choice. Choices: `adadelta`, `adagrad`, `adam`, `adamw`, `sparseadam`, `adamax`, `asgd`, `sgd`, `radam`, `rprop`, `rmsprop`, `optimizer`, `nadam`, `lbfgs`, `mem_eff_adam`, `adafactor`. Default: `sgd`.
`--learningrate`, `--lr`	Learning rate Default: `1`.
`--gradient-clip`, `--clip`	Gradient clipping using l2 norm Default: `0.1`.
`--adafactor-eps`	Epsilon values for adafactor optimizer: regularization constants for square gradient and parameter scale respectively Default: `1e-30,1e-3`. Recommended: `1e-30,1e-3`.
`--momentum`, `--mom`	If applicable, momentum value for optimizer. Default: `0`.
`--nesterov`	If applicable, whether to use nesterov momentum. Default: `True`.
`--nus`, `--nu`	If applicable, nu value(s) for optimizer. can use a single value like 0.7 or a comma-separated tuple like 0.7,1.0 Default: `0.7`.
`--betas`, `--beta`	If applicable, beta value(s) for optimizer. can use a single value like 0.9 or a comma-separated tuple like 0.9,0.999 Default: `0.9,0.999`.
`--weight-decay`, `--wdecay`	Weight decay on the weights.

BPEHelper Arguments

Argument	Description
`--bpe-vocab`	Path to pre-trained tokenizer vocab
`--bpe-merge`	Path to pre-trained tokenizer merge
`--bpe-dropout`	Use BPE dropout during training.

Learning Rate Scheduler

Argument	Description
`--lr-scheduler`	Learning rate scheduler. Choices: `reduceonplateau`, `none`, `fixed`, `invsqrt`, `cosine`, `linear`. Default: `reduceonplateau`.
`--lr-scheduler-patience`	LR scheduler patience, in number of validation runs. If using the fixed scheduler, LR is decayed every `patience` validations. Default: `3`.
`--lr-scheduler-decay`	Decay factor for LR scheduler, or how much LR is multiplied by when it is lowered. Default: `0.5`.
`--invsqrt-lr-decay-gamma`	Constant used only to find the LR multiplier for the invsqrt scheduler. Must be set for `--lr-scheduler invsqrt`. Default: `-1`.

Generator Rerank Agent

Argument	Description
`--inference-strategies`	Comma-separated list of inference strategies. if specified, re-rank over several inference strategies
`--debug-mode`	Specify to enable certain debugging procedures. Default: `False`.
`--inference-opt-key`	Specify inference opt key for dialogue response model Default: `inference`.

AbstractReranker Args

Argument	Description
`--normalize-candidates`	Remove spaces and add capitalization as per ParlAI normalize_reply() function Default: `False`.
`--predictor-model-file`	Path to model whose prediction score will be used to rerank, usually a classifier or ranker
`--reranker-strategy`	Which strategy to use when re-ranking response candidates. Choices: `sum_scores`, `hard_choice`, `reranker_score`, `none`. Default: `reranker_score`.
`--reranker-delimiter`	Delimiter for the reranker

Gpt2Agent Options¶

optional arguments

Argument	Description
`--gpu-beam-blocking`	Set to use CUDA kernel for beam search ngram blocking Default: `False`.
`--verbose-topk`	Return the topk logits in the act message, if verbose mode is set. Default: `-1`.

Gpt2 Args

Argument	Description
`--model-name`	Any GPT-2 model names.
`--gpt2-size`	Which size model to initialize. Choices: `small`, `medium`, `large`, `xl`, `distilgpt2`. Default: `small`.
`--add-special-tokens`	Add special tokens (like PAD, etc.). If False, can only be used with batch size 1. Default: `True`.
`--add-start-token`	Add start tokens when finetuning. Default: `False`.

Torch Generator Agent

Argument	Description
`--beam-size`	Beam size, if 1 then greedy search Default: `1`.
`--beam-min-length`	Minimum length of prediction to be generated by the beam search Default: `1`.
`--beam-context-block-ngram`	Size n-grams to block in beam search from the context. val <= 0 implies no blocking Default: `-1`.
`--beam-block-ngram`	Size n-grams to block in beam search. val <= 0 implies no blocking Default: `-1`.
`--beam-block-full-context`	Block n-grams from the full history context. Specify False to block up to m tokens in the past, where m is truncation parameter for agent Default: `True`.
`--beam-length-penalty`	Applies a length penalty. Set to 0 for no penalty. Default: `0.65`.
`--inference`	Generation algorithm Choices: `beam`, `nucleus`, `delayedbeam`, `greedy`, `delayednucleusbeam`, `topk`, `factual_nucleus`. Default: `greedy`.
`--topk`	K used in Top K sampling Default: `10`.
`--topp`	P used in nucleus sampling Default: `0.9`.
`--beam-delay`	Used in delayedbeam search Default: `30`.
`--lambda-decay`	Decay factor in factual nucleus sampling Default: `0.9`.
`--omega-bound`	Lower bound in factual nucleus sampling Default: `0.3`.
`--p-reset`	Whether to reset p value in factual nucleus at full stops Default: `True`.
`--beam-block-list-filename`	Load a text file of hard blocks for beam search to never say.
`--temperature`	Temperature to add during decoding Default: `1.0`.
`--compute-tokenized-bleu`	If true, compute tokenized bleu scores Default: `False`.

TorchAgent Arguments

Argument	Description
`--interactive-mode`, `--i`	Whether in full interactive mode or not, which means generating text or retrieving from a full set of candidates, which is necessary to actually do full dialogue. However, during training or quick validation (e.g. PPL for generation or ranking a few candidates for ranking models) you might want these set to off. Typically, scripts can set their preferred default behavior at the start, e.g. eval scripts. Default: `False`.
`--embedding-type`, `--emb`	Choose between different strategies for initializing word embeddings. Default is random, but can also preinitialize from Glove or Fasttext. Preinitialized embeddings can also be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`.
`--embedding-projection`, `--embp`	If pretrained embeddings have a different dimensionality than your embedding size, strategy for projecting to the correct size. If the dimensions are the same, this is ignored unless you append “-force” to your choice. Default: `random`.
`--fp16`	Use fp16 computations. Default: `False`.
`--fp16-impl`	Implementation of FP16 to use Choices: `safe`, `mem_efficient`. Default: `safe`.
`--rank-candidates`, `--rc`	Whether the model should parse candidates for ranking. Default: `False`.
`--truncate`, `--tr`	Truncate input lengths to increase speed / use less memory. Default: `-1`.
`--text-truncate`	Text input truncation length: if not specified, this will default to `truncate` Default: `768`.
`--label-truncate`	Label truncation length: if not specified, this will default to `truncate` Default: `256`.
`--history-reversed`	Reverse the history Default: `False`.
`--history-size`, `--histsz`	Number of past dialog utterances to remember. Default: `-1`.
`--person-tokens`, `--pt`	Add person tokens to history. adds p1 in front of input text and p2 in front of past labels when available or past utterances generated by the model. these are added to the dictionary during initialization. Default: `False`.
`--split-lines`	Split the dialogue history on newlines and save in separate vectors Default: `False`.
`--delimiter`	Join history lines with this token, defaults to newline Default: `\n`.
`--special-tok-lst`	Comma separated list of special tokens. In case of ambiguous parses from special tokens, the ordering provided in this arg sets precedence.
`-gpu`, `--gpu`	Which GPU to use Default: `-1`.
`--no-cuda`	Disable GPUs even if available. otherwise, will use GPUs if available on the device. Default: `False`.

Optimizer Arguments

Argument	Description
`--optimizer`, `--opt`	Optimizer choice. Choices: `adadelta`, `adagrad`, `adam`, `adamw`, `sparseadam`, `adamax`, `asgd`, `sgd`, `radam`, `rprop`, `rmsprop`, `optimizer`, `nadam`, `lbfgs`, `mem_eff_adam`, `adafactor`. Default: `sgd`.
`--learningrate`, `--lr`	Learning rate Default: `1`.
`--gradient-clip`, `--clip`	Gradient clipping using l2 norm Default: `0.1`.
`--adafactor-eps`	Epsilon values for adafactor optimizer: regularization constants for square gradient and parameter scale respectively Default: `1e-30,1e-3`. Recommended: `1e-30,1e-3`.
`--momentum`, `--mom`	If applicable, momentum value for optimizer. Default: `0`.
`--nesterov`	If applicable, whether to use nesterov momentum. Default: `True`.
`--nus`, `--nu`	If applicable, nu value(s) for optimizer. can use a single value like 0.7 or a comma-separated tuple like 0.7,1.0 Default: `0.7`.
`--betas`, `--beta`	If applicable, beta value(s) for optimizer. can use a single value like 0.9 or a comma-separated tuple like 0.9,0.999 Default: `0.9,0.999`.
`--weight-decay`, `--wdecay`	Weight decay on the weights.

BPEHelper Arguments

Argument	Description
`--bpe-vocab`	Path to pre-trained tokenizer vocab
`--bpe-merge`	Path to pre-trained tokenizer merge
`--bpe-dropout`	Use BPE dropout during training.

Learning Rate Scheduler

Argument	Description
`--lr-scheduler`	Learning rate scheduler. Choices: `reduceonplateau`, `none`, `fixed`, `invsqrt`, `cosine`, `linear`. Default: `reduceonplateau`.
`--lr-scheduler-patience`	LR scheduler patience, in number of validation runs. If using the fixed scheduler, LR is decayed every `patience` validations. Default: `3`.
`--lr-scheduler-decay`	Decay factor for LR scheduler, or how much LR is multiplied by when it is lowered. Default: `0.5`.
`--invsqrt-lr-decay-gamma`	Constant used only to find the LR multiplier for the invsqrt scheduler. Must be set for `--lr-scheduler invsqrt`. Default: `-1`.

LongAbstractGeneratorRerankAgent Options¶

optional arguments

Argument	Description
`--gpu-beam-blocking`	Set to use CUDA kernel for beam search ngram blocking Default: `False`.
`--verbose-topk`	Return the topk logits in the act message, if verbose mode is set. Default: `-1`.

Transformer Arguments

Argument	Description
`--embedding-size`, `--esz`	Size of all embedding layers. Must be a multiple of `--n-heads`. Default: `300`.
`--n-layers`, `--nl`	Number of transformer layers. Default: `2`.
`--ffn-size`, `--hid`	Hidden size of the FFN layers. Default: `300`.
`--dropout`	Dropout used around embeddings and before layer normalizations. This is used in Vaswani 2017 and works well on large datasets. Default: `0.0`.
`--attention-dropout`	Dropout used after attention softmax. This is not used in Vaswani 2017. Default: `0.0`.
`--relu-dropout`	Dropout used after the ReLU in the FFN. Not used in Vaswani 2017, but used in Tensor2Tensor. Default: `0.0`.
`--n-heads`	Number of multihead attention heads Default: `2`.
`--learn-positional-embeddings`	If off, sinusoidal embeddings are used. If on, position embeddings are learned from scratch. Default: `False`.
`--embeddings-scale`	Default: `True`.
`--n-segments`	The number of segments that support the model. If zero no segment and no langs_embedding. Default: `0`.
`--variant`	Chooses locations of layer norms, etc. prelayernorm is used to match some fairseq models Choices: `xlm`, `prelayernorm`, `bart`, `aiayn`. Default: `aiayn`. Recommended: `xlm`.
`--activation`	Nonlinear activation to use. AIAYN uses relu, but more recent papers prefer gelu. Choices: `gelu`, `relu`. Default: `relu`. Recommended: `gelu`.
`--output-scaling`	Scale the output of every transformer by this quantity. Default: `1.0`.
`--share-word-embeddings`	Share word embeddings table for candidate and context in the memory network. Default: `True`.
`--n-encoder-layers`, `--nel`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--n-decoder-layers`, `--ndl`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--model-parallel`	Shard the layers across multiple GPUs. Default: `False`.
`--checkpoint-activations`	Recompute activations on backward pass to conserve memory. Default: `False`.

Torch Generator Agent

Argument	Description
`--beam-size`	Beam size, if 1 then greedy search Default: `1`.
`--beam-min-length`	Minimum length of prediction to be generated by the beam search Default: `1`.
`--beam-context-block-ngram`	Size n-grams to block in beam search from the context. val <= 0 implies no blocking Default: `-1`.
`--beam-block-ngram`	Size n-grams to block in beam search. val <= 0 implies no blocking Default: `-1`.
`--beam-block-full-context`	Block n-grams from the full history context. Specify False to block up to m tokens in the past, where m is truncation parameter for agent Default: `True`.
`--beam-length-penalty`	Applies a length penalty. Set to 0 for no penalty. Default: `0.65`.
`--inference`	Generation algorithm Choices: `beam`, `nucleus`, `delayedbeam`, `greedy`, `delayednucleusbeam`, `topk`, `factual_nucleus`. Default: `greedy`.
`--topk`	K used in Top K sampling Default: `10`.
`--topp`	P used in nucleus sampling Default: `0.9`.
`--beam-delay`	Used in delayedbeam search Default: `30`.
`--lambda-decay`	Decay factor in factual nucleus sampling Default: `0.9`.
`--omega-bound`	Lower bound in factual nucleus sampling Default: `0.3`.
`--p-reset`	Whether to reset p value in factual nucleus at full stops Default: `True`.
`--beam-block-list-filename`	Load a text file of hard blocks for beam search to never say.
`--temperature`	Temperature to add during decoding Default: `1.0`.
`--compute-tokenized-bleu`	If true, compute tokenized bleu scores Default: `False`.

TorchAgent Arguments

Argument	Description
`--interactive-mode`, `--i`	Whether in full interactive mode or not, which means generating text or retrieving from a full set of candidates, which is necessary to actually do full dialogue. However, during training or quick validation (e.g. PPL for generation or ranking a few candidates for ranking models) you might want these set to off. Typically, scripts can set their preferred default behavior at the start, e.g. eval scripts. Default: `False`.
`--embedding-type`, `--emb`	Choose between different strategies for initializing word embeddings. Default is random, but can also preinitialize from Glove or Fasttext. Preinitialized embeddings can also be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`.
`--embedding-projection`, `--embp`	If pretrained embeddings have a different dimensionality than your embedding size, strategy for projecting to the correct size. If the dimensions are the same, this is ignored unless you append “-force” to your choice. Default: `random`.
`--fp16`	Use fp16 computations. Default: `False`.
`--fp16-impl`	Implementation of FP16 to use Choices: `safe`, `mem_efficient`. Default: `safe`.
`--rank-candidates`, `--rc`	Whether the model should parse candidates for ranking. Default: `False`.
`--truncate`, `--tr`	Truncate input lengths to increase speed / use less memory. Default: `-1`.
`--text-truncate`	Text input truncation length: if not specified, this will default to `truncate`
`--label-truncate`	Label truncation length: if not specified, this will default to `truncate`
`--history-reversed`	Reverse the history Default: `False`.
`--history-size`, `--histsz`	Number of past dialog utterances to remember. Default: `-1`.
`--person-tokens`, `--pt`	Add person tokens to history. Adds p1 in front of input text and p2 in front of past labels when available, or in front of past utterances generated by the model. These are added to the dictionary during initialization. Default: `False`.
`--split-lines`	Split the dialogue history on newlines and save in separate vectors Default: `False`.
`--delimiter`	Join history lines with this token, defaults to newline Default: `\n`.
`--special-tok-lst`	Comma separated list of special tokens. In case of ambiguous parses from special tokens, the ordering provided in this arg sets precedence.
`-gpu`, `--gpu`	Which GPU to use Default: `-1`.
`--no-cuda`	Disable GPUs even if available. Otherwise, GPUs are used if available on the device. Default: `False`.
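
To make the interaction between `--history-size`, `--delimiter`, and `--person-tokens` more concrete, here is a minimal sketch of a dialogue history buffer. The `History` class, the `__p1__`/`__p2__` token spellings, and the reading of a non-positive size as "keep everything" are assumptions for illustration, not the agent's actual code.

```python
from collections import deque


class History:
    """Toy dialogue history; a hypothetical stand-in for the agent's own."""

    def __init__(self, size: int = -1, delimiter: str = "\n",
                 person_tokens: bool = False) -> None:
        # Assumption: a non-positive size means the history is unbounded.
        self.lines = deque(maxlen=size if size > 0 else None)
        self.delimiter = delimiter
        self.person_tokens = person_tokens

    def _add(self, token: str, text: str) -> None:
        self.lines.append(f"{token} {text}" if self.person_tokens else text)

    def add_input(self, text: str) -> None:
        """Record incoming text, optionally prefixed with the p1 token."""
        self._add("__p1__", text)

    def add_reply(self, text: str) -> None:
        """Record a past label or model reply, optionally prefixed with p2."""
        self._add("__p2__", text)

    def render(self) -> str:
        """Join the remembered utterances with the delimiter."""
        return self.delimiter.join(self.lines)


history = History(size=2, person_tokens=True)
history.add_input("hi")
history.add_reply("hello")
history.add_input("how are you")
rendered = history.render()
```

With `size=2`, the oldest utterance is dropped once a third one arrives.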

Optimizer Arguments

Argument	Description
`--optimizer`, `--opt`	Optimizer choice. Choices: `adadelta`, `adagrad`, `adam`, `adamw`, `sparseadam`, `adamax`, `asgd`, `sgd`, `radam`, `rprop`, `rmsprop`, `optimizer`, `nadam`, `lbfgs`, `mem_eff_adam`, `adafactor`. Default: `sgd`.
`--learningrate`, `--lr`	Learning rate Default: `1`.
`--gradient-clip`, `--clip`	Gradient clipping using l2 norm Default: `0.1`.
`--adafactor-eps`	Epsilon values for adafactor optimizer: regularization constants for square gradient and parameter scale respectively Default: `1e-30,1e-3`. Recommended: `1e-30,1e-3`.
`--momentum`, `--mom`	If applicable, momentum value for optimizer. Default: `0`.
`--nesterov`	If applicable, whether to use nesterov momentum. Default: `True`.
`--nus`, `--nu`	If applicable, nu value(s) for optimizer. Can be a single value like 0.7 or a comma-separated tuple like 0.7,1.0. Default: `0.7`.
`--betas`, `--beta`	If applicable, beta value(s) for optimizer. Can be a single value like 0.9 or a comma-separated tuple like 0.9,0.999. Default: `0.9,0.999`.
`--weight-decay`, `--wdecay`	Weight decay on the weights.
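
For readers unfamiliar with what `--gradient-clip` refers to, the following sketch shows generic clipping by global L2 norm on a flat list of gradient values. The function name is hypothetical; a real training loop would normally rely on the framework's own clipping utility.

```python
import math
from typing import List


def clip_by_l2_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale gradients so that their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        # Already within the allowed norm: return an unchanged copy.
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


clipped = clip_by_l2_norm([3.0, 4.0], max_norm=1.0)      # norm 5 is scaled down
untouched = clip_by_l2_norm([0.03, 0.04], max_norm=0.1)  # norm 0.05 is kept
```

Clipping changes the magnitude of the update but not its direction.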

BPEHelper Arguments

Argument	Description
`--bpe-vocab`	Path to pre-trained tokenizer vocab
`--bpe-merge`	Path to pre-trained tokenizer merge
`--bpe-dropout`	Use BPE dropout during training.

Learning Rate Scheduler

Argument	Description
`--lr-scheduler`	Learning rate scheduler. Choices: `reduceonplateau`, `none`, `fixed`, `invsqrt`, `cosine`, `linear`. Default: `reduceonplateau`.
`--lr-scheduler-patience`	LR scheduler patience, in number of validation runs. If using the fixed scheduler, LR is decayed every `patience` validations. Default: `3`.
`--lr-scheduler-decay`	Decay factor for LR scheduler, or how much LR is multiplied by when it is lowered. Default: `0.5`.
`--invsqrt-lr-decay-gamma`	Constant used only to find the LR multiplier for the invsqrt scheduler. Must be set for `--lr-scheduler invsqrt`. Default: `-1`.
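
One plausible reading of how the patience and decay settings combine under the `fixed` scheduler is sketched below: the learning rate is multiplied by the decay factor once per `patience` validation runs. The function is hypothetical and only illustrates that reading; the actual scheduler may differ in details such as warmup.

```python
def fixed_schedule_lr(base_lr: float, decay: float, patience: int,
                      n_validations: int) -> float:
    """Learning rate after n_validations runs, decayed once per `patience` runs."""
    if patience <= 0:
        raise ValueError("patience must be positive")
    n_decays = n_validations // patience
    return base_lr * decay ** n_decays


# With decay 0.5 and patience 3, the LR halves after every third validation.
lrs = [fixed_schedule_lr(1.0, 0.5, 3, n) for n in range(8)]
```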

Generator Rerank Agent

Argument	Description
`--inference-strategies`	Comma-separated list of inference strategies. If specified, re-rank over several inference strategies.
`--debug-mode`	Specify to enable certain debugging procedures. Default: `False`.
`--inference-opt-key`	Specify inference opt key for dialogue response model Default: `inference`.

AbstractReranker Args

Argument	Description
`--normalize-candidates`	Remove spaces and add capitalization as per ParlAI normalize_reply() function Default: `False`.
`--predictor-model-file`	Path to model whose prediction score will be used to rerank, usually a classifier or ranker
`--reranker-strategy`	Which strategy to use when re-ranking response candidates. Choices: `sum_scores`, `hard_choice`, `reranker_score`, `none`. Default: `reranker_score`.
`--reranker-delimiter`	Delimiter for the reranker
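
The core idea behind these arguments, choosing among generated candidates by a predictor model's score for a target class, can be sketched as follows. The toy classifier and function names are hypothetical, and the sketch does not attempt to reproduce the individual `--reranker-strategy` choices, whose exact semantics are not described here.

```python
from typing import Callable, Dict, List


def rerank_by_target_class(
    candidates: List[str],
    classify: Callable[[str], Dict[str, float]],
    target_label: str,
) -> str:
    """Return the candidate with the highest predicted probability of target_label."""
    return max(candidates, key=lambda c: classify(c).get(target_label, 0.0))


def toy_classifier(text: str) -> Dict[str, float]:
    """Stand-in for a trained classifier; returns class probabilities."""
    positive = 0.9 if "love" in text else 0.2
    return {"positive": positive, "negative": 1.0 - positive}


best = rerank_by_target_class(
    ["i hate mondays", "i love hiking"], toy_classifier, "positive"
)
```

In practice the classifier would be the model loaded from `--predictor-model-file`, and the candidates would come from the generator's beam or sampling outputs.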

TransformerGeneratorAgent Options¶

optional arguments

Argument	Description
`--gpu-beam-blocking`	Set to use CUDA kernel for beam search ngram blocking Default: `False`.
`--verbose-topk`	Return the topk logits in the act message, if verbose mode is set. Default: `-1`.

Transformer Arguments

Argument	Description
`--embedding-size`, `--esz`	Size of all embedding layers. Must be a multiple of `--n-heads`. Default: `300`.
`--n-layers`, `--nl`	Number of transformer layers. Default: `2`.
`--ffn-size`, `--hid`	Hidden size of the FFN layers Default: `300`.
`--dropout`	Dropout used around embeddings and before layer normalizations. This is used in Vaswani 2017 and works well on large datasets. Default: `0.0`.
`--attention-dropout`	Dropout used after attention softmax. This is not used in Vaswani 2017. Default: `0.0`.
`--relu-dropout`	Dropout used after the ReLU in the FFN. Not used in Vaswani 2017, but used in Tensor2Tensor. Default: `0.0`.
`--n-heads`	Number of multihead attention heads Default: `2`.
`--learn-positional-embeddings`	If off, sinusoidal embeddings are used. If on, position embeddings are learned from scratch. Default: `False`.
`--embeddings-scale`	Default: `True`.
`--n-segments`	The number of segments that support the model. If zero no segment and no langs_embedding. Default: `0`.
`--variant`	Chooses locations of layer norms, etc. prelayernorm is used to match some fairseq models Choices: `xlm`, `prelayernorm`, `bart`, `aiayn`. Default: `aiayn`. Recommended: `xlm`.
`--activation`	Nonlinear activation to use. AIAYN uses relu, but more recent papers prefer gelu. Choices: `gelu`, `relu`. Default: `relu`. Recommended: `gelu`.
`--output-scaling`	Scale the output of every transformer by this quantity. Default: `1.0`.
`--share-word-embeddings`	Share word embeddings table for candidate and context in the memory network. Default: `True`.
`--n-encoder-layers`, `--nel`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--n-decoder-layers`, `--ndl`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--model-parallel`	Shard the layers across multiple GPUs. Default: `False`.
`--checkpoint-activations`	Recompute activations on backward pass to conserve memory. Default: `False`.
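
The requirement that `--embedding-size` be a multiple of `--n-heads` follows from multi-head attention splitting the embedding evenly across heads. A small, hypothetical validation helper makes the constraint explicit:

```python
def head_dim(embedding_size: int, n_heads: int) -> int:
    """Per-head dimension; the embedding must split evenly across heads."""
    if n_heads <= 0:
        raise ValueError("n_heads must be positive")
    if embedding_size % n_heads != 0:
        raise ValueError(
            f"embedding size {embedding_size} is not a multiple of {n_heads} heads"
        )
    return embedding_size // n_heads


# The defaults listed above (300 and 2) give 150 dimensions per head.
default_head_dim = head_dim(300, 2)
```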

Torch Generator Agent

Argument	Description
`--beam-size`	Beam size, if 1 then greedy search Default: `1`.
`--beam-min-length`	Minimum length of prediction to be generated by the beam search Default: `1`.
`--beam-context-block-ngram`	Size n-grams to block in beam search from the context. val <= 0 implies no blocking Default: `-1`.
`--beam-block-ngram`	Size n-grams to block in beam search. val <= 0 implies no blocking Default: `-1`.
`--beam-block-full-context`	Block n-grams from the full history context. Specify False to block up to m tokens in the past, where m is truncation parameter for agent Default: `True`.
`--beam-length-penalty`	Applies a length penalty. Set to 0 for no penalty. Default: `0.65`.
`--inference`	Generation algorithm Choices: `beam`, `nucleus`, `delayedbeam`, `greedy`, `delayednucleusbeam`, `topk`, `factual_nucleus`. Default: `greedy`.
`--topk`	K used in Top K sampling Default: `10`.
`--topp`	P used in nucleus sampling Default: `0.9`.
`--beam-delay`	Used in delayedbeam search Default: `30`.
`--lambda-decay`	Decay factor in factual nucleus sampling Default: `0.9`.
`--omega-bound`	Lower bound in factual nucleus sampling Default: `0.3`.
`--p-reset`	Whether to reset p value in factual nucleus at full stops Default: `True`.
`--beam-block-list-filename`	Load a text file of hard blocks for beam search to never say.
`--temperature`	Temperature to add during decoding Default: `1.0`.
`--compute-tokenized-bleu`	If true, compute tokenized bleu scores Default: `False`.
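
As an illustration of what n-gram blocking (`--beam-block-ngram`) means, the sketch below computes which next tokens would repeat an n-gram already present in a partial hypothesis. The function name is hypothetical and this is a generic formulation, not the agent's beam-search code.

```python
from typing import List, Set


def blocked_next_tokens(generated: List[str], n: int) -> Set[str]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n <= 0:
        # Mirrors the table above: a non-positive size implies no blocking.
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):]) if n > 1 else ()
    blocked = set()
    for i in range(len(generated) - n + 1):
        ngram = generated[i:i + n]
        if tuple(ngram[:-1]) == prefix:
            blocked.add(ngram[-1])
    return blocked


tokens = "the cat sat on the".split()
bigram_blocked = blocked_next_tokens(tokens, n=2)   # "the cat" must not repeat
trigram_blocked = blocked_next_tokens(tokens, n=3)  # "on the" is new so far
```

During beam search, the scores of blocked tokens would be set low enough that those continuations are never selected.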

TorchAgent Arguments

Argument	Description
`--interactive-mode`, `--i`	Whether in full interactive mode or not, which means generating text or retrieving from a full set of candidates, which is necessary to actually do full dialogue. However, during training or quick validation (e.g. PPL for generation or ranking a few candidates for ranking models) you might want these set to off. Typically, scripts can set their preferred default behavior at the start, e.g. eval scripts. Default: `False`.
`--embedding-type`, `--emb`	Choose between different strategies for initializing word embeddings. Default is random, but can also preinitialize from Glove or Fasttext. Preinitialized embeddings can also be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`.
`--embedding-projection`, `--embp`	If pretrained embeddings have a different dimensionality than your embedding size, strategy for projecting to the correct size. If the dimensions are the same, this is ignored unless you append “-force” to your choice. Default: `random`.
`--fp16`	Use fp16 computations. Default: `False`.
`--fp16-impl`	Implementation of FP16 to use Choices: `safe`, `mem_efficient`. Default: `safe`.
`--rank-candidates`, `--rc`	Whether the model should parse candidates for ranking. Default: `False`.
`--truncate`, `--tr`	Truncate input lengths to increase speed / use less memory. Default: `-1`.
`--text-truncate`	Text input truncation length: if not specified, this will default to `truncate`
`--label-truncate`	Label truncation length: if not specified, this will default to `truncate`
`--history-reversed`	Reverse the history Default: `False`.
`--history-size`, `--histsz`	Number of past dialog utterances to remember. Default: `-1`.
`--person-tokens`, `--pt`	Add person tokens to history. Adds p1 in front of input text and p2 in front of past labels when available, or in front of past utterances generated by the model. These are added to the dictionary during initialization. Default: `False`.
`--split-lines`	Split the dialogue history on newlines and save in separate vectors Default: `False`.
`--delimiter`	Join history lines with this token, defaults to newline Default: `\n`.
`--special-tok-lst`	Comma separated list of special tokens. In case of ambiguous parses from special tokens, the ordering provided in this arg sets precedence.
`-gpu`, `--gpu`	Which GPU to use Default: `-1`.
`--no-cuda`	Disable GPUs even if available. Otherwise, GPUs are used if available on the device. Default: `False`.
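
The fallback from `--text-truncate` and `--label-truncate` to `--truncate` can be sketched as below. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that a negative limit means no truncation, and that which end of the sequence is kept is a caller's choice.

```python
from typing import List, Optional


def resolve_truncate(truncate: int, specific: Optional[int] = None) -> int:
    """Use the text- or label-specific limit if given, else fall back to `truncate`."""
    return truncate if specific is None else specific


def truncate_tokens(tokens: List[int], limit: int, keep_end: bool) -> List[int]:
    """Cut tokens to at most `limit` items; a negative limit disables truncation."""
    if limit < 0:
        return list(tokens)
    if keep_end:
        # Keep the most recent tokens, e.g. the end of a long dialogue context.
        return tokens[max(0, len(tokens) - limit):]
    return tokens[:limit]


text_limit = resolve_truncate(truncate=128)                # falls back to 128
label_limit = resolve_truncate(truncate=128, specific=32)  # explicit value wins
tail = truncate_tokens([1, 2, 3, 4, 5], limit=3, keep_end=True)
head = truncate_tokens([1, 2, 3, 4, 5], limit=3, keep_end=False)
```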

Optimizer Arguments

Argument	Description
`--optimizer`, `--opt`	Optimizer choice. Choices: `adadelta`, `adagrad`, `adam`, `adamw`, `sparseadam`, `adamax`, `asgd`, `sgd`, `radam`, `rprop`, `rmsprop`, `optimizer`, `nadam`, `lbfgs`, `mem_eff_adam`, `adafactor`. Default: `sgd`.
`--learningrate`, `--lr`	Learning rate Default: `1`.
`--gradient-clip`, `--clip`	Gradient clipping using l2 norm Default: `0.1`.
`--adafactor-eps`	Epsilon values for adafactor optimizer: regularization constants for square gradient and parameter scale respectively Default: `1e-30,1e-3`. Recommended: `1e-30,1e-3`.
`--momentum`, `--mom`	If applicable, momentum value for optimizer. Default: `0`.
`--nesterov`	If applicable, whether to use nesterov momentum. Default: `True`.
`--nus`, `--nu`	If applicable, nu value(s) for optimizer. Can be a single value like 0.7 or a comma-separated tuple like 0.7,1.0. Default: `0.7`.
`--betas`, `--beta`	If applicable, beta value(s) for optimizer. Can be a single value like 0.9 or a comma-separated tuple like 0.9,0.999. Default: `0.9,0.999`.
`--weight-decay`, `--wdecay`	Weight decay on the weights.

BPEHelper Arguments

Argument	Description
`--bpe-vocab`	Path to pre-trained tokenizer vocab
`--bpe-merge`	Path to pre-trained tokenizer merge
`--bpe-dropout`	Use BPE dropout during training.

Learning Rate Scheduler

Argument	Description
`--lr-scheduler`	Learning rate scheduler. Choices: `reduceonplateau`, `none`, `fixed`, `invsqrt`, `cosine`, `linear`. Default: `reduceonplateau`.
`--lr-scheduler-patience`	LR scheduler patience, in number of validation runs. If using the fixed scheduler, LR is decayed every `patience` validations. Default: `3`.
`--lr-scheduler-decay`	Decay factor for LR scheduler, or how much LR is multiplied by when it is lowered. Default: `0.5`.
`--invsqrt-lr-decay-gamma`	Constant used only to find the LR multiplier for the invsqrt scheduler. Must be set for `--lr-scheduler invsqrt`. Default: `-1`.

TransformerVariantAgent Options¶

optional arguments

Argument	Description
`--gpu-beam-blocking`	Set to use CUDA kernel for beam search ngram blocking Default: `False`.
`--verbose-topk`	Return the topk logits in the act message, if verbose mode is set. Default: `-1`.

Transformer Arguments

Argument	Description
`--embedding-size`, `--esz`	Size of all embedding layers. Must be a multiple of `--n-heads`. Default: `300`.
`--n-layers`, `--nl`	Number of transformer layers. Default: `2`.
`--ffn-size`, `--hid`	Hidden size of the FFN layers Default: `300`.
`--dropout`	Dropout used around embeddings and before layer normalizations. This is used in Vaswani 2017 and works well on large datasets. Default: `0.0`.
`--attention-dropout`	Dropout used after attention softmax. This is not used in Vaswani 2017. Default: `0.0`.
`--relu-dropout`	Dropout used after the ReLU in the FFN. Not used in Vaswani 2017, but used in Tensor2Tensor. Default: `0.0`.
`--n-heads`	Number of multihead attention heads Default: `2`.
`--learn-positional-embeddings`	If off, sinusoidal embeddings are used. If on, position embeddings are learned from scratch. Default: `False`.
`--embeddings-scale`	Default: `True`.
`--n-segments`	The number of segments that support the model. If zero no segment and no langs_embedding. Default: `0`.
`--variant`	Chooses locations of layer norms, etc. prelayernorm is used to match some fairseq models Choices: `xlm`, `prelayernorm`, `bart`, `aiayn`. Default: `aiayn`. Recommended: `xlm`.
`--activation`	Nonlinear activation to use. AIAYN uses relu, but more recent papers prefer gelu. Choices: `gelu`, `relu`. Default: `relu`. Recommended: `gelu`.
`--output-scaling`	Scale the output of every transformer by this quantity. Default: `1.0`.
`--share-word-embeddings`	Share word embeddings table for candidate and context in the memory network. Default: `True`.
`--n-encoder-layers`, `--nel`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--n-decoder-layers`, `--ndl`	This will override the n-layers for asymmetrical transformers. Default: `-1`.
`--model-parallel`	Shard the layers across multiple GPUs. Default: `False`.
`--checkpoint-activations`	Recompute activations on backward pass to conserve memory. Default: `False`.

Torch Generator Agent

Argument	Description
`--beam-size`	Beam size, if 1 then greedy search Default: `1`.
`--beam-min-length`	Minimum length of prediction to be generated by the beam search Default: `1`.
`--beam-context-block-ngram`	Size n-grams to block in beam search from the context. val <= 0 implies no blocking Default: `-1`.
`--beam-block-ngram`	Size n-grams to block in beam search. val <= 0 implies no blocking Default: `-1`.
`--beam-block-full-context`	Block n-grams from the full history context. Specify False to block up to m tokens in the past, where m is truncation parameter for agent Default: `True`.
`--beam-length-penalty`	Applies a length penalty. Set to 0 for no penalty. Default: `0.65`.
`--inference`	Generation algorithm Choices: `beam`, `nucleus`, `delayedbeam`, `greedy`, `delayednucleusbeam`, `topk`, `factual_nucleus`. Default: `greedy`.
`--topk`	K used in Top K sampling Default: `10`.
`--topp`	P used in nucleus sampling Default: `0.9`.
`--beam-delay`	Used in delayedbeam search Default: `30`.
`--lambda-decay`	Decay factor in factual nucleus sampling Default: `0.9`.
`--omega-bound`	Lower bound in factual nucleus sampling Default: `0.3`.
`--p-reset`	Whether to reset p value in factual nucleus at full stops Default: `True`.
`--beam-block-list-filename`	Load a text file of hard blocks for beam search to never say.
`--temperature`	Temperature to add during decoding Default: `1.0`.
`--compute-tokenized-bleu`	If true, compute tokenized bleu scores Default: `False`.
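
The effect of `--temperature` on decoding is easiest to see on a toy set of logits. The sketch below uses the common formulation in which logits are divided by the temperature before the softmax; the function name is hypothetical, and the agent's exact handling may differ.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float],
                             temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more peaked
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)   # closer to uniform
```

Values below 1 concentrate probability on the top token, while values above 1 spread it out.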

TorchAgent Arguments

Argument	Description
`--interactive-mode`, `--i`	Whether in full interactive mode or not, which means generating text or retrieving from a full set of candidates, which is necessary to actually do full dialogue. However, during training or quick validation (e.g. PPL for generation or ranking a few candidates for ranking models) you might want these set to off. Typically, scripts can set their preferred default behavior at the start, e.g. eval scripts. Default: `False`.
`--embedding-type`, `--emb`	Choose between different strategies for initializing word embeddings. Default is random, but can also preinitialize from Glove or Fasttext. Preinitialized embeddings can also be fixed so they are not updated during training. Choices: `random`, `glove`, `glove-fixed`, `fasttext`, `fasttext-fixed`, `fasttext_cc`, `fasttext_cc-fixed`. Default: `random`.
`--embedding-projection`, `--embp`	If pretrained embeddings have a different dimensionality than your embedding size, strategy for projecting to the correct size. If the dimensions are the same, this is ignored unless you append “-force” to your choice. Default: `random`.
`--fp16`	Use fp16 computations. Default: `False`.
`--fp16-impl`	Implementation of FP16 to use Choices: `safe`, `mem_efficient`. Default: `safe`.
`--rank-candidates`, `--rc`	Whether the model should parse candidates for ranking. Default: `False`.
`--truncate`, `--tr`	Truncate input lengths to increase speed / use less memory. Default: `-1`.
`--text-truncate`	Text input truncation length: if not specified, this will default to `truncate`
`--label-truncate`	Label truncation length: if not specified, this will default to `truncate`
`--history-reversed`	Reverse the history Default: `False`.
`--history-size`, `--histsz`	Number of past dialog utterances to remember. Default: `-1`.
`--person-tokens`, `--pt`	Add person tokens to history. Adds p1 in front of input text and p2 in front of past labels when available, or in front of past utterances generated by the model. These are added to the dictionary during initialization. Default: `False`.
`--split-lines`	Split the dialogue history on newlines and save in separate vectors Default: `False`.
`--delimiter`	Join history lines with this token, defaults to newline Default: `\n`.
`--special-tok-lst`	Comma separated list of special tokens. In case of ambiguous parses from special tokens, the ordering provided in this arg sets precedence.
`-gpu`, `--gpu`	Which GPU to use Default: `-1`.
`--no-cuda`	Disable GPUs even if available. Otherwise, GPUs are used if available on the device. Default: `False`.

Optimizer Arguments

Argument	Description
`--optimizer`, `--opt`	Optimizer choice. Choices: `adadelta`, `adagrad`, `adam`, `adamw`, `sparseadam`, `adamax`, `asgd`, `sgd`, `radam`, `rprop`, `rmsprop`, `optimizer`, `nadam`, `lbfgs`, `mem_eff_adam`, `adafactor`. Default: `sgd`.
`--learningrate`, `--lr`	Learning rate Default: `1`.
`--gradient-clip`, `--clip`	Gradient clipping using l2 norm Default: `0.1`.
`--adafactor-eps`	Epsilon values for adafactor optimizer: regularization constants for square gradient and parameter scale respectively Default: `1e-30,1e-3`. Recommended: `1e-30,1e-3`.
`--momentum`, `--mom`	If applicable, momentum value for optimizer. Default: `0`.
`--nesterov`	If applicable, whether to use nesterov momentum. Default: `True`.
`--nus`, `--nu`	If applicable, nu value(s) for optimizer. Can be a single value like 0.7 or a comma-separated tuple like 0.7,1.0. Default: `0.7`.
`--betas`, `--beta`	If applicable, beta value(s) for optimizer. Can be a single value like 0.9 or a comma-separated tuple like 0.9,0.999. Default: `0.9,0.999`.
`--weight-decay`, `--wdecay`	Weight decay on the weights.

BPEHelper Arguments

Argument	Description
`--bpe-vocab`	Path to pre-trained tokenizer vocab
`--bpe-merge`	Path to pre-trained tokenizer merge
`--bpe-dropout`	Use BPE dropout during training.

Learning Rate Scheduler

Argument	Description
`--lr-scheduler`	Learning rate scheduler. Choices: `reduceonplateau`, `none`, `fixed`, `invsqrt`, `cosine`, `linear`. Default: `reduceonplateau`.
`--lr-scheduler-patience`	LR scheduler patience, in number of validation runs. If using the fixed scheduler, LR is decayed every `patience` validations. Default: `3`.
`--lr-scheduler-decay`	Decay factor for LR scheduler, or how much LR is multiplied by when it is lowered. Default: `0.5`.
`--invsqrt-lr-decay-gamma`	Constant used only to find the LR multiplier for the invsqrt scheduler. Must be set for `--lr-scheduler invsqrt`. Default: `-1`.
