Language Model

The language model is an agent that trains an RNN on a language modeling task. It is adapted from the language model in PyTorch's examples repository.

Basic Examples

Train a language model with embedding size 128 and sequence length 30 on the Persona-Chat task.

python examples/train_model.py -m language_model -t personachat -esz 128 -sl 30 -mf /tmp/LM_personachat_test.mdl

After training, load and evaluate that model on the Persona-Chat test set.

python examples/eval_model.py -m language_model -t personachat -mf /tmp/LM_personachat_test.mdl -dt test
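
To talk to the trained model directly, ParlAI's interactive script can be used. The command below is a sketch: it assumes the standard examples/interactive.py entry point and turns on the sampling mode described under the agent options below.

python examples/interactive.py -m language_model -mf /tmp/LM_personachat_test.mdl -sm True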

DictionaryAgent Options

BPEHelper Arguments

--bpe-vocab: Path to pre-trained tokenizer vocab file
--bpe-merge: Path to pre-trained tokenizer merge file
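
These arguments only matter when the dictionary uses a BPE tokenizer. As a hedged sketch (the --dict-tokenizer value and the file paths below are assumptions for illustration, not documented defaults of this agent), a training run with pre-trained BPE files might look like:

python examples/train_model.py -m language_model -t personachat -mf /tmp/LM_personachat_bpe.mdl --dict-tokenizer bytelevelbpe --bpe-vocab /path/to/vocab.json --bpe-merge /path/to/merges.txt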

LanguageModelAgent Options

Language Model Arguments

--init-model: Load dict/features/weights/opts from this file
-hs, --hiddensize: Size of the hidden layers (default: 200)
-esz, --embeddingsize: Size of the token embeddings (default: 200)
-nl, --numlayers: Number of hidden layers (default: 2)
-dr, --dropout: Dropout rate (default: 0.2)
-clip, --gradient-clip: Gradient clipping threshold (default: 0.25)
--no-cuda: Disable GPUs even if available (default: False)
-rnn, --rnn-class: Type of recurrent net, one of RNN_TANH, RNN_RELU, LSTM, or GRU (default: LSTM)
-sl, --seq-len: Sequence length (default: 35)
-tied, --emb-tied: Tie the word embedding and softmax weights (default: False)
-seed, --random-seed: Random seed (default: 1111)
--gpu: Which GPU device to use (default: -1)
-tr, --truncate-pred: Truncate predictions to this many tokens (default: 50)
-rf, --report-freq: Fraction of predictions to report during evaluation (default: 0.1)
-pt, --person-tokens: Append person1 and person2 tokens to the text (default: True)
-lr, --learningrate: Initial learning rate (default: 20)
-lrf, --lr-factor: Multiply the learning rate by this factor when the validation loss does not decrease (default: 1.0)
-lrp, --lr-patience: How long to wait without improvement in validation loss before decreasing the learning rate (default: 10)
-lrm, --lr-minimum: Minimum learning rate (default: 0.1)
-sm, --sampling-mode: Sample from the output distribution when generating tokens instead of taking the argmax, and do not produce the UNK token (only applies when batch size is 1) (default: False)

BPEHelper Arguments

--bpe-vocab: Path to pre-trained tokenizer vocab file
--bpe-merge: Path to pre-trained tokenizer merge file
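
Any of the arguments above can be passed on the command line when training. As an illustrative sketch (the hyperparameter values and model file path are arbitrary examples, not recommended settings), a training run that overrides several of the documented defaults might look like:

python examples/train_model.py -m language_model -t personachat -mf /tmp/LM_personachat_tuned.mdl -esz 256 -hs 512 -nl 2 -dr 0.3 -rnn GRU -sl 40 -lrf 0.5 -lrp 5

Note that lowering -lrf below 1.0 is what enables the learning-rate decay described above; with the default factor of 1.0, the learning rate never decreases.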