Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain Identity

Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston


State-of-the-art dialogue models still often stumble with regard to factual accuracy and self-contradiction. Anecdotally, they have been observed to fail to maintain character identity throughout discourse; and more specifically, may take on the role of their interlocutor. In this work we formalize and quantify this deficiency, and show experimentally through human evaluations that this is indeed a problem. In contrast, we show that discriminative models trained specifically to recognize who is speaking can perform well; and further, these can be used as automated metrics. Finally, we evaluate a wide variety of mitigation methods, including changes to model architecture, training protocol, and decoding strategy. Our best models reduce mistaken identity issues by nearly 65% according to human annotators, while simultaneously improving engagingness. Despite these results, we find that maintaining character identity remains a challenging problem.


Link to arXiv


RPA Classifier Training

Full Datasplit:

parlai dd -t projects.light_whoami.task.agents:WhoIsSpeakingTeacher

Left to Right:

parlai dd -t projects.light_whoami.task.agents:WhoIsSpeakingLeftToRightTeacher

RPA Evaluation of Model Responses

parlai dd -t projects.light_whoami.task.agents:ResponseClassifierTeacher

Multi-Objective Training

parlai dd -t projects.light_whoami.task.agents:MultiObjectiveTeacher

Agent Code

NOTE: Each agent specified below can be used in tandem with the long-context generator agents from the MSC project by simply adding Long in front of the final agent name. E.g., projects.light_whoami.agents.rpa_rerank:RPARerankAgent becomes projects.light_whoami.agents.rpa_rerank:LongRPARerankAgent, and so on.

RPA Re-ranker Agents

These agents will re-rank beams from the base model according to RPA score. One must specify a --predictor-model-file pointing to an RPA Classifier.

parlai i -m projects.light_whoami.agents.rpa_rerank:RPARerankAgent \
-mf <path_to_model> --predictor-model-file <path_to_predictor_model>
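To make the re-ranking step concrete, here is a minimal, self-contained sketch of the idea. It is not the ParlAI implementation: `rerank_beams` and `toy_rpa_score` are hypothetical names, the toy scorer stands in for a trained RPA classifier, and breaking ties with the generator score is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Beam = Tuple[str, float]  # (candidate text, generator log-probability)


def rerank_beams(beams: List[Beam], rpa_score: Callable[[str], float]) -> List[Beam]:
    """Order candidates by classifier score, highest first.

    Ties are broken by the generator log-probability (an assumption of this sketch).
    """
    return sorted(beams, key=lambda b: (rpa_score(b[0]), b[1]), reverse=True)


def toy_rpa_score(text: str) -> float:
    # Hypothetical stand-in for an RPA classifier: pretend the model is playing
    # a baker, and a response that claims to be the customer is the wrong speaker.
    return 0.1 if "customer" in text.lower() else 0.9


beams = [
    ("I am just a customer looking for bread.", -0.4),       # likelier, wrong role
    ("Fresh loaves today, what can I bake for you?", -0.9),  # less likely, right role
]
best_text, best_logprob = rerank_beams(beams, toy_rpa_score)[0]
print(best_text)
```

In this toy case the generator prefers the first candidate, but the classifier score promotes the second, which is the behaviour the re-ranker agents are meant to provide.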

If you'd like to use a predictor model file other than that used for RPA re-ranking, please see instructions here for how to implement your own re-ranker. Then, subclass the AbstractGeneratorRerankAgent, implementing the get_reranker_class method to point to your re-ranker.

PACER Agents

In addition to re-ranking the final beams according to RPA score, these models will apply ranking on partial sequences. Use the following parameters to control this level of ranking:

If you'd like to use a predictor model file other than that used for RPA Re-Ranking, simply subclass the PacerAgent and implement get_reranker_class() to return your constructed re-ranker object (see steps here).
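As a rough picture of what ranking on partial sequences means, the sketch below runs a greedy decoder over a toy language model and lets a left-to-right scorer choose among the top-k next tokens. Everything here is an assumption for illustration: the function and parameter names (`top_k`, `rerank_every`) are hypothetical and are not ParlAI flags, and the real agents operate on beams rather than a single greedy hypothesis.

```python
from typing import Callable, Dict, List


def decode_with_partial_rerank(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    partial_score: Callable[[List[str]], float],
    top_k: int = 2,
    rerank_every: int = 1,
    max_len: int = 6,
    end_token: str = "<eos>",
) -> List[str]:
    tokens: List[str] = []
    for step in range(max_len):
        scores = next_token_scores(tokens)
        # Shortlist the generator's top-k continuations.
        candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]
        if step % rerank_every == 0:
            # Pick the continuation whose extended prefix the classifier likes best.
            choice = max(candidates, key=lambda t: partial_score(tokens + [t]))
        else:
            choice = candidates[0]
        if choice == end_token:
            break
        tokens.append(choice)
    return tokens


# Toy generator: maps a prefix to next-token probabilities.
TOY_LM = {
    (): {"I": 0.6, "We": 0.4},
    ("I",): {"buy": 0.7, "bake": 0.3},
    ("I", "buy"): {"bread": 0.9, "<eos>": 0.1},
    ("I", "bake"): {"bread": 0.9, "<eos>": 0.1},
}


def toy_next_token_scores(prefix: List[str]) -> Dict[str, float]:
    return TOY_LM.get(tuple(prefix), {"<eos>": 1.0})


def toy_partial_score(prefix: List[str]) -> float:
    # Hypothetical left-to-right classifier: "buy" sounds like the customer,
    # not the baker the model is supposed to be playing.
    return 0.1 if "buy" in prefix else 0.9


baseline = decode_with_partial_rerank(toy_next_token_scores, toy_partial_score, top_k=1)
controlled = decode_with_partial_rerank(toy_next_token_scores, toy_partial_score, top_k=2)
print(baseline)    # ['I', 'buy', 'bread']
print(controlled)  # ['I', 'bake', 'bread']
```

With `top_k=1` the scorer has nothing to choose between, so the output matches plain greedy decoding; with `top_k=2` the scorer steers generation away from the wrong-speaker token before the sequence is finished.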

Unlikelihood Agents

One can apply RPA Unlikelihood in order to discourage the agent from generating tokens that yield the wrong predicted speaker. This agent requires a predictor model file as well. The following parameters are important for controlling training:
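For intuition about the unlikelihood term itself, independent of any particular flags, the sketch below computes the standard token-level penalty -log(1 - p) over tokens that have been flagged as leading to the wrong predicted speaker. The function name and the `flagged` mask are hypothetical; how tokens get flagged (here simply given as input) is the job of the predictor model and is not modelled.

```python
import math
from typing import Sequence


def unlikelihood_term(
    token_probs: Sequence[float],
    flagged: Sequence[bool],
    eps: float = 1e-6,
) -> float:
    """Sum of -log(1 - p) over flagged tokens.

    token_probs[t] is the model's probability of the token generated at step t;
    flagged[t] is True when that token should be discouraged.
    """
    total = 0.0
    for p, is_flagged in zip(token_probs, flagged):
        if is_flagged:
            # Clamp so a probability of exactly 1.0 does not produce log(0).
            total += -math.log(max(1.0 - p, eps))
    return total


mild = unlikelihood_term([0.5, 0.9], [True, False])   # only the first token is penalized
harsh = unlikelihood_term([0.5, 0.9], [False, True])  # a confident wrong token costs more
print(mild, harsh)
```

The penalty grows as the model becomes more confident in a flagged token, which is what pushes probability mass away from wrong-speaker continuations during training.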

Multi-Objective Agents

One can utilize the multi-objective agents to train both the generator NLL loss and a character prediction ranking loss. Important parameters:
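To illustrate how two objectives can be combined, here is a minimal sketch that adds a generator NLL to a character-prediction loss. It assumes the ranking loss is a softmax cross-entropy over candidate characters and that the two terms are mixed with a single weight; `ranking_weight` and the function names are hypothetical and do not correspond to ParlAI arguments.

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], gold_index: int) -> float:
    """Softmax cross-entropy over candidate scores (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold_index]


def multi_objective_loss(
    token_logprobs: Sequence[float],
    character_scores: Sequence[float],
    gold_character: int,
    ranking_weight: float = 1.0,
) -> float:
    # Generator objective: mean negative log-likelihood of the target tokens.
    nll = -sum(token_logprobs) / len(token_logprobs)
    # Character objective: rank the true speaker above the other candidates.
    ranking = cross_entropy(character_scores, gold_character)
    return nll + ranking_weight * ranking


token_logprobs = [math.log(0.5), math.log(0.5)]
loss = multi_objective_loss(token_logprobs, character_scores=[0.0, 0.0], gold_character=0)
print(loss)
```

Setting `ranking_weight` to 0 in this sketch recovers plain NLL training, which makes the contribution of the character objective easy to isolate.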

Profile Expanded Decoder Attention

Use these agents in an "expanded" attention scenario, where a portion of the input (or something otherwise specified) is attended to in a third round of attention in the decoder (following self-attention and encoder-attention). The following parameters are useful:

To set the context from which to pull expanded attention input:

- --expanded-attention-input-key: Key in the teacher message to pull from for expanded attention.
- --expanded-attention-input-extractor-phrases: If specified, the input for expanded attention will consist only of pieces of the delimited input that contain these phrases.
- --expanded-attention-num-rounds: How many rounds to apply the expanded attention.

parlai train_model -m projects.light_whoami.agents.expanded_attention:ExpandedDecoderAttentionAgent \
--init-model <path_to_init_model> ...
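The effect of --expanded-attention-input-extractor-phrases can be pictured as a simple filter over the delimited context. The sketch below is an illustration only, not the agent's code: the function name, the newline delimiter, the LIGHT-style field prefixes in the toy context, and the fallback to the full input when no phrases are given are all assumptions.

```python
from typing import List


def extract_expanded_input(context: str, phrases: List[str], delimiter: str = "\n") -> str:
    """Keep only the delimited pieces of the context that contain one of the phrases."""
    if not phrases:
        # Assumption: with no phrases specified, the whole input is used.
        return context
    pieces = context.split(delimiter)
    kept = [piece for piece in pieces if any(phrase in piece for phrase in phrases)]
    return delimiter.join(kept)


# Toy context; the field prefixes are illustrative.
context = "\n".join(
    [
        "_setting_name Bakery",
        "_self_name baker",
        "_self_persona I bake bread every morning.",
        "_partner_name customer",
        "Hello there!",
    ]
)
profile = extract_expanded_input(context, ["_self_name", "_self_persona"])
print(profile)
```

Here only the two lines describing the model's own character survive, so the extra round of decoder attention would focus on the profile rather than the full dialogue history.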

Automated Expanded Decoder Attention

To automatically learn what to re-attend to within the context, you can use the same agent as above, but specify --expanded-attention-type <automated_classifier/automated_trainable_mask>. For automated_trainable_mask, there are no additional parameters required. For automated_classifier, one must specify the --predictor-model-file as before.

Automated Expanded Decoder Attention + Multi-Objective Training

To leverage multi-objective training within an automated expanded attention scenario, set --expanded-attention-type automated_trainable_mask, use the combined agent shown below, and add any desired multi-objective arguments from above:

parlai train_model -m projects.light_whoami.agents.expanded_attention:ExpandedDecoderAttentionAndMultiObjectiveAgent \
--expanded-attention-type automated_trainable_mask --init-model <path_to_init_model> ...

Expanded Decoder Attention + RPA Re-ranking / PACER

The following agents combine expanded decoder attention with RPA Re-Ranking or PACER Re-Ranking functionality:

parlai i -m projects.light_whoami.agents.expanded_attention:ExpandedDecoderAttentionAndRPARerankerAgent \
--model-file <path_to_expanded_attention_agent> --predictor-model-file <path_to_predictor_model_file> ...

parlai i -m projects.light_whoami.agents.expanded_attention:ExpandedDecoderAttentionAndPacerAgent \
--model-file <path_to_expanded_attention_agent> --predictor-model-file <path_to_predictor_model_file> ...

Pre-Trained Models

The following table provides the zoo paths for the released pre-trained models (used in --model-file or --init-model):

Model | RPA | Mistaken Identity | Zoo Path
------|------------------------:| ------------------------:|------------------------:|
LTR RPA Re-Ranker | - | - | zoo:light_whoami/rpa_reranker/model |
128-Truncate Vanilla Baseline | 87.61 | 6.45% | zoo:light_whoami/vanilla_128/model |
1024-Truncate Vanilla Baseline | 87.71 | 7.35% | zoo:light_whoami/vanilla_1024/model |
128-Truncate RPA Unlikelihood (Top1) | 87.48 | 7.13% | zoo:light_whoami/rpa_ul_128/model |
1024-Truncate RPA Unlikelihood (Top1) | - | - | zoo:light_whoami/rpa_ul_1024/model |
Multi-Objective (Vanilla, Dec. Only) | 87.67 | 10.00% | zoo:light_whoami/multiobjective/model |
Profile Expanded Attention (128, 2 rounds over ABC) | 91.70 | 4.82% | zoo:light_whoami/profile_expanded_attention_128/model |
Profile Expanded Attention (1024, 2 rounds over ABCD) | 92.18 | 4.00% | zoo:light_whoami/profile_expanded_attention_1024/model |
Automated Expanded Attention (1024, Classifier Attn.) | 90.93 | 5.51% | zoo:light_whoami/automated_expanded_attention_1024/model |
Automated Expanded Attention + Multi-Objective (1024, Dec. Only) | 88.95 | 4.43% | zoo:light_whoami/expanded_and_multiobjective_1024/model |