Tasks

List of ParlAI tasks defined in the file task_list.py.

They consist of: (1) QA tasks; (2) Cloze tasks; (3) Goal tasks; (4) ChitChat tasks; (5) Negotiation tasks; and (6) Visual tasks.
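Each entry below includes a `task:` name in its bracketed metadata. A common way to inspect a task is ParlAI's `display_data.py` script (the same invocation format shown in the Dialog Based Language Learning entry below). A minimal sketch, assuming ParlAI is installed and the commands are run from the ParlAI root directory:

```shell
# Print a few examples from a task; -t takes the task name from the entry,
# e.g. "task:squad" below becomes:
python examples/display_data.py -t squad

# Subtasks and options are given after colons, e.g. the full bAbI 1k set:
python examples/display_data.py -t babi:All1k
```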

QA Tasks

AQuA Dataset containing algebraic word problems with rationales for their answers. From Ling et al. 2017, Link: https://arxiv.org/pdf/1705.04146.pdf [ task:aqua tags:#AQuA, #All, #QA ]

bAbI 1k 20 synthetic tasks that each test a unique aspect of text and reasoning, and hence test different capabilities of learning models. From Weston et al. ‘16. Link: http://arxiv.org/abs/1502.05698 [ task:babi:All1k tags:#bAbI-1k, #All, #QA ]

bAbI 10k 20 synthetic tasks that each test a unique aspect of text and reasoning, and hence test different capabilities of learning models. From Weston et al. ‘16. Link: http://arxiv.org/abs/1502.05698 [ task:babi:All10k tags:#bAbI-10k, #All, #QA ]

MCTest Questions about short children’s stories, from Richardson et al. ‘13. Link: https://www.microsoft.com/en-us/research/publication/mctest-challenge-dataset-open-domain-machine-comprehension-text/ [ task:mctest tags:#MCTest, #All, #QA ]

Movie Dialog QA Closed-domain QA dataset asking templated questions about movies, answerable from Wikipedia, similar to WikiMovies. From Dodge et al. ‘15. Link: https://arxiv.org/abs/1511.06931 [ task:moviedialog:Task:1 tags:#MovieDD-QA, #All, #QA, #MovieDD ]

Movie Dialog Recommendations Questions asking for movie recommendations. From Dodge et al. ‘15. Link: https://arxiv.org/abs/1511.06931 [ task:moviedialog:Task:2 tags:#MovieDD-Recs, #All, #QA, #MovieDD ]

MTurk WikiMovies Closed-domain QA dataset asking MTurk-derived questions about movies, answerable from Wikipedia. From Li et al. ‘16. Link: https://arxiv.org/abs/1611.09823 [ task:mturkwikimovies tags:#MTurkWikiMovies, #All, #QA ]

NarrativeQA A dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. From Kočiský et al. ‘17. Link: https://arxiv.org/abs/1712.07040 [ task:narrative_qa tags:#NarrativeQA, #All, #QA ]

Simple Questions Open-domain QA dataset based on Freebase triples from Bordes et al. ‘15. Link: https://arxiv.org/abs/1506.02075 [ task:simplequestions tags:#SimpleQuestions, #All, #QA ]

SQuAD2 Open-domain QA dataset answerable from a given paragraph from Wikipedia, from Rajpurkar & Jia et al. ‘18. Link: http://arxiv.org/abs/1806.03822 [ task:squad2 tags:#SQuAD2, #All, #QA ]

SQuAD Open-domain QA dataset answerable from a given paragraph from Wikipedia, from Rajpurkar et al. ‘16. Link: https://arxiv.org/abs/1606.05250 [ task:squad tags:#SQuAD, #All, #QA ]

TriviaQA Open-domain QA dataset with question-answer-evidence triples, from Joshi et al. ‘17. Link: https://arxiv.org/abs/1705.03551 [ task:triviaqa tags:#TriviaQA, #All, #QA ]

Web Questions Open-domain QA dataset from Web queries from Berant et al. ‘13. Link: http://www.aclweb.org/anthology/D13-1160 [ task:webquestions tags:#WebQuestions, #All, #QA ]

WikiMovies Closed-domain QA dataset asking templated questions about movies, answerable from Wikipedia. From Miller et al. ‘16. Link: https://arxiv.org/abs/1606.03126 [ task:wikimovies tags:#WikiMovies, #All, #QA ]

WikiQA Open-domain QA dataset based on Wikipedia, from Yang et al. ‘15. Link: https://www.microsoft.com/en-us/research/publication/wikiqa-a-challenge-dataset-for-open-domain-question-answering/ [ task:wikiqa tags:#WikiQA, #All, #QA ]

InsuranceQA Task which requires agents to identify high quality answers composed by professionals with deep domain knowledge. From Feng et al. ‘15. Link: https://arxiv.org/abs/1508.01585 [ task:insuranceqa tags:#InsuranceQA, #All, #QA ]

MS_MARCO A large-scale machine reading comprehension dataset with questions sampled from real anonymized user queries and contexts from web documents. From Nguyen et al. ‘16. Link: https://arxiv.org/abs/1611.09268 [ task:ms_marco tags:#MS_MARCO, #All, #QA ]

QAngaroo Multi-hop reading comprehension, including two datasets: WIKIHOP, built on Wikipedia, and MEDHOP, built on paper abstracts from PubMed. Link to dataset: http://qangaroo.cs.ucl.ac.uk/ [ task:qangaroo tags:#QAngaroo, #All, #QA ]

Cloze Tasks

BookTest Sentence completion given a few sentences as context from a book. A larger version of CBT. From Bajgar et al. ‘16. Link: https://arxiv.org/abs/1610.00956 [ task:booktest tags:#BookTest, #All, #Cloze ]

Children’s Book Test (CBT) Sentence completion given a few sentences as context from a children’s book. From Hill et al., ‘16. Link: https://arxiv.org/abs/1511.02301 [ task:cbt tags:#CBT, #All, #Cloze ]

QA CNN Cloze dataset based on a missing (anonymized) entity phrase from a CNN article, Hermann et al. ‘15. Link: https://arxiv.org/abs/1506.03340 [ task:qacnn tags:#QACNN, #All, #Cloze ]

QA Daily Mail Cloze dataset based on a missing (anonymized) entity phrase from a Daily Mail article, Hermann et al. ‘15. Link: https://arxiv.org/abs/1506.03340 [ task:qadailymail tags:#QADailyMail, #All, #Cloze ]

Goal Tasks

Dialog Based Language Learning: bAbI Task Short dialogs based on the bAbI tasks, but in the form of a question from a teacher, the answer from the student, and finally a comment on the answer from the teacher. The aim is to find learning models that use the comments to improve. From Weston ‘16. Link: https://arxiv.org/abs/1604.06045. Tasks can be accessed with a format like: ‘python examples/display_data.py -t dbll_babi:task:2_p0.5’, which specifies task 2 with a policy that answers 50% of questions correctly; see the paper for more details of the tasks. [ task:dbll_babi tags:#DBLL-bAbI, #All, #Goal ]

Dialog Based Language Learning: WikiMovies Task Short dialogs based on WikiMovies, but in the form of a question from a teacher, the answer from the student, and finally a comment on the answer from the teacher. The aim is to find learning models that use the comments to improve. From Weston ‘16. Link: https://arxiv.org/abs/1604.06045 [ task:dbll_movie tags:#DBLL-Movie, #All, #Goal ]

Dialog bAbI Simulated dialogs of restaurant booking, from Bordes et al. ‘16. Link: https://arxiv.org/abs/1605.07683 [ task:dialog_babi tags:#dialog-bAbI, #All, #Goal ]

Dialog bAbI+ bAbI+ is an extension of the bAbI Task 1 dialogues with everyday incremental dialogue phenomena (hesitations, restarts, and corrections) which model the disfluencies and communication problems in everyday spoken interaction in real-world environments. See https://www.researchgate.net/publication/319128941_Challenging_Neural_Dialogue_Models_with_Natural_Data_Memory_Networks_Fail_on_Incremental_Phenomena, http://aclweb.org/anthology/D17-1235 [ task:dialog_babi_plus tags:#dialog-bAbI-plus, #All, #Goal ]

MutualFriends Task where two agents must discover which friend of theirs is mutual based on the friends’ attributes. From He He et al. ‘17. Link: https://stanfordnlp.github.io/cocoa/ [ task:mutualfriends tags:#MutualFriends, #All, #Goal ]

Movie Dialog QA Recommendations Dialogs discussing questions about movies as well as recommendations. From Dodge et al. ‘15. Link: https://arxiv.org/abs/1511.06931 [ task:moviedialog:Task:3 tags:#MovieDD-QARecs, #All, #Goal, #MovieDD ]

Personalized Dialog Full Set Simulated dataset of restaurant booking focused on personalization based on user profiles. From Joshi et al. ‘17. Link: https://arxiv.org/abs/1706.07503 [ task:personalized_dialog:AllFull tags:#personalized-dialog-full, #All, #Goal, #Personalization ]

Personalized Dialog Small Set Simulated dataset of restaurant booking focused on personalization based on user profiles. From Joshi et al. ‘17. Link: https://arxiv.org/abs/1706.07503 [ task:personalized_dialog:AllSmall tags:#personalized-dialog-small, #All, #Goal, #Personalization ]

Task N’ Talk Dataset of synthetic shapes described by attributes, for agents to play a cooperative QA game, from Kottur et al. ‘17. Link: https://arxiv.org/abs/1706.08502 [ task:taskntalk tags:#TaskNTalk, #All, #Goal ]

SCAN SCAN is a set of simple language-driven navigation tasks for studying compositional learning and zero-shot generalization. The SCAN tasks were inspired by the CommAI environment, which is the origin of the acronym (Simplified versions of the CommAI Navigation tasks). See the paper: https://arxiv.org/abs/1711.00350 or data: https://github.com/brendenlake/SCAN [ task:scan tags:#SCAN, #Goal, #All ]

ChitChat Tasks

Cornell Movie Fictional conversations extracted from raw movie scripts. Danescu-Niculescu-Mizil & Lee, ‘11. Link: https://arxiv.org/abs/1106.3077 [ task:cornell_movie tags:#CornellMovie, #All, #ChitChat ]

Movie Dialog Reddit Dialogs discussing Movies from Reddit (the Movies SubReddit). From Dodge et al. ‘15. Link: https://arxiv.org/abs/1511.06931 [ task:moviedialog:Task:4 tags:#MovieDD-Reddit, #All, #ChitChat, #MovieDD ]

Open Subtitles Dataset of dialogs from movie scripts. Version 2018: http://opus.lingfil.uu.se/OpenSubtitles2018.php, version 2009: http://opus.lingfil.uu.se/OpenSubtitles.php. A variant of the dataset used in Vinyals & Le ‘15, https://arxiv.org/abs/1506.05869. [ task:opensubtitles tags:#OpenSubtitles, #All, #ChitChat ]

Ubuntu Dialogs between an Ubuntu user and an expert trying to fix an issue, from Lowe et al. ‘15. Link: https://arxiv.org/abs/1506.08909 [ task:ubuntu tags:#Ubuntu, #All, #ChitChat ]

ConvAI2 A chit-chat dataset based on PersonaChat (https://arxiv.org/abs/1801.07243) for a NIPS 2018 competition. Link: http://convai.io/. [ task:convai2 tags:#ConvAI2, #All, #ChitChat ]

ConvAI_ChitChat Human-bot dialogues containing free discussions of randomly chosen paragraphs from SQuAD. Link to dataset: http://convai.io/data/ [ task:convai_chitchat tags:#ConvAI_ChitChat, #All, #ChitChat ]

Persona-Chat A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other. See the paper: https://arxiv.org/abs/1801.07243 [ task:personachat tags:#Persona-Chat, #ChitChat, #All ]

Twitter Twitter data from: https://github.com/Marsan-Ma/chat_corpus/. No train/valid/test split was provided, so 10k examples each were chosen at random for the valid and test sets. [ task:twitter tags:#Twitter, #All, #ChitChat ]

ConvAI2_wild_evaluation Dataset collected during the wild evaluation of ConvAI2 participants’ bots (http://convai.io). 60% train, 20% valid, and 20% test were chosen at random from the whole dataset. [ task:convai2_wild_evaluation tags:#ConvAI2_wild_evaluation, #All, #ChitChat ]

Image_Chat 202k dialogues and 401k utterances over 202k images from the YFCC100m dataset (https://multimediacommons.wordpress.com/yfcc100m-core-dataset/), using 215 possible personality traits. See https://klshuster.github.io/image_chat/ for more information. [ task:image_chat tags:#Image_Chat, #All, #Visual, #ChitChat ]

Wizard_of_Wikipedia A dataset with conversations directly grounded with knowledge retrieved from Wikipedia. Contains 201k utterances from 22k dialogues spanning over 1300 diverse topics, split into train, test, and valid sets. The test and valid sets are split into two sets each: one with overlapping topics with the train set, and one with unseen topics. See https://arxiv.org/abs/1811.01241 for more information. [ task:wizard_of_wikipedia tags:#Wizard_of_Wikipedia, #All, #ChitChat ]

Negotiation Tasks

Deal or No Deal End-to-end negotiation task which requires two agents to agree on how to divide a set of items, with each agent assigning different values to each item. From Lewis et al. ‘17. Link: https://arxiv.org/abs/1706.05125 [ task:dealnodeal tags:#DealNoDeal, #All, #Negotiation ]

Visual Tasks

FVQA A VQA dataset which requires, and supports, much deeper reasoning. It extends a conventional visual question answering dataset, which contains image-question-answer triplets, with additional image-question-answer-supporting-fact tuples. The supporting fact is represented as a structural triplet, such as <Cat,CapableOf,ClimbingTrees>. Link: https://arxiv.org/abs/1606.05433 [ task:fvqa tags:#FVQA, #All, #Visual ]

VQAv1 Open-ended question answering about visual content. From Agrawal et al. ‘15. Link: https://arxiv.org/abs/1505.00468 [ task:vqa_v1 tags:#VQAv1, #All, #Visual ]

VQAv2 Bigger, more balanced version of the original VQA dataset. From Goyal et al. ‘16. Link: https://arxiv.org/abs/1612.00837 [ task:vqa_v2 tags:#VQAv2, #All, #Visual ]

VisDial Task which requires agents to hold a meaningful dialog about visual content. From Das et al. ‘16. Link: https://arxiv.org/abs/1611.08669 [ task:visdial tags:#VisDial, #All, #Visual ]

MNIST_QA Task which requires agents to identify which number they are seeing. From the MNIST dataset. [ task:mnist_qa tags:#MNIST_QA, #All, #Visual ]

CLEVR A visual reasoning dataset that tests abilities such as attribute identification, counting, comparison, spatial relationships, and logical operations. From Johnson et al. ‘16. Link: https://arxiv.org/abs/1612.06890 [ task:clevr tags:#CLEVR, #All, #Visual ]

nlvr Cornell Natural Language Visual Reasoning (NLVR) is a language grounding dataset based on pairs of natural language statements grounded in synthetic images. From Suhr et al. ‘17. Link: http://lic.nlp.cornell.edu/nlvr/ [ task:nlvr tags:#nlvr, #All, #Visual ]

Flickr30k 30k captioned images pulled from Flickr, compiled by UIUC: http://web.engr.illinois.edu/~bplumme2/Flickr30kEntities/. Based on these papers: https://arxiv.org/abs/1505.04870v2, http://aclweb.org/anthology/Q14-1006 [ task:flickr30k tags:#Flickr30k, #All, #Visual ]

COCO_Captions COCO annotations derived from the 2015 COCO Caption Competition. Link to dataset: http://cocodataset.org/#download [ task:coco_caption tags:#COCO_Captions, #All, #Visual ]

Personality_Captions 200k images from the YFCC100m dataset (https://multimediacommons.wordpress.com/yfcc100m-core-dataset/), with captions conditioned on one of 215 personalities. See https://arxiv.org/abs/1810.10665 for more information. [ task:personality_captions tags:#Personality_Captions, #All, #Visual ]

Image_Chat 202k dialogues and 401k utterances over 202k images from the YFCC100m dataset (https://multimediacommons.wordpress.com/yfcc100m-core-dataset/), using 215 possible personality traits. See https://klshuster.github.io/image_chat/ for more information. [ task:image_chat tags:#Image_Chat, #All, #Visual, #ChitChat ]