Generating Model Cards
Author: Wendy Zhang
What is a model card?
Think of a model card as a condensed medical card for a model. It gives people who don't have time to read the paper in detail the gist of what the model does, which datasets were involved, how it performs, and any concerns the author has about it.
You can check out the Model Cards for Model Reporting paper, and here’s a sample model card for the Blenderbot2.0 2.7B model. In addition, here is a link to some more model card examples.
The Process
There are two steps in generating a model card. For both steps, we should specify the following arguments:
--model-file / -mf: the model file
--folder-to-save / -fts: the location where we're saving reports
For step 1, we also add --mode gen to signify that we're in (report) generation mode.
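Here's a minimal Python sketch of the two steps, assuming the parlai CLI is on your PATH and exits with a nonzero status when a step fails. The helper names gmc_steps and run_gmc are hypothetical; the sketch only assembles the flags described above and, unless dry_run is set, runs step 1 before step 2.

```python
from __future__ import annotations

import shlex
import subprocess


def gmc_steps(model_file: str, folder: str) -> list[list[str]]:
    """Return argv lists for step 1 (reports) and step 2 (model card)."""
    base = ["parlai", "gmc", "-mf", model_file, "-fts", folder]
    step1 = base + ["--mode", "gen"]  # report generation mode
    step2 = list(base)  # no --mode gen: build the card from the saved reports
    return [step1, step2]


def run_gmc(model_file: str, folder: str, dry_run: bool = False) -> list[str]:
    """Run both steps in order and return them as shell command strings.

    Assumption: a failing step exits nonzero, so check=True raises
    CalledProcessError and step 2 never runs after a failed step 1.
    """
    commands = []
    for argv in gmc_steps(model_file, folder):
        commands.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_gmc("zoo:blender/blender_90M/model", "blenderbot_90M", dry_run=True):
        print(cmd)
```

Keeping the argv as a list (rather than a single string) avoids shell quoting issues when subprocess runs it; shlex.join is only used to show the equivalent command line.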
Step 1: Generating reports
In general, we can use a command like this for report generation:
# template
parlai gmc -mf <model file> -fts <folder name> --mode gen
# sample
parlai gmc -mf zoo:dialogue_safety/multi_turn/model -fts safety_multi --mode gen
However, depending on the situation, we might need to add these arguments as well:
--wrapper / -w: only if the model is a generation model; check the safety bench for more info
--model-type / -mt: only if the model isn't already in model_list.py; possible choices include ranker, generator, classifier, and retriever
--task / -t and --evaltask / -et: only if the original model.opt used tasks/datasets that aren't in the form of a teacher, or if the task/dataset is no longer accessible. Tasks starting with fromfile or jsonfile will be ignored unless --ignore-unfound-tasks is set to False (by default, it's True).
In addition, if the model itself needs certain arguments (e.g. --search-server), we should specify them at this stage too. We can also add --batchsize (-bs) for faster generation.
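The rules above for optional arguments can be captured in a small command builder. This is a sketch under assumptions: report_command is a hypothetical helper, it only uses the flags named in this guide, and defaulting -et to the -t tasks is a choice borrowed from the Dialogue Safety example below rather than a ParlAI requirement.

```python
from __future__ import annotations

import shlex

# --model-type choices listed in this guide.
MODEL_TYPES = {"ranker", "generator", "classifier", "retriever"}


def report_command(
    model_file: str,
    folder: str,
    wrapper: str | None = None,
    model_type: str | None = None,
    tasks: list[str] | None = None,
    eval_tasks: list[str] | None = None,
    batchsize: int | None = None,
    model_args: list[str] | None = None,
) -> str:
    """Build a step-1 command, adding optional flags only when they are given."""
    argv = ["parlai", "gmc", "-mf", model_file, "-fts", folder, "--mode", "gen"]
    if wrapper:  # generation models only
        argv += ["-w", wrapper]
    if model_type:  # only when the model is not in model_list.py
        if model_type not in MODEL_TYPES:
            raise ValueError(f"unknown model type: {model_type!r}")
        argv += ["-mt", model_type]
    if tasks:  # multiple tasks are comma-separated, as in the examples
        argv += ["-t", ",".join(tasks)]
    evals = eval_tasks or tasks  # assumption: evaluate on the training tasks by default
    if evals:
        argv += ["-et", ",".join(evals)]
    if batchsize:
        argv += ["-bs", str(batchsize)]
    argv += model_args or []  # model-specific flags, e.g. --search-server
    return shlex.join(argv)


if __name__ == "__main__":
    print(report_command("zoo:blender/blender_90M/model", "blenderbot_90M",
                         wrapper="blenderbot_90M", batchsize=128))
```

Validating --model-type locally catches typos before a long generation run starts.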
Check out the section about generating reports for more information on the report generation process and how to generate single reports (very useful for debugging).
Step 2: Model Card Generation
If a description of the model has already been added to model_list.py (matched by its path, which should be the same as model_file) and the reports were successfully generated in step 1, then we can simply run the following command:
# template
parlai gmc -mf <model file> -fts <folder to save>
# example
parlai gmc -mf zoo:dialogue_safety/multi_turn/model -fts safety_multi
Examples
Here are some sample commands (for each model, the first command generates the reports and the second generates the model card):
Dialogue Safety (multi-turn)
parlai gmc -mf zoo:dialogue_safety/multi_turn/model -fts safety_multi -bs 128 --mode gen -t dialogue_safety:wikiToxicComments,dialogue_safety:adversarial:round-only=False:round=1,dialogue_safety:multiturn -et dialogue_safety:wikiToxicComments,dialogue_safety:adversarial:round-only=False:round=1,dialogue_safety:multiturn --data-parallel False
parlai gmc -mf zoo:dialogue_safety/multi_turn/model -fts safety_multi
Blenderbot 90M
parlai gmc -mf zoo:blender/blender_90M/model -fts blenderbot_90M -w blenderbot_90M -bs 128 --mode gen
parlai gmc -mf zoo:blender/blender_90M/model -fts blenderbot_90M
Report Generation Details
In the end, report generation should produce the following under --folder-to-save:
a data_stats/ folder that contains the data stats of the training set
an eval_results.json file that contains the evaluation results on the evaltasks
a sample.json file that contains a sample input and output from the model
for generators, a safety_bench_res folder that contains the safety bench results (click here to learn more about the safety bench)
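To check which of these reports actually landed in the folder, here's a minimal sketch. missing_reports is a hypothetical helper that only tests whether each expected file or folder exists; it does not validate their contents.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Reports listed in this guide; safety_bench_res is expected only for generators.
REPORTS = ["data_stats", "eval_results.json", "sample.json"]
GENERATOR_REPORTS = ["safety_bench_res"]


def missing_reports(folder: str | Path, is_generator: bool) -> list[str]:
    """Return the expected report names that do not exist in folder."""
    expected = REPORTS + (GENERATOR_REPORTS if is_generator else [])
    return [name for name in expected if not (Path(folder) / name).exists()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data_stats").mkdir()
        print(missing_reports(tmp, is_generator=True))
```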
Here are some images of the expected behavior:
Successful generations should end with a green message like this:
Unsuccessful generations should tell us which reports are missing and why.
When tasks are dropped because they are inaccessible or in a fromfile or jsonfile format, it should look like this (without the blackout):
Generating single reports
Sometimes, you might want to generate only certain reports. In this case, instead of using --mode gen, we can use one of the following:
--mode gen:data_stats to generate the data_stats/ folder
--mode gen:eval to generate the eval_results.json file (evaluation results)
--mode gen:safety to generate the safety_bench_res folder
--mode gen:sample to generate the sample.json file
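The single-report modes pair naturally with a list of missing reports: regenerate only what failed. A sketch, where regenerate_commands and MODE_FOR_REPORT are hypothetical names built from the mode list above:

```python
from __future__ import annotations

import shlex

# Single-report modes from this guide, keyed by the report each one produces.
MODE_FOR_REPORT = {
    "data_stats": "gen:data_stats",
    "eval_results.json": "gen:eval",
    "safety_bench_res": "gen:safety",
    "sample.json": "gen:sample",
}


def regenerate_commands(model_file: str, folder: str, missing: list[str]) -> list[str]:
    """Build one single-report command per missing report name."""
    commands = []
    for report in missing:
        mode = MODE_FOR_REPORT[report]  # KeyError for an unknown report name
        argv = ["parlai", "gmc", "-mf", model_file, "-fts", folder, "--mode", mode]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    for cmd in regenerate_commands("zoo:blender/blender_90M/model", "blenderbot_90M",
                                   ["eval_results.json", "safety_bench_res"]):
        print(cmd)
```

Any extra step-1 flags the model needed (for example -w for generators, or -t / -et) would still have to be appended to these commands.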
Optional Customizations
Use --evaluation-report-file to specify the location of your own evaluation report file.
Use --mode editing or --mode final to specify which mode to use for model card generation (step 2). In editing mode, the code generates messages like this:
:warning: missing section name: Probably need to be grabbed from paper & added to model_list.py by u (the creator) :warning:
In final mode, such messages are left out. By default, the mode is editing.
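Before switching to final mode, it can help to list the gaps that editing mode flagged. This sketch assumes the warnings appear in the generated markdown as single lines wrapped in :warning: markers, like the message above; editing_warnings and card_is_final_ready are hypothetical helpers, and the path of the generated card is passed in rather than guessed.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: each editing-mode gap is one ":warning: ... :warning:" line.
WARNING = re.compile(r":warning:\s*(.*?)\s*:warning:")


def editing_warnings(markdown: str) -> list[str]:
    """Return the text of every :warning: ... :warning: message."""
    return WARNING.findall(markdown)


def card_is_final_ready(card_path: str | Path) -> bool:
    """True if the generated card file has no editing-mode warnings left."""
    return not editing_warnings(Path(card_path).read_text(encoding="utf-8"))
```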
Using --extra-args-path
We can use --extra-args-path to pass in longer arguments. By default, --extra-args-path is <folder-to-save>/args.json, so if we create a file at that location, we don't need to pass --extra-args-path at all.
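Here's a minimal sketch of writing such a file at the default location with the standard json module. write_default_args is a hypothetical helper; it creates <folder-to-save> if needed and emits strict JSON, which has no comment syntax.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def write_default_args(folder_to_save: str | Path, args: dict) -> Path:
    """Write args to <folder-to-save>/args.json, the default --extra-args-path."""
    path = Path(folder_to_save) / "args.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(args, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = write_default_args(Path(tmp) / "blenderbot_90M", {"user_sections": ["feedback"]})
        print(p.name, json.loads(p.read_text()))
```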
Adding Custom Dataset and Model Info
By default, the code will try to find the sections in model_list.py. However, instead of changing model_list.py, we can also pass in a .json file with our new section via --extra-args-path. Under extra_models, each model file maps section names (lowercased, with underscores in place of spaces, as in the section list below) to the section content. Note that JSON does not allow comments. Here's how we would add the intended use section in args.json:
{
  "extra_models": {
    "zoo:blender/blender_90M/model": {
      "intended_use": "Our model is intended for research purposes only, and is not yet production ready...."
    }
  }
}
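To avoid typos in section keys, the titles can be normalized programmatically. A sketch, assuming section keys follow the lowercase, underscore-separated form used in the section list; section_key and extra_models are hypothetical helpers.

```python
from __future__ import annotations

import json


def section_key(title: str) -> str:
    """Turn a section title into a key: 'Intended Use' -> 'intended_use'."""
    return "_".join(title.lower().split())


def extra_models(model_file: str, sections: dict[str, str]) -> dict:
    """Build the extra_models entry for one model, keyed by its model file."""
    return {
        "extra_models": {
            model_file: {section_key(t): text for t, text in sections.items()}
        }
    }


if __name__ == "__main__":
    args = extra_models(
        "zoo:blender/blender_90M/model",
        {"Intended Use": "Our model is intended for research purposes only."},
    )
    print(json.dumps(args, indent=2))
```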
Similarly, if we don't want to touch task_list.py (information about the tasks), we can also pass the details via --extra-args-path. Under extra_tasks, each task name maps a type of info to the info itself. Here's how we would add a description for dummy_task in args.json:
{
  "extra_tasks": {
    "dummy_task": {
      "description": "This is a dummy task, not a real task"
    }
  }
}
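Since extra_models, extra_tasks, and user_sections can all live in the same args.json, a small helper can combine them and check that a hand-edited file is still valid JSON (a stray # comment is the usual culprit). build_args and parse_args_json are hypothetical names; only the three top-level keys come from this guide.

```python
from __future__ import annotations

import json


def build_args(extra_models=None, extra_tasks=None, user_sections=None) -> dict:
    """Combine the args.json keys described in this guide, skipping unset ones."""
    candidates = {
        "extra_models": extra_models,
        "extra_tasks": extra_tasks,
        "user_sections": user_sections,
    }
    return {key: value for key, value in candidates.items() if value}


def parse_args_json(text: str) -> dict:
    """Parse args.json text, reporting the line that breaks strict JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # JSON has no comment syntax, so '# ...' lines fail here.
        raise ValueError(f"invalid args.json at line {err.lineno}: {err.msg}") from err
```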
Information passed this way can override parts of what's written in task_list.py and model_list.py.
Adding Custom Sections or Changing Section Order
There are two ways to add sections or change their order:
1. After we generate the initial model card, we can directly edit the generated markdown file.
2. If there's a lot of section movement or deletion, add a user_sections key that specifies the entire section order to the .json file that we pass to --extra-args-path.
For instance, this is the default order of sections:
section_list = [
    "model_details",
    "model_details:_quick_usage",
    "model_details:_sample_input_and_output",
    "intended_use",
    "limitations",
    "privacy",
    "datasets_used",
    "evaluation",
    "extra_analysis",
    "related_paper",
    "hyperparameters",
    "feedback",
]
Note that adding :_ implies that it's a subsection, and I would advise using an underscore _ in place of spaces (don't worry; they'll be changed back to spaces for the section title). Here's how we would reverse the order and remove the model_details section:
{
  "user_sections": [
    "feedback",
    "hyperparameters",
    "related_paper",
    "extra_analysis",
    "evaluation",
    "datasets_used",
    "privacy",
    "limitations",
    "intended_use"
  ]
}
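The user_sections list above can also be derived from the default order instead of typed by hand. In this sketch, reorder drops a top-level section together with its :_ subsections and reverses the rest; title is a hypothetical approximation of how a key becomes a display title (underscores back to spaces), and the real capitalization may differ.

```python
from __future__ import annotations

import json

# Default section order, copied from this guide.
SECTION_LIST = [
    "model_details",
    "model_details:_quick_usage",
    "model_details:_sample_input_and_output",
    "intended_use",
    "limitations",
    "privacy",
    "datasets_used",
    "evaluation",
    "extra_analysis",
    "related_paper",
    "hyperparameters",
    "feedback",
]


def reorder(sections: list[str], drop: str) -> list[str]:
    """Reverse the order, dropping a top-level section and its ':_' subsections."""
    kept = [s for s in sections if s.split(":_")[0] != drop]
    return kept[::-1]


def title(section: str) -> str:
    """Approximate display title: last ':_' component, underscores -> spaces."""
    return section.split(":_")[-1].replace("_", " ")


if __name__ == "__main__":
    print(json.dumps({"user_sections": reorder(SECTION_LIST, "model_details")}, indent=2))
```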