parlai.core.metrics
Provides standard metric evaluations for dialog, as well as an aggregator.
class parlai.core.metrics.MetricDisplayData(title, description)
    Bases: tuple

    title
        Alias for field number 0

    description
        Alias for field number 1
class parlai.core.metrics.Metric
    Bases: abc.ABC

    Base class for storing metrics.

    Subclasses should define .value(). Examples are provided for each subclass.

    property is_global
        Indicates whether this metric should be reported globally or per-task.

    property macro_average
        Indicates whether this metric should be macro-averaged when globally reported.

    classmethod many(*objs: List[Union[List[Union[int, float, torch.Tensor]], torch.Tensor]]) → List[parlai.core.metrics.Metric]
        Construct many of a Metric from the base parts.

        Useful if you separately compute numerators and denominators, etc.

    classmethod from_mask(metric_per_token: torch.Tensor, mask: torch.Tensor) → List[parlai.core.metrics.Metric]
        From token-level metrics, returns an aggregate MyMetric per example in the batch.

        Parameters:
            metric_per_token – a (batchsize x num_tokens) Tensor
            mask – a (batchsize x num_tokens) Tensor to mask out tokens that should not be considered in the aggregate metric calculation.

        Returns:
            a (batchsize) Tensor
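        A minimal sketch with a concrete subclass such as AverageMetric; the tensor values are illustrative, and the assumption is that each example's unmasked token values are averaged:

            import torch
            from parlai.core.metrics import AverageMetric

            # per-token losses for a batch of 2 examples, 3 token slots each
            loss_per_token = torch.tensor([[0.5, 1.0, 2.0], [4.0, 0.0, 0.0]])
            # the second example has only one real token; the rest is padding
            mask = torch.tensor([[1, 1, 1], [1, 0, 0]])

            per_example = AverageMetric.from_mask(loss_per_token, mask)
            print([m.value() for m in per_example])  # ≈ [1.1667, 4.0]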
class parlai.core.metrics.FixedMetric(value: Union[int, float, torch.Tensor])
    Bases: parlai.core.metrics.Metric

    Fixed metrics are verified to be the same when combined, or throw an error.

    FixedMetric is used for things like total_train_updates, which should not be combined across different multitasks or different workers.
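    A minimal sketch (combining via + is how ParlAI metrics are accumulated):

        from parlai.core.metrics import FixedMetric

        a = FixedMetric(1000)
        b = FixedMetric(1000)
        print((a + b).value())  # 1000: equal values combine unchanged

        # FixedMetric(1000) + FixedMetric(2000) would raise an error,
        # since fixed metrics must agree when combined.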
class parlai.core.metrics.SumMetric(sum_: Union[int, float, torch.Tensor] = 0)
    Bases: parlai.core.metrics.Metric

    Class that keeps a running sum of some metric.

    Examples of SumMetric include things like “exs”, the number of examples seen since the last report, which depends exactly on a teacher.
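    A minimal sketch:

        from parlai.core.metrics import SumMetric

        # two batches contribute 2 and 3 examples respectively
        exs = SumMetric(2) + SumMetric(3)
        print(exs.value())  # 5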
class parlai.core.metrics.AverageMetric(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)
    Bases: parlai.core.metrics.Metric

    Class that keeps a running average of some metric.

    Examples of AverageMetrics include hits@1, F1, accuracy, etc. These metrics all have per-example values that can be directly mapped back to a teacher.

    property macro_average
        Indicates whether this metric should be macro-averaged when globally reported.
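    A minimal sketch of the running-average behavior:

        from parlai.core.metrics import AverageMetric

        # hits@1 over two batches: 3 of 4 correct, then 1 of 4 correct
        acc = AverageMetric(3, 4) + AverageMetric(1, 4)
        print(acc.value())  # 0.5: numerators and denominators sum separately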
class parlai.core.metrics.MacroAverageMetric(metrics: Dict[str, parlai.core.metrics.Metric])
    Bases: parlai.core.metrics.Metric

    Class that represents the macro average of several numbers.

    Used for aggregating task level metrics. It is only used for things that are AverageMetrics already.
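    A minimal sketch (the task names are hypothetical):

        from parlai.core.metrics import AverageMetric, MacroAverageMetric

        per_task = {
            'taskA': AverageMetric(9, 10),  # 0.9 over 10 examples
            'taskB': AverageMetric(0, 10),  # 0.0 over 10 examples
        }
        macro = MacroAverageMetric(per_task)
        print(macro.value())  # 0.45: tasks weighted equally, not per-example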
class parlai.core.metrics.TimerMetric(value: Union[int, float, torch.Tensor], start_time: Optional[float] = None, end_time: Optional[float] = None)
    Bases: parlai.core.metrics.Metric

    A timer metric keeps track of the first/last times it was used.
class parlai.core.metrics.GlobalMetric
    Bases: object

    A global metric is one that should not be aggregated across different tasks.

    Examples of global metrics include things like learning rate and updates. These need to be accumulated or averaged over multiple parleys, but cannot be correlated with a single task.

    Key to this is the notion that any one worker or any one task already has a global view of the value, so no combinations should be done. Note this is different from a FixedMetric, in that a GlobalMetric can still be averaged across multiple parleys(), but a FixedMetric is always fixed.
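    A sketch using GlobalAverageMetric (defined below): averaging over parleys is fine, because each observation is already a global view of the value:

        from parlai.core.metrics import GlobalAverageMetric

        # the same learning rate observed on two parleys
        lr = GlobalAverageMetric(0.001) + GlobalAverageMetric(0.001)
        print(lr.value())  # 0.001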
class parlai.core.metrics.GlobalFixedMetric(value: Union[int, float, torch.Tensor])
    Bases: parlai.core.metrics.GlobalMetric, parlai.core.metrics.FixedMetric

    Global fixed metric.

    Used for things like total_train_updates.
class parlai.core.metrics.GlobalSumMetric(sum_: Union[int, float, torch.Tensor] = 0)
    Bases: parlai.core.metrics.GlobalMetric, parlai.core.metrics.SumMetric

    Global sum metric.

    Used for ‘exs’ and ‘updates’.
class parlai.core.metrics.GlobalAverageMetric(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)
    Bases: parlai.core.metrics.GlobalMetric, parlai.core.metrics.AverageMetric

    Global average metric.

    Used for things like learning rate, and many agent-specific metrics.
class parlai.core.metrics.LegacyMetric(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)
    Bases: parlai.core.metrics.GlobalAverageMetric

    Legacy metrics are reported by the agent as floats.
class parlai.core.metrics.GlobalTimerMetric(value: Union[int, float, torch.Tensor], start_time: Optional[float] = None, end_time: Optional[float] = None)
    Bases: parlai.core.metrics.GlobalMetric, parlai.core.metrics.TimerMetric
class parlai.core.metrics.F1Metric(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)
    Bases: parlai.core.metrics.AverageMetric

    Helper class which computes token-level F1.
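    A sketch using the compute classmethod (not listed above, but part of ParlAI's F1Metric; the strings are illustrative):

        from parlai.core.metrics import F1Metric

        f1 = F1Metric.compute('the cat sat', ['the cat sat on the mat'])
        print(f1.value())  # token-level F1 against the best-matching answer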
class parlai.core.metrics.ExactMatchMetric(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)
    Bases: parlai.core.metrics.AverageMetric
class parlai.core.metrics.BleuMetric(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)
    Bases: parlai.core.metrics.AverageMetric
class parlai.core.metrics.FairseqBleuMetric(pred: Union[torch.Tensor, List[int]], ref: Union[torch.Tensor, List[int]], pad_idx: int, eos_idx: int, unk_idx: int, order: int)
    Bases: parlai.core.metrics.Metric

    Re-implementation of https://github.com/pytorch/fairseq/blob/main/fairseq/scoring/bleu.py.

    __init__(pred: Union[torch.Tensor, List[int]], ref: Union[torch.Tensor, List[int]], pad_idx: int, eos_idx: int, unk_idx: int, order: int)
        Initialize self. See help(type(self)) for accurate signature.

    property macro_average
        Indicates whether this metric should be macro-averaged when globally reported.
class
parlai.core.metrics.
RougeMetric
(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)[source]¶ Bases:
parlai.core.metrics.AverageMetric
-
static
compute_many
(guess: str, answers: List[str], measure: str = 'r') → Tuple[Optional[parlai.core.metrics.RougeMetric], Optional[parlai.core.metrics.RougeMetric], Optional[parlai.core.metrics.RougeMetric]][source]¶ Compute ROUGE score between guess and any answer.
Done with compute_many due to increased efficiency.
- Returns
(rouge-1, rouge-2, rouge-L)
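        A sketch (each entry may be None when the underlying ROUGE dependency is unavailable):

            from parlai.core.metrics import RougeMetric

            r1, r2, rL = RougeMetric.compute_many(
                'the cat sat on the mat', ['a cat sat on a mat']
            )
            if r1 is not None:
                print(r1.value(), r2.value(), rL.value())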
class
parlai.core.metrics.
IntraDistinctMetric
(numer: Union[int, float, torch.Tensor], denom: Union[int, float, torch.Tensor] = 1)[source]¶ Bases:
parlai.core.metrics.AverageMetric
Compute intra-distinct (per-utterance).
class parlai.core.metrics.InterDistinctMetric(counts: Counter[Tuple])
    Bases: parlai.core.metrics.Metric

    Compute inter-distinct metric at the corpus level.
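    A sketch built from the documented constructor, assuming value() reports the ratio of unique to total n-grams:

        from collections import Counter
        from parlai.core.metrics import InterDistinctMetric

        # corpus-level unigram counts pooled over several utterances
        counts = Counter([('hello',), ('world',), ('hello',), ('there',)])
        m = InterDistinctMetric(counts)
        print(m.value())  # 3 unique unigrams / 4 total = 0.75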
parlai.core.metrics.normalize_answer(s)
    Lower text and remove punctuation, articles and extra whitespace.
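    For example:

        from parlai.core.metrics import normalize_answer

        print(normalize_answer('The  Cat, sat!'))  # 'cat sat'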
parlai.core.metrics.aggregate_named_reports(named_reports: Dict[str, Dict[str, parlai.core.metrics.Metric]], micro_average: bool = False) → Dict[str, parlai.core.metrics.Metric]
    Aggregate metrics from multiple reports.

    Parameters:
        named_reports – Dict of tasks -> metrics.
        micro_average – If true, top-level metrics will be the micro average. By default, we use the macro average.

    Returns:
        The aggregated report
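    A sketch (the exact key naming in the aggregated report is an assumption):

        from parlai.core.metrics import AverageMetric, aggregate_named_reports

        reports = {
            'taskA': {'accuracy': AverageMetric(9, 10)},
            'taskB': {'accuracy': AverageMetric(0, 10)},
        }
        agg = aggregate_named_reports(reports, micro_average=False)
        # per-task values are kept under namespaced keys (e.g. 'taskA/accuracy'),
        # and the top-level 'accuracy' holds the macro average, 0.45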
parlai.core.metrics.aggregate_unnamed_reports(reports: List[Dict[str, parlai.core.metrics.Metric]]) → Dict[str, parlai.core.metrics.Metric]
    Combines metrics without regard for tracking provenance.
class parlai.core.metrics.Metrics(threadsafe=False, shared=None)
    Bases: object

    Metrics aggregator.

    __init__(threadsafe=False, shared=None)
        Initialize self. See help(type(self)) for accurate signature.
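    A sketch assuming the aggregator's add() and report() methods:

        from parlai.core.metrics import AverageMetric, Metrics

        metrics = Metrics()
        metrics.add('accuracy', AverageMetric(1, 1))
        metrics.add('accuracy', AverageMetric(0, 1))
        print(metrics.report()['accuracy'].value())  # 0.5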
class parlai.core.metrics.TeacherMetrics(metrics_list: str = 'default', shared: Optional[Dict[str, Any]] = None)
    Bases: parlai.core.metrics.Metrics

    Helper container which encapsulates standard metrics (F1, BLEU, …).