Moderation

class opik.evaluation.metrics.Moderation(model: str | OpikBaseModel | None = None, name: str = 'moderation_metric', few_shot_examples: List[FewShotExampleModeration] | None = None, track: bool = True)

Bases: BaseMetric

A metric that evaluates the moderation level of an LLM output using an LLM judge.

This metric uses a language model to assess the moderation level of the given output. It returns a score between 0.0 and 1.0, where higher values indicate more appropriate content.

Parameters:
  • model – The language model to use for moderation. Can be a string (model name) or an opik.evaluation.models.OpikBaseModel subclass instance. opik.evaluation.models.LiteLLMChatModel is used by default.

  • name – The name of the metric. Defaults to “moderation_metric”.

  • few_shot_examples – A list of few-shot examples to be used in the query. If None, default examples will be used.

  • track – Whether to track the metric. Defaults to True.

Example

>>> from opik.evaluation.metrics import Moderation
>>> moderation_metric = Moderation()
>>> result = moderation_metric.score("Hello, how can I help you?")
>>> print(result.value)  # A float between 0.0 and 1.0
>>> print(result.reason)  # Explanation for the score
score(output: str, **ignored_kwargs: Any) → ScoreResult

Calculate the moderation score for the given output.

Parameters:
  • output – The output text to be evaluated.

  • **ignored_kwargs (Any) – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object containing the moderation score (between 0.0 and 1.0) and a reason for the score.

Return type:

score_result.ScoreResult
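To make the return contract concrete, here is a minimal stand-in that mimics the shape described above. These are not the real opik classes, and the fixed score and reason are purely illustrative:

```python
# Minimal stand-in mimicking the score() contract described above.
# NOT the real opik classes; attribute names follow the documentation.
class ScoreResult:
    def __init__(self, value: float, reason: str):
        self.value = value
        self.reason = reason

class DummyModeration:
    def score(self, output: str, **ignored_kwargs) -> ScoreResult:
        # A real implementation would query the configured LLM judge;
        # here we return a fixed in-range value to illustrate the contract.
        return ScoreResult(value=1.0, reason="No policy violations detected.")

result = DummyModeration().score("Hello, how can I help you?")
print(result.value, result.reason)
```

In real use, result.value comes from the configured LLM judge rather than a constant, but it always falls within [0.0, 1.0].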

async ascore(output: str, **ignored_kwargs: Any) → ScoreResult

Asynchronously calculate the moderation score for the given output.

This method is the asynchronous version of score(). For detailed documentation, please refer to the score() method.

Parameters:
  • output – The output text to be evaluated.

  • **ignored_kwargs – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object with the moderation score and reason.

Return type:

score_result.ScoreResult
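The practical benefit of ascore() is scoring many outputs concurrently. The sketch below uses a stand-in class (not the real opik metric) to show the pattern with asyncio.gather:

```python
import asyncio

# Stand-in sketch of the asynchronous scoring pattern. These are NOT the
# real opik classes; they only mimic the ascore() interface described above.
class ScoreResult:
    def __init__(self, value: float, reason: str):
        self.value = value
        self.reason = reason

class DummyModeration:
    async def ascore(self, output: str, **ignored_kwargs) -> ScoreResult:
        # A real implementation would await an LLM call here.
        return ScoreResult(value=1.0, reason="No policy violations detected.")

async def score_all(outputs):
    metric = DummyModeration()
    # Scoring several outputs concurrently is the main benefit of ascore().
    return await asyncio.gather(*(metric.ascore(o) for o in outputs))

results = asyncio.run(
    score_all(["Hello, how can I help you?", "Here is your account summary."])
)
for r in results:
    print(r.value, r.reason)
```

With the real metric, the same gather pattern lets one event loop drive many concurrent LLM judge calls instead of scoring outputs one at a time.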