Moderation

class opik.evaluation.metrics.Moderation(model: str | OpikBaseModel | None = None, name: str = 'moderation_metric', few_shot_examples: List[FewShotExampleModeration] | None = None, track: bool = True)

Bases: BaseMetric

A metric that evaluates the moderation level of an LLM output using an LLM judge.

This metric uses a language model to assess the moderation level of the given output. It returns a score between 0.0 and 1.0, where higher values indicate more appropriate content.

Parameters:
  • model – The language model to use for moderation. Can be a string (model name) or an opik.evaluation.models.OpikBaseModel subclass instance. opik.evaluation.models.LiteLLMChatModel is used by default.

  • name – The name of the metric. Defaults to “moderation_metric”.

  • few_shot_examples – A list of few-shot examples to be used in the query. If None, default examples will be used.

  • track – Whether to track the metric. Defaults to True.

Example

>>> from opik.evaluation.metrics import Moderation
>>> moderation_metric = Moderation()
>>> result = moderation_metric.score("Hello, how can I help you?")
>>> print(result.value)  # A float between 0.0 and 1.0
>>> print(result.reason)  # Explanation for the score
score(output: str, **ignored_kwargs: Any) → ScoreResult

Calculate the moderation score for the given output.

Parameters:
  • output – The output text to be evaluated.

  • **ignored_kwargs (Any) – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object containing the moderation score (between 0.0 and 1.0) and a reason for the score.

Return type:

score_result.ScoreResult
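To make the return contract concrete, here is a minimal stand-in that mimics the shape described above. These are not the real opik classes, and the fixed score and reason are purely illustrative:

```python
# Minimal stand-in mimicking the score() contract described above.
# NOT the real opik classes; attribute names follow the documentation.
class ScoreResult:
    def __init__(self, value: float, reason: str):
        self.value = value
        self.reason = reason

class DummyModeration:
    def score(self, output: str, **ignored_kwargs) -> ScoreResult:
        # A real implementation would query the configured LLM judge;
        # here we return a fixed in-range value to illustrate the contract.
        return ScoreResult(value=1.0, reason="No policy violations detected.")

result = DummyModeration().score("Hello, how can I help you?")
print(result.value, result.reason)
```

In real use, result.value comes from the configured LLM judge rather than a constant, but it always falls within [0.0, 1.0].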

async ascore(output: str, **ignored_kwargs: Any) → ScoreResult

Asynchronously calculate the moderation score for the given output.

This method is the asynchronous version of score(). For detailed documentation, please refer to the score() method.

Parameters:
  • output – The output text to be evaluated.

  • **ignored_kwargs – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object with the moderation score and reason.

Return type:

score_result.ScoreResult
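The practical benefit of ascore() is scoring many outputs concurrently. The sketch below uses a stand-in class (not the real opik metric) to show the pattern with asyncio.gather:

```python
import asyncio

# Stand-in sketch of the asynchronous scoring pattern. These are NOT the
# real opik classes; they only mimic the ascore() interface described above.
class ScoreResult:
    def __init__(self, value: float, reason: str):
        self.value = value
        self.reason = reason

class DummyModeration:
    async def ascore(self, output: str, **ignored_kwargs) -> ScoreResult:
        # A real implementation would await an LLM call here.
        return ScoreResult(value=1.0, reason="No policy violations detected.")

async def score_all(outputs):
    metric = DummyModeration()
    # Scoring several outputs concurrently is the main benefit of ascore().
    return await asyncio.gather(*(metric.ascore(o) for o in outputs))

results = asyncio.run(
    score_all(["Hello, how can I help you?", "Here is your account summary."])
)
for r in results:
    print(r.value, r.reason)
```

With the real metric, the same gather pattern lets one event loop drive many concurrent LLM judge calls instead of scoring outputs one at a time.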