Moderation

class opik.evaluation.metrics.Moderation(model: str | OpikBaseModel | None = None, name: str = 'moderation_metric', few_shot_examples: List[FewShotExampleModeration] | None = None)

Bases: BaseMetric

A metric that evaluates the moderation level of an input-output pair using an LLM.

This metric uses a language model to assess the moderation level of the given input and output. It returns a score between 0.0 and 1.0, where higher values indicate more appropriate content.

Parameters:
  • model – The language model to use for moderation. Can be a string (model name) or an OpikBaseModel instance. Defaults to None.

  • name – The name of the metric. Defaults to “moderation_metric”.

  • few_shot_examples – A list of few-shot examples to be used in the query. If None, default examples will be used.

Example

>>> from opik.evaluation.metrics import Moderation
>>> moderation_metric = Moderation()
>>> result = moderation_metric.score(input="Hello", output="Hello, how can I help you?")
>>> print(result.value)  # A float between 0.0 and 1.0
>>> print(result.reason)  # Explanation for the score

score(input: str, **ignored_kwargs: Any) → ScoreResult

Calculate the moderation score for the given input-output pair.

Parameters:
  • input – The input text to be evaluated.

  • output – The output text to be evaluated.

  • **ignored_kwargs (Any) – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object containing the moderation score (between 0.0 and 1.0) and a reason for the score.

Return type:

score_result.ScoreResult
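
As a rough illustration of calling score() over several texts, the sketch below wraps the call in a small helper. The names score_texts and demo, and the sample string, are hypothetical and not part of Opik. The helper only assumes an object with a score(input=...) method whose result exposes value and reason, as described above.

```python
from typing import Any, Dict, List


def score_texts(metric: Any, texts: List[str]) -> List[Dict[str, Any]]:
    """Score each text and collect the value and reason per input.

    Hypothetical helper, not part of Opik. `metric` is any object with a
    `score(input=...)` method returning a result with `value` and `reason`.
    """
    rows = []
    for text in texts:
        result = metric.score(input=text)
        rows.append({"input": text, "value": result.value, "reason": result.reason})
    return rows


def demo() -> None:
    # Assumes opik is installed and an LLM provider is configured.
    # The import is kept local so the helper above works without opik.
    from opik.evaluation.metrics import Moderation

    for row in score_texts(Moderation(), ["Hello, how can I help you?"]):
        print(row["value"], row["reason"])
```

Calling demo() issues one LLM request per text, so it is only worth running once a model provider is set up.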

async ascore(input: str, **ignored_kwargs: Any) → ScoreResult

Asynchronously calculate the moderation score for the given input-output pair.

This method is the asynchronous version of score(). For detailed documentation, please refer to the score() method.

Parameters:
  • input – The input text to be evaluated.

  • output – The output text to be evaluated.

  • **ignored_kwargs (Any) – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object with the moderation score and reason.

Return type:

score_result.ScoreResult
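
Because ascore() is a coroutine, several texts can be scored concurrently. The following is a minimal sketch using asyncio.gather from the standard library. The names ascore_texts and demo, and the sample strings, are hypothetical and not part of Opik. The helper only assumes an object with an awaitable ascore(input=...) method.

```python
import asyncio
from typing import Any, List


async def ascore_texts(metric: Any, texts: List[str]) -> List[Any]:
    """Await `metric.ascore` for every text concurrently.

    Hypothetical helper, not part of Opik. asyncio.gather returns the
    results in the same order as the inputs.
    """
    return await asyncio.gather(*(metric.ascore(input=text) for text in texts))


async def demo() -> None:
    # Assumes opik is installed and an LLM provider is configured.
    from opik.evaluation.metrics import Moderation

    results = await ascore_texts(Moderation(), ["Hello", "Hello, how can I help you?"])
    for result in results:
        print(result.value, result.reason)
```

From synchronous code the sketch could be started with asyncio.run(demo()). Provider rate limits may make it sensible to bound the number of concurrent requests.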