AnswerRelevance

class opik.evaluation.metrics.AnswerRelevance(model: str | OpikBaseModel | None = None, name: str = 'answer_relevance_metric', few_shot_examples: List[FewShotExampleAnswerRelevance] | None = None)

Bases: BaseMetric

A metric that evaluates the relevance of an answer to a given input using an LLM.

This metric uses a language model to assess how well the given output (answer) addresses the provided input (question) within the given context. It returns a score between 0.0 and 1.0, where higher values indicate better answer relevance.
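LLM-as-a-judge metrics like this one usually ask the judge model for a structured reply and then turn that reply into a numeric score and a reason. The sketch below shows only that last step. It is illustrative, not Opik's internal implementation: the JSON keys `answer_relevance_score` and `reason`, the `ScoreResultSketch` class, and the `parse_judge_reply` helper are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class ScoreResultSketch:
    """Hypothetical stand-in for opik's ScoreResult (name, value, reason)."""
    name: str
    value: float
    reason: str


def parse_judge_reply(reply: str, name: str = "answer_relevance_metric") -> ScoreResultSketch:
    """Convert a judge model's JSON reply into a score in [0.0, 1.0].

    Assumes (hypothetically) a reply shaped like
    {"answer_relevance_score": 0.9, "reason": "..."}.
    """
    data = json.loads(reply)
    score = float(data["answer_relevance_score"])
    if not 0.0 <= score <= 1.0:
        # Reject out-of-range values instead of silently clamping them.
        raise ValueError(f"score out of range: {score}")
    return ScoreResultSketch(name=name, value=score, reason=str(data["reason"]))


reply = '{"answer_relevance_score": 0.9, "reason": "Directly answers the question."}'
result = parse_judge_reply(reply)
print(result.value)  # 0.9
```

Raising on an out-of-range value, rather than clamping it, makes a misbehaving judge model visible instead of quietly producing plausible-looking scores.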

Parameters:
  • model – The language model to use for evaluation. Can be a string (model name) or an opik.evaluation.models.OpikBaseModel subclass instance. opik.evaluation.models.LiteLLMChatModel is used by default.

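  • name – The name of the metric. Defaults to “answer_relevance_metric”.

  • few_shot_examples – Optional list of FewShotExampleAnswerRelevance examples to include in the evaluation prompt. Defaults to None.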

Example

>>> from opik.evaluation.metrics import AnswerRelevance
>>> answer_relevance_metric = AnswerRelevance()
>>> result = answer_relevance_metric.score("What's the capital of France?", "The capital of France is Paris.", ["France is a country in Europe."])
>>> print(result.value)
0.9
>>> print(result.reason)
The answer directly addresses the user's query by correctly identifying Paris as the capital of France. ...

score(input: str, output: str, context: List[str], **ignored_kwargs: Any) -> ScoreResult

Calculate the answer relevance score for the given input-output pair.

Parameters:
  • input – The input text (question) to be evaluated.

  • output – The output text (answer) to be evaluated.

  • context – A list of context strings relevant to the input.

  • **ignored_kwargs – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object containing the answer relevance score (between 0.0 and 1.0) and a reason for the score.

Return type:

score_result.ScoreResult
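Because score() accepts **ignored_kwargs, a whole dataset row can be unpacked into it even when the row carries extra fields such as an ID. The sketch below uses a hypothetical StubRelevance class with the same score() signature so it runs without an LLM; an AnswerRelevance instance would take its place in practice. The word-overlap heuristic inside the stub is only a placeholder and is not how AnswerRelevance scores answers.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, List


@dataclass
class ScoreResultSketch:
    """Hypothetical stand-in for opik's ScoreResult."""
    name: str
    value: float
    reason: str


class StubRelevance:
    """Hypothetical stand-in mirroring AnswerRelevance.score()'s signature."""

    name = "answer_relevance_metric"

    def score(self, input: str, output: str, context: List[str], **ignored_kwargs: Any) -> ScoreResultSketch:
        # Placeholder heuristic: fraction of question words that reappear in the answer.
        q_words = set(input.lower().split())
        a_words = set(output.lower().split())
        value = len(q_words & a_words) / len(q_words) if q_words else 0.0
        return ScoreResultSketch(self.name, value, "word-overlap placeholder")


rows = [
    {"id": 1, "input": "capital of France", "output": "the capital of France is Paris", "context": []},
    {"id": 2, "input": "capital of Spain", "output": "I like trains", "context": []},
]

metric = StubRelevance()
# The extra "id" key is absorbed by **ignored_kwargs.
results = [metric.score(**row) for row in rows]
print(mean(r.value for r in results))  # 0.5
```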

async ascore(input: str, output: str, context: List[str], **ignored_kwargs: Any) -> ScoreResult

Asynchronously calculate the answer relevance score for the given input-output pair.

This method is the asynchronous version of score(). For detailed documentation, please refer to the score() method.

Parameters:
  • input – The input text (question) to be evaluated.

  • output – The output text (answer) to be evaluated.

  • context – A list of context strings relevant to the input.

  • **ignored_kwargs – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object containing the answer relevance score (between 0.0 and 1.0) and a reason for the score.

Return type:

score_result.ScoreResult
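The asynchronous variant is mainly useful for scoring many items concurrently. The sketch below fans out ascore() calls with asyncio.gather and caps the number of in-flight requests with an asyncio.Semaphore, which helps with rate-limited LLM providers. StubAsyncRelevance is a hypothetical stand-in with the same ascore() signature; it sleeps briefly to simulate latency and does not call an LLM. The score_all helper and the max_concurrency value are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ScoreResultSketch:
    """Hypothetical stand-in for opik's ScoreResult."""
    name: str
    value: float
    reason: str


class StubAsyncRelevance:
    """Hypothetical stand-in mirroring AnswerRelevance.ascore()'s signature."""

    async def ascore(self, input: str, output: str, context: List[str], **ignored_kwargs: Any) -> ScoreResultSketch:
        await asyncio.sleep(0.01)  # simulate LLM latency
        value = 1.0 if output else 0.0  # placeholder: empty answers score 0
        return ScoreResultSketch("answer_relevance_metric", value, "placeholder")


async def score_all(metric: Any, rows: List[Dict[str, Any]], max_concurrency: int = 4) -> List[ScoreResultSketch]:
    # Limit concurrent requests so the provider is not flooded.
    sem = asyncio.Semaphore(max_concurrency)

    async def one(row: Dict[str, Any]) -> ScoreResultSketch:
        async with sem:
            return await metric.ascore(**row)

    # gather returns results in the same order as the input rows.
    return list(await asyncio.gather(*(one(r) for r in rows)))


rows = [
    {"input": "q1", "output": "a1", "context": []},
    {"input": "q2", "output": "", "context": []},
]
scores = asyncio.run(score_all(StubAsyncRelevance(), rows))
print([s.value for s in scores])  # [1.0, 0.0]
```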