AnswerRelevance¶
- class opik.evaluation.metrics.AnswerRelevance(model: str | OpikBaseModel | None = None, name: str = 'answer_relevance_metric', few_shot_examples: List[FewShotExampleWithContextAnswerRelevance] | None = None, few_shot_examples_no_context: List[FewShotExampleNoContextAnswerRelevance] | None = None, require_context: bool = True, track: bool = True)¶
Bases: BaseMetric
A metric that evaluates the relevance of an answer to a given input using an LLM.
This metric uses a language model to assess how well the given output (answer) addresses the provided input (question) within the given context. It returns a score between 0.0 and 1.0, where higher values indicate better answer relevance.
- Parameters:
model – The language model to use for evaluation. Can be a string (model name) or an opik.evaluation.models.OpikBaseModel subclass instance. opik.evaluation.models.LiteLLMChatModel is used by default.
name – The name of the metric. Defaults to "answer_relevance_metric".
few_shot_examples – A list of dicts to include as examples in the prompt query; each example must include a 'context' key. If not provided, Opik's generic examples will be used.
few_shot_examples_no_context – A list of dicts to include as examples in the prompt query in no-context mode (the 'context' key is not needed). If not provided, Opik's generic examples will be used.
require_context – If set to False, execution in no-context mode is allowed (see the sketch after the example below). Defaults to True.
track – Whether to track the metric. Defaults to True.
Example
>>> from opik.evaluation.metrics import AnswerRelevance
>>> answer_relevance_metric = AnswerRelevance()
>>> result = answer_relevance_metric.score(
...     "What's the capital of France?",
...     "The capital of France is Paris.",
...     ["France is a country in Europe."]
... )
>>> print(result.value)
0.9
>>> print(result.reason)
The answer directly addresses the user's query by correctly identifying Paris as the capital of France. ...
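The following is a minimal sketch of no-context scoring, assuming the default model configuration; the question and answer strings are illustrative and reuse the example above.
>>> from opik.evaluation.metrics import AnswerRelevance
>>> metric = AnswerRelevance(require_context=False)  # allow scoring without context
>>> result = metric.score(
...     input="What's the capital of France?",
...     output="The capital of France is Paris.",
... )  # no context passed, so the metric runs in no-context mode
>>> print(result.value)   # relevance score between 0.0 and 1.0
>>> print(result.reason)  # LLM-generated explanation for the score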
- score(input: str, output: str, context: List[str] | None = None, **ignored_kwargs: Any) ScoreResult ¶
Calculate the answer relevance score for the given input-output pair.
- Parameters:
input – The input text (question) to be evaluated.
output – The output text (answer) to be evaluated.
context – A list of context strings relevant to the input. If no context is given, the metric is calculated in no-context mode (the prompt template will not refer to context at all).
**ignored_kwargs – Additional keyword arguments that are ignored.
- Returns:
A ScoreResult object containing the answer relevance score (between 0.0 and 1.0) and a reason for the score.
- Return type:
score_result.ScoreResult
- async ascore(input: str, output: str, context: List[str] | None = None, **ignored_kwargs: Any) ScoreResult ¶
Asynchronously calculate the answer relevance score for the given input-output pair.
This method is the asynchronous version of score(). For detailed documentation, please refer to the score() method.
- Parameters:
input – The input text (question) to be evaluated.
output – The output text (answer) to be evaluated.
context – A list of context strings relevant to the input.
**ignored_kwargs – Additional keyword arguments that are ignored.
- Returns:
A ScoreResult object with the answer relevance score and reason.
- Return type:
score_result.ScoreResult
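A minimal sketch of asynchronous scoring, assuming the call is driven from a standard asyncio event loop; the question, answer, and context strings are taken from the example above.
>>> import asyncio
>>> from opik.evaluation.metrics import AnswerRelevance
>>> answer_relevance_metric = AnswerRelevance()
>>> async def evaluate():
...     # await the asynchronous version of score()
...     return await answer_relevance_metric.ascore(
...         input="What's the capital of France?",
...         output="The capital of France is Paris.",
...         context=["France is a country in Europe."],
...     )
>>> result = asyncio.run(evaluate())
>>> print(result.value)  # relevance score between 0.0 and 1.0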