ContextPrecision

class opik.evaluation.metrics.ContextPrecision(model: str | OpikBaseModel | None = None, name: str = 'context_precision_metric', few_shot_examples: List[FewShotExampleContextPrecision] | None = None, track: bool = True, project_name: str | None = None)

Bases: BaseMetric

A metric that evaluates the context precision of an input-output pair using an LLM.

This metric uses a language model to judge how well the output for a given input aligns with the provided context. It returns a score between 0.0 and 1.0, where higher values indicate better context precision.
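The judge's verdict ends up as a score in the range [0.0, 1.0] plus a reason. As an illustration only, the sketch below shows one way a JSON reply from an LLM judge could be parsed and range-checked. The JSON keys and the `JudgeVerdict` class are hypothetical and are not taken from Opik's implementation.

```python
import json
from dataclasses import dataclass


@dataclass
class JudgeVerdict:
    """Hypothetical stand-in for a score/reason pair (not Opik's ScoreResult)."""
    value: float
    reason: str


def parse_judge_response(raw: str) -> JudgeVerdict:
    """Parse a JSON judge reply and enforce the documented 0.0-1.0 range.

    The keys "context_precision_score" and "reason" are assumptions made for
    this sketch; the real prompt and response format may differ.
    """
    payload = json.loads(raw)
    score = float(payload["context_precision_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside [0.0, 1.0]")
    return JudgeVerdict(value=score, reason=str(payload.get("reason", "")))


verdict = parse_judge_response('{"context_precision_score": 0.75, "reason": "Mostly grounded."}')
print(verdict.value)  # 0.75
```

Rejecting out-of-range values, rather than clamping them, surfaces a misbehaving judge instead of silently hiding it.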

Parameters:

model – The language model to use for evaluation. Can be a string (model name) or an opik.evaluation.models.OpikBaseModel subclass instance. If None (the default), opik.evaluation.models.LiteLLMChatModel is used.
name – The name of the metric. Defaults to “context_precision_metric”.
few_shot_examples – A list of few-shot examples to provide to the model. If None, the default few-shot examples are used.
track – Whether to track the metric. Defaults to True.
project_name – Optional name of the project to track the metric in, used when there is no parent span or trace from which to inherit the project name.
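Custom few_shot_examples are easier to maintain if they are checked before being handed to the metric. The sketch below defines `FewShotExampleSketch`, a hypothetical TypedDict whose field names are an assumption modeled on a context-precision prompt; consult FewShotExampleContextPrecision for the authoritative definition before relying on these keys.

```python
from typing import List, TypedDict


class FewShotExampleSketch(TypedDict):
    """Hypothetical mirror of FewShotExampleContextPrecision; field names are assumed."""
    title: str
    input: str
    expected_output: str
    context: str
    output: str
    context_precision_score: float
    reason: str


def validate_examples(examples: List[FewShotExampleSketch]) -> List[FewShotExampleSketch]:
    """Reject examples whose score falls outside the metric's 0.0-1.0 range."""
    for ex in examples:
        score = ex["context_precision_score"]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{ex['title']!r}: score {score} is outside [0.0, 1.0]")
    return examples


examples = validate_examples([
    {
        "title": "Grounded answer",
        "input": "What's the capital of France?",
        "expected_output": "Paris",
        "context": "France is a country in Europe. Its capital is Paris.",
        "output": "The capital of France is Paris.",
        "context_precision_score": 1.0,
        "reason": "The output matches the expected answer and is supported by the context.",
    },
])

# With opik installed (and the field names confirmed), the list could be passed as:
# ContextPrecision(few_shot_examples=examples)  # commented out: requires opik and an LLM
```

Because a TypedDict is a plain dict at runtime, the validation adds no conversion step; it only enforces the score range.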

Example

>>> from opik.evaluation.metrics import ContextPrecision
>>> context_precision_metric = ContextPrecision()
>>> result = context_precision_metric.score("What's the capital of France?", "The capital of France is Paris.", "Paris", ["France is a country in Europe."])
>>> print(result.value)
1.0
>>> print(result.reason)
The provided output perfectly matches the expected output of 'Paris' and accurately identifies it as the capital of France. ...
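A common follow-up to the example is turning result.value into a pass/fail signal, for instance in a regression check. The sketch below uses a hypothetical `ScoreLike` class standing in for ScoreResult (only the value and reason attributes shown in the example are assumed) and an arbitrary threshold of 0.7; neither comes from Opik.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScoreLike:
    """Hypothetical stand-in for ScoreResult with the two attributes used in the example."""
    value: float
    reason: str


def failing_results(results: List[ScoreLike], threshold: float = 0.7) -> List[Tuple[int, float, str]]:
    """Return (index, value, reason) for every result scoring below the threshold."""
    return [(i, r.value, r.reason) for i, r in enumerate(results) if r.value < threshold]


results = [
    ScoreLike(1.0, "Output matches the expected answer."),
    ScoreLike(0.4, "Output contradicts the expected answer."),
]
print(failing_results(results))  # [(1, 0.4, 'Output contradicts the expected answer.')]
```

Keeping the reason alongside each failing score makes it easier to see why a row fell below the bar, since the score alone says nothing about what went wrong.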

score(input: str, output: str, expected_output: str, context: List[str], **ignored_kwargs: Any) → ScoreResult

Calculate the context precision score for the given input-output pair.

Parameters:

input – The input text to be evaluated.
output – The output text to be evaluated.
expected_output – The expected output for the given input.
context – A list of context strings relevant to the input.
**ignored_kwargs – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object containing the context precision score (between 0.0 and 1.0) and a reason for the score.

Return type:

score_result.ScoreResult
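Since score() ignores extra keyword arguments, a whole dataset row can be unpacked into it as long as the four required fields are present. The sketch below uses a hypothetical helper, `build_score_kwargs`, that checks for those fields and, as a defensive choice not mandated by Opik, wraps a bare string context in a list to match the documented List[str] type.

```python
from typing import Any, Dict, List, Union

REQUIRED_FIELDS = ("input", "output", "expected_output", "context")


def build_score_kwargs(row: Dict[str, Any]) -> Dict[str, Any]:
    """Map a dataset row onto score()'s four required arguments.

    Extra keys are kept; score() accepts them via **ignored_kwargs.
    The input row is not mutated.
    """
    missing = [k for k in REQUIRED_FIELDS if k not in row]
    if missing:
        raise KeyError(f"row is missing required fields: {missing}")
    kwargs = dict(row)  # shallow copy so the caller's row stays unchanged
    context: Union[str, List[str]] = kwargs["context"]
    if isinstance(context, str):
        kwargs["context"] = [context]  # score() documents context as List[str]
    return kwargs


row = {
    "input": "What's the capital of France?",
    "output": "The capital of France is Paris.",
    "expected_output": "Paris",
    "context": "France is a country in Europe.",
    "id": "row-1",
}
kwargs = build_score_kwargs(row)
print(kwargs["context"])  # ['France is a country in Europe.']
# metric.score(**kwargs)  # with opik installed; the extra "id" key lands in **ignored_kwargs
```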

async ascore(input: str, output: str, expected_output: str, context: List[str], **ignored_kwargs: Any) → ScoreResult

Asynchronously calculate the context precision score for the given input-output pair.

This method is the asynchronous version of score(). For detailed documentation, please refer to the score() method.

Parameters:

input – The input text to be evaluated.
output – The output text to be evaluated.
expected_output – The expected output for the given input.
context – A list of context strings relevant to the input.
**ignored_kwargs – Additional keyword arguments that are ignored.

Returns:

A ScoreResult object with the context precision score and reason.

Return type:

score_result.ScoreResult
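Because ascore() is a coroutine, many rows can be scored concurrently. So that it runs without an LLM, the sketch below assumes a hypothetical `StubAsyncMetric` with the same ascore() call shape, scoring by a simple substring check. With opik installed, a ContextPrecision instance could be substituted. The concurrency cap of 4 is an arbitrary choice meant to stay within a model provider's rate limits.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StubScoreResult:
    """Hypothetical stand-in exposing .value and .reason like ScoreResult."""
    value: float
    reason: str


class StubAsyncMetric:
    """Hypothetical stand-in mimicking the ascore() call shape; it does not call an LLM."""

    async def ascore(self, input: str, output: str, expected_output: str,
                     context: List[str], **ignored_kwargs: Any) -> StubScoreResult:
        await asyncio.sleep(0)  # yield control, as a real network-bound call would
        hit = expected_output.lower() in output.lower()
        return StubScoreResult(value=1.0 if hit else 0.0, reason="stub substring check")


async def score_all(metric: StubAsyncMetric, rows: List[Dict[str, Any]],
                    max_concurrency: int = 4) -> List[float]:
    """Run ascore() for every row concurrently, capped by a semaphore; order is preserved."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def score_one(row: Dict[str, Any]) -> float:
        async with semaphore:
            result = await metric.ascore(**row)
            return result.value

    return list(await asyncio.gather(*(score_one(row) for row in rows)))


rows = [
    {"input": "What's the capital of France?", "output": "The capital of France is Paris.",
     "expected_output": "Paris", "context": ["France is a country in Europe."]},
    {"input": "What's the capital of Italy?", "output": "The capital of Italy is Milan.",
     "expected_output": "Rome", "context": ["Italy is a country in Europe."]},
]
values = asyncio.run(score_all(StubAsyncMetric(), rows))
print(values)  # [1.0, 0.0]
```

asyncio.gather returns results in the order the coroutines were passed, so each value lines up with its row even when calls finish out of order.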