Answer relevance
The Answer Relevance metric allows you to evaluate how relevant and appropriate the LLM’s response is to the given input question or prompt. To assess the relevance of the answer, you will need to provide the LLM input (question or prompt) and the LLM output (generated answer). Unlike the Hallucination metric, the Answer Relevance metric focuses on the appropriateness and pertinence of the response rather than factual accuracy.
You can use the `AnswerRelevance` metric as follows:
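A minimal sketch of synchronous scoring, assuming the metric is exposed as `AnswerRelevance` in `opik.evaluation.metrics` and that `score` accepts `input`, `output`, and an optional `context` list (requires an Opik/LLM provider setup):

```python
from opik.evaluation.metrics import AnswerRelevance

metric = AnswerRelevance()

# Score how relevant the generated answer is to the question.
result = metric.score(
    input="What is the capital of France?",
    output="The capital of France is Paris, a major European city.",
    context=["France is a country in Western Europe. Its capital is Paris."],
)

print(result.value)   # relevance score
print(result.reason)  # the judge model's explanation
</imports>
```

The returned object carries both a numeric score and the judge model's reasoning, which is useful when debugging low-relevance answers.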
Asynchronous scoring is also supported with the `ascore` scoring method.
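For example, a sketch of the async variant, assuming `ascore` mirrors the arguments of `score` (the question and answer strings here are illustrative):

```python
import asyncio

from opik.evaluation.metrics import AnswerRelevance

metric = AnswerRelevance()

async def main():
    # ascore awaits the judge-model call instead of blocking.
    result = await metric.ascore(
        input="What is the capital of France?",
        output="The capital of France is Paris.",
        context=["France is a country in Western Europe. Its capital is Paris."],
    )
    print(result.value)

asyncio.run(main())
```

This is convenient when scoring many answers concurrently, e.g. with `asyncio.gather`.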
Detecting answer relevance
Opik uses an LLM as a Judge to detect answer relevance. For this, we have a prompt template that is used to generate the prompt for the LLM. By default, the `gpt-4o` model is used to score answer relevance, but you can change this to any model supported by LiteLLM by setting the `model` parameter. You can learn more about customizing models in the Customize models for LLM as a Judge metrics section.
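As a sketch, swapping in a different judge model might look like this; the model string is illustrative and should be any identifier LiteLLM accepts for your provider:

```python
from opik.evaluation.metrics import AnswerRelevance

# Pass a LiteLLM-compatible model name to override the default gpt-4o judge.
metric = AnswerRelevance(model="anthropic/claude-3-5-sonnet-20241022")
```

The corresponding provider API key must be configured in your environment for the judge call to succeed.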
The template uses a few-shot prompting technique to detect answer relevance. The template is as follows: