Overview
Opik provides a set of built-in evaluation metrics that can be used to evaluate the output of your LLM calls. These metrics are broken down into two main categories:
- Heuristic metrics
- LLM as a Judge metrics
Heuristic metrics are deterministic and often statistical in nature. LLM as a Judge metrics are non-deterministic: they use an LLM to evaluate the output of another LLM.
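To make the LLM as a Judge idea concrete, here is a minimal sketch of the general pattern. It is not Opik's implementation. A prompt asks a judge model for a score, and the reply is parsed into a number. The prompt template, the JSON reply format, and the `llm_judge_score` and `fake_judge` functions are all hypothetical. `fake_judge` stands in for a real model call so that the example runs offline.

```python
import json
from typing import Callable

# Hypothetical judge prompt; Opik's built-in metrics ship their own prompt templates.
JUDGE_TEMPLATE = (
    "You are an impartial evaluator. Given the question and the answer, "
    "rate how relevant the answer is to the question on a scale from 0.0 to 1.0. "
    'Reply only with JSON of the form {{"score": <float>, "reason": "<text>"}}.\n\n'
    "Question: {question}\nAnswer: {answer}"
)


def llm_judge_score(question: str, answer: str, judge: Callable[[str], str]) -> float:
    """Ask a judge LLM to grade an answer and parse its score (hypothetical helper).

    `judge` is any function that sends a prompt to an LLM and returns its text reply.
    """
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    reply = judge(prompt)
    verdict = json.loads(reply)
    score = float(verdict["score"])
    # Clamp to the expected range in case the judge drifts outside it.
    return max(0.0, min(1.0, score))


# Hypothetical stand-in for a real LLM call, so the sketch runs without a model.
def fake_judge(prompt: str) -> str:
    return '{"score": 0.9, "reason": "The answer addresses the question directly."}'


print(llm_judge_score("What is the capital of France?", "Paris.", fake_judge))  # 0.9
```

With a real model behind `judge`, the same inputs can produce different scores from one run to the next. This is why these metrics are non-deterministic, whereas a heuristic metric always returns the same score for the same inputs.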
Opik provides the following built-in evaluation metrics:
Metric | Type | Description | Documentation |
---|---|---|---|
Equals | Heuristic | Checks if the output exactly matches an expected string | Equals |
Contains | Heuristic | Checks if the output contains a specific substring; the check can be case-sensitive or case-insensitive | Contains |
RegexMatch | Heuristic | Checks if the output matches a specified regular expression pattern | RegexMatch |
IsJson | Heuristic | Checks if the output is a valid JSON object | IsJson |
Levenshtein | Heuristic | Calculates the Levenshtein distance between the output and an expected string | Levenshtein |
Hallucination | LLM as a Judge | Checks if the output contains any hallucinations | Hallucination |
Moderation | LLM as a Judge | Checks if the output contains any harmful content | Moderation |
AnswerRelevance | LLM as a Judge | Checks if the output is relevant to the question | AnswerRelevance |
ContextRecall | LLM as a Judge | Assesses the recall of the provided context with respect to the input and expected output | ContextRecall |
ContextPrecision | LLM as a Judge | Assesses the precision of the provided context with respect to the input and expected output | ContextPrecision |
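The heuristic checks in the table are simple enough to sketch in plain Python. The functions below are hypothetical, simplified re-implementations that show what each check computes. They are not Opik's classes or source code, and their names, parameters, and 0.0/1.0 return convention are assumptions. In particular, `is_json` here accepts only a top-level JSON object, and `levenshtein_similarity` uses one common normalization that may differ from the one Opik uses.

```python
import json
import re

# Hypothetical, simplified versions of the heuristic checks; NOT Opik's implementation.
# Pass/fail checks return 1.0 or 0.0; levenshtein_similarity returns a ratio in [0.0, 1.0].


def equals(output: str, reference: str) -> float:
    # Exact string match.
    return 1.0 if output == reference else 0.0


def contains(output: str, reference: str, case_sensitive: bool = False) -> float:
    # Substring check; lowercase both sides when case-insensitive.
    if not case_sensitive:
        output, reference = output.lower(), reference.lower()
    return 1.0 if reference in output else 0.0


def regex_match(output: str, pattern: str) -> float:
    # re.search finds the pattern anywhere in the output.
    return 1.0 if re.search(pattern, output) else 0.0


def is_json(output: str) -> float:
    # Assumption: "JSON object" is taken literally, so the parsed value must be a dict.
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if isinstance(parsed, dict) else 0.0


def levenshtein_distance(a: str, b: str) -> int:
    # Dynamic-programming edit distance (insertions, deletions, substitutions).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(output: str, reference: str) -> float:
    # Normalize the distance by the longer string; 1.0 means identical strings.
    longest = max(len(output), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(output, reference) / longest


print(equals("Hello, world!", "Hello, world!"))               # 1.0
print(contains("Hello, World!", "world"))                     # 1.0
print(regex_match("Order #12345 confirmed", r"#\d{5}"))       # 1.0
print(is_json('{"answer": 42}'))                              # 1.0
print(levenshtein_distance("kitten", "sitting"))              # 3
```

Returning a float for every check, even the pass/fail ones, keeps all scores on the same 0.0 to 1.0 scale, so they can be averaged or compared side by side.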
You can also create your own custom metric; learn more in the Custom Metric section.