Overview

Opik provides a set of built-in evaluation metrics that can be used to evaluate the output of your LLM calls. These metrics are broken down into two main categories:

  1. Heuristic metrics
  2. LLM as a Judge metrics

Heuristic metrics are deterministic and often statistical in nature. LLM as a Judge metrics are non-deterministic: they use an LLM to evaluate the output of another LLM.
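For example, a heuristic metric can be scored locally without any model calls. The snippet below is a minimal sketch, assuming the `Contains` metric is imported from `opik.evaluation.metrics` and exposes a `score(output, reference)` method returning a score result; check the Contains documentation for the exact signature.

```python
from opik.evaluation.metrics import Contains

# Heuristic metric: a deterministic substring check, no LLM involved
metric = Contains(case_sensitive=False)

result = metric.score(
    output="The capital of France is Paris.",
    reference="paris",
)
print(result.value)  # expected to be 1.0 when the substring is found (illustrative)
```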

Opik provides the following built-in evaluation metrics:

| Metric | Type | Description | Documentation |
| --- | --- | --- | --- |
| Equals | Heuristic | Checks if the output exactly matches an expected string | Equals |
| Contains | Heuristic | Checks if the output contains a specific substring; can be case sensitive or case insensitive | Contains |
| RegexMatch | Heuristic | Checks if the output matches a specified regular expression pattern | RegexMatch |
| IsJson | Heuristic | Checks if the output is a valid JSON object | IsJson |
| Levenshtein | Heuristic | Calculates the Levenshtein distance between the output and an expected string | Levenshtein |
| Hallucination | LLM as a Judge | Checks if the output contains any hallucinations | Hallucination |
| Moderation | LLM as a Judge | Checks if the output contains any harmful content | Moderation |
| AnswerRelevance | LLM as a Judge | Checks if the output is relevant to the question | AnswerRelevance |
| ContextRecall | LLM as a Judge | Checks how well the output covers the relevant information in the provided context | ContextRecall |
| ContextPrecision | LLM as a Judge | Checks how relevant and accurate the output is given the provided context | ContextPrecision |
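
The LLM as a Judge metrics follow the same `score(...)` pattern but call out to an LLM, so they require model credentials and may return different scores across runs. The sketch below assumes the `Hallucination` metric accepts `input`, `output`, and `context` arguments; see the Hallucination documentation for the exact parameters.

```python
from opik.evaluation.metrics import Hallucination

# LLM as a Judge metric: an LLM grades the output, so results are non-deterministic
metric = Hallucination()

result = metric.score(
    input="What is the capital of France?",
    output="The capital of France is Lyon.",
    context=["France is a country in Western Europe. Its capital is Paris."],
)
print(result.value, result.reason)  # e.g. a high hallucination score with an explanation
```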

You can also create your own custom metric; learn more about it in the Custom Metric section.
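
As a rough illustration, a custom metric is typically a class with a `score` method that returns a score result. The sketch below assumes the `base_metric.BaseMetric` and `score_result.ScoreResult` helpers from `opik.evaluation.metrics`, and the metric itself (a disclaimer check) is a hypothetical example; refer to the Custom Metric section for the exact interface.

```python
from opik.evaluation.metrics import base_metric, score_result

class ContainsDisclaimer(base_metric.BaseMetric):
    """Example custom heuristic metric: checks that the output includes a disclaimer."""

    def __init__(self, name: str = "contains_disclaimer"):
        self.name = name

    def score(self, output: str, **ignored_kwargs) -> score_result.ScoreResult:
        # Deterministic check on the output string (illustrative logic)
        has_disclaimer = "not financial advice" in output.lower()
        return score_result.ScoreResult(
            value=1.0 if has_disclaimer else 0.0,
            name=self.name,
            reason="Disclaimer found" if has_disclaimer else "Disclaimer missing",
        )
```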