Overview
Opik provides a set of built-in evaluation metrics that can be used to evaluate the output of your LLM calls. These metrics are broken down into two main categories:
- Heuristic metrics
- LLM as a Judge metrics
Heuristic metrics are deterministic and are often statistical in nature. LLM as a Judge metrics are non-deterministic and are based on the idea of using an LLM to evaluate the output of another LLM.
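The difference between the two categories can be sketched in a few lines of plain Python. This is an illustrative sketch only and does not use Opik's own metric classes: the function names `exact_match`, `token_overlap`, and `judge_metric`, and the `ask_llm` callable, are all hypothetical. The two heuristic functions always return the same score for the same inputs, while the judge function delegates the decision to whatever model sits behind `ask_llm`.

```python
from typing import Callable


def exact_match(output: str, reference: str) -> float:
    """Heuristic metric: 1.0 if the strings match after trimming, else 0.0."""
    return 1.0 if output.strip() == reference.strip() else 0.0


def token_overlap(output: str, reference: str) -> float:
    """Heuristic metric: fraction of distinct reference tokens found in the output."""
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    out_tokens = set(output.lower().split())
    return len(ref_tokens & out_tokens) / len(ref_tokens)


def judge_metric(output: str, ask_llm: Callable[[str], str]) -> float:
    """LLM as a Judge metric (hypothetical): ask a model for a 0-10 rating, rescale to 0-1.

    `ask_llm` is an assumed callable that sends a prompt to some LLM and
    returns its text reply; the score depends on what that model answers.
    """
    prompt = (
        "Rate the factual accuracy of this answer from 0 to 10. "
        "Reply with the number only.\n\n" + output
    )
    reply = ask_llm(prompt)
    return float(reply.strip()) / 10.0
```

Because `ask_llm` is passed in, the judge function can be exercised offline with a stub such as `lambda prompt: "8"`. With a real model behind it, repeated calls on the same output may return different scores.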
Opik provides the following built-in evaluation metrics:
You can also create your own custom metric; to learn more, see the Custom Metric section.
Customizing LLM as a Judge metrics
By default, Opik uses GPT-4o from OpenAI as the LLM to evaluate the output of other LLMs. However, you can easily switch to another LLM provider by specifying a different model in the model_name parameter of each LLM as a Judge metric.
This functionality is based on the LiteLLM framework. You can find a full list of supported LLM providers and how to configure them in the LiteLLM Providers guide.
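As a rough illustration of the idea, the sketch below shows a judge metric whose model is selected through a `model_name` argument that defaults to GPT-4o. It is not Opik's implementation: the `RelevanceJudge` class, the `CompletionFn` type, the `fake_backend` stub, and the `"some-provider/some-model"` string are all hypothetical, and the LLM call is injected as a callable so that the example runs without network access or API keys.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical backend signature: (model_name, prompt) -> reply text.
CompletionFn = Callable[[str, str], str]


@dataclass
class RelevanceJudge:
    """Hypothetical LLM as a Judge metric with a configurable judge model."""

    complete: CompletionFn
    model_name: str = "gpt-4o"  # default judge model, as described above

    def score(self, input: str, output: str) -> float:
        prompt = (
            "Does the answer address the question? Reply yes or no.\n"
            f"Question: {input}\nAnswer: {output}"
        )
        # The model name is forwarded unchanged to the completion backend.
        reply = self.complete(self.model_name, prompt)
        return 1.0 if reply.strip().lower().startswith("yes") else 0.0


# Stub backend for offline use: records which model was requested.
requested_models: List[str] = []


def fake_backend(model_name: str, prompt: str) -> str:
    requested_models.append(model_name)
    return "yes"


default_judge = RelevanceJudge(complete=fake_backend)
custom_judge = RelevanceJudge(complete=fake_backend, model_name="some-provider/some-model")

default_judge.score(input="What is the capital of France?", output="Paris.")
custom_judge.score(input="What is the capital of France?", output="Paris.")
```

In a real setup, the `complete` callable could wrap LiteLLM's `litellm.completion(model=..., messages=[...])`, which would then route the request to the provider named in the model string.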