Ragas
The Opik SDK provides a simple way to integrate with Ragas, a framework for evaluating RAG systems.
There are two main ways to use Ragas with Opik:
- Using Ragas to score traces or spans.
- Using Ragas to evaluate a RAG pipeline.
Getting started
You will first need to install the opik and ragas packages, for example with pip install opik ragas.
In addition, you can configure Opik using the opik configure command, which will prompt you for your local server address or, if you are using the Cloud platform, your API key.
Using Ragas to score traces or spans
Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline. A full list of the supported metrics can be found in the Ragas documentation.
In addition to tracking these feedback scores in Opik, you can also use the OpikTracer callback to keep track of the score calculation itself in Opik.
Due to the asynchronous nature of the score calculation, we will need to define a coroutine to compute the score.
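A minimal sketch of such a coroutine follows. So that it runs without Ragas or an LLM API key, StubMetric is a hypothetical stand-in for a Ragas metric: it only mimics the async single_turn_ascore method and returns a fixed value. With a real metric you would pass a Ragas SingleTurnSample rather than a plain dictionary, and an OpikTracer instance in callbacks.

```python
import asyncio


class StubMetric:
    """Hypothetical stand-in for a Ragas metric (not part of Ragas)."""

    name = "answer_relevancy"

    async def single_turn_ascore(self, sample, callbacks=None):
        # A real metric would call an LLM here; the stub returns a fixed value.
        await asyncio.sleep(0)
        return 0.75


def compute_metric(metric, row, callbacks=None):
    """Run the metric's async scoring method and return the score synchronously."""

    async def get_score():
        return await metric.single_turn_ascore(row, callbacks=callbacks or [])

    # asyncio.run starts a fresh event loop; it raises RuntimeError if a loop
    # is already running (for example inside a Jupyter notebook).
    return asyncio.run(get_score())


row = {
    "user_input": "What is the capital of France?",
    "response": "Paris",
    "retrieved_contexts": ["Paris is the capital of France."],
}
score = compute_metric(StubMetric(), row)
```

Keeping the coroutine inside a synchronous helper means the rest of the pipeline does not need to be async.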
Once the compute_metric function is defined, you can use it to score a trace or span.
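The sketch below shows one way this could look. To keep it runnable without Opik installed, track and update_current_trace are defined locally as simple stand-ins; with Opik available they would be imported from opik and opik.opik_context instead. The compute_metric placeholder, the hard-coded contexts and answer, and the logged_scores list are assumptions for illustration only.

```python
import functools

# Stand-ins so the sketch runs without Opik installed. With Opik available,
# replace them with:
#   from opik import track
#   from opik.opik_context import update_current_trace
logged_scores = []


def track(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def update_current_trace(feedback_scores):
    logged_scores.extend(feedback_scores)


def compute_metric(metric, row):
    # Placeholder for the compute_metric function; returns a fixed score.
    return 0.9123456


@track
def retrieve_contexts(question):
    # Hard-coded retrieval step for illustration.
    return ["Paris is the capital of France.", "Paris is in France."]


@track
def answer_question(question, contexts):
    # Hard-coded answer step for illustration.
    return "Paris"


@track
def rag_pipeline(question, metric=None):
    contexts = retrieve_contexts(question)
    answer = answer_question(question, contexts)
    row = {"user_input": question, "response": answer, "retrieved_contexts": contexts}
    score = compute_metric(metric, row)
    # Attach the Ragas score to the current trace as a feedback score.
    update_current_trace(
        feedback_scores=[{"name": "answer_relevancy", "value": round(score, 4)}]
    )
    return answer


answer = rag_pipeline("What is the capital of France?")
```

Because each step is decorated with track, retrieval, answering and scoring would each appear as a span under the rag_pipeline trace.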
In the Opik UI, you will be able to see the full trace, including the score calculation.

Using Ragas metrics to evaluate a RAG pipeline
In order to use a Ragas metric within the Opik evaluation framework, we will need to wrap it in a custom scoring method. In the example below we will:
- Define the Ragas metric
- Create a scoring metric wrapper
- Use the scoring metric wrapper within the Opik evaluation framework
1. Define the Ragas metric
We will start by defining the Ragas metric. In this example we will use AnswerRelevancy.
2. Create a scoring metric wrapper
Once we have this metric, we will need to create a wrapper so that it can be used with the Opik evaluate function. As Ragas is an async framework, we will need to use asyncio to run the score calculation.
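A minimal sketch of such a wrapper is given here. BaseMetric and ScoreResult are local stand-ins so the example runs without Opik installed; with Opik available, the corresponding classes live in opik.evaluation.metrics (base_metric.BaseMetric and score_result.ScoreResult). StubRagasMetric is a hypothetical placeholder for the Ragas metric defined in step 1.

```python
import asyncio
from dataclasses import dataclass


# Stand-ins so the sketch runs without Opik installed. With Opik available,
# use base_metric.BaseMetric and score_result.ScoreResult from
# opik.evaluation.metrics instead.
class BaseMetric:
    def __init__(self, name):
        self.name = name


@dataclass
class ScoreResult:
    name: str
    value: float


class StubRagasMetric:
    """Hypothetical stand-in for a Ragas metric such as AnswerRelevancy."""

    async def single_turn_ascore(self, sample):
        # A real metric would call an LLM; the stub scores any non-empty response as 1.0.
        return 1.0 if sample["response"] else 0.0


class AnswerRelevancyWrapper(BaseMetric):
    def __init__(self, metric):
        super().__init__(name="answer_relevancy_metric")
        self.metric = metric

    def score(self, user_input, response, **ignored_kwargs):
        # With Ragas, build a SingleTurnSample from these fields instead of a dict.
        row = {"user_input": user_input, "response": response}
        # Run the async Ragas call to completion from synchronous code.
        value = asyncio.run(self.metric.single_turn_ascore(row))
        return ScoreResult(name=self.name, value=value)


wrapper = AnswerRelevancyWrapper(StubRagasMetric())
result = wrapper.score(user_input="What is the capital of France?", response="Paris")
```

The extra keyword arguments are accepted and ignored so that the wrapper tolerates fields it does not need.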
If you are running within a Jupyter notebook, an event loop is already running, so you will need to add the following to the top of your notebook: import nest_asyncio, followed by nest_asyncio.apply().
3. Use the scoring metric wrapper within the Opik evaluation framework
You can now use the scoring metric wrapper within the Opik evaluation framework by passing it to the evaluate function in the scoring_metrics list.
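As a rough sketch of how the pieces connect: the task passed to evaluate receives a dataset item and returns a dictionary, whose keys should line up with the parameters of the wrapper's score method (user_input and response here). The dataset field name question and the hard-coded answer are assumptions for illustration; the commented call shows how the task and wrapper would typically be handed to evaluate once Opik is configured and a dataset exists.

```python
def evaluation_task(dataset_item):
    # "question" is an assumed dataset field name; adapt it to your dataset.
    question = dataset_item["question"]
    answer = "Paris"  # placeholder for a call to your RAG pipeline
    return {"user_input": question, "response": answer}


output = evaluation_task({"question": "What is the capital of France?"})

# With Opik configured, a dataset created, and the wrapper defined, the call
# would look roughly like:
#   from opik.evaluation import evaluate
#   evaluate(dataset=dataset, task=evaluation_task, scoring_metrics=[wrapper])
```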