Using Ragas to evaluate RAG pipelines
In this notebook, we will showcase how to use Opik with Ragas for monitoring and evaluation of RAG (Retrieval-Augmented Generation) pipelines.
There are two main ways to use Opik with Ragas:
- Using Ragas metrics to score traces
- Using the Ragas evaluate function to score a dataset
Creating an account on Comet.com
Comet provides a hosted version of the Opik platform. Simply create an account and grab your API key.
You can also run the Opik platform locally, see the installation guide for more information.
Preparing our environment
First, we will configure the OpenAI API key.
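A minimal sketch of this step, assuming the key is read from the OPENAI_API_KEY environment variable. The helper name ensure_api_key and its prompt parameter are hypothetical and only serve to keep the example easy to test:

```python
import getpass
import os
from typing import Callable


def ensure_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the key stored in env_var, prompting for it only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value, so the key is not echoed into the output.
        key = prompt(f"Enter your {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling ensure_api_key() once at the top of the notebook is enough. Later cells that read the environment variable will then find the key without prompting again.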
Integrating Opik with Ragas
Using Ragas metrics to score traces
Ragas provides a set of metrics that can be used to evaluate the quality of a RAG pipeline, including but not limited to: answer_relevancy, answer_similarity, answer_correctness, context_precision, context_recall, context_entity_recall, and summarization_score. You can find a full list of metrics in the Ragas documentation.
These metrics can be computed on the fly and logged to traces or spans in Opik. For this example, we will start by creating a simple RAG pipeline and then scoring it using the answer_relevancy metric.
Create the Ragas metric
In order to use the Ragas metric without using the evaluate function, you need to initialize the metric with a RunConfig object and an LLM provider. For this example, we will use LangChain as the LLM provider with the Opik tracer enabled.
We start by initializing the Ragas metric:
Once the metric is initialized, you can use it to score a sample question. Given that the metric scoring is done asynchronously, you need to use the asyncio library to run the scoring function.
If you now navigate to Opik, you will be able to see that a new trace has been created in the Default Project project.
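The asyncio pattern used to run an asynchronous scoring function can be sketched with the standard library alone. Here fake_ascore is a hypothetical stand-in for an async metric call, not a Ragas function. It computes a toy word-overlap score, logs nothing to Opik, and the field names user_input and response are illustrative:

```python
import asyncio


async def fake_ascore(sample: dict) -> float:
    """Hypothetical stand-in for an async metric call; not a Ragas function."""
    await asyncio.sleep(0)  # placeholder for the awaited LLM request
    # Toy score: fraction of question words that also appear in the answer.
    question_words = set(sample["user_input"].lower().split())
    answer_words = set(sample["response"].lower().split())
    return len(question_words & answer_words) / max(len(question_words), 1)


def compute_metric(sample: dict) -> float:
    """Drive the coroutine to completion from synchronous code."""
    return asyncio.run(fake_ascore(sample))


score = compute_metric(
    {"user_input": "capital of France", "response": "Paris is the capital of France"}
)
```

Note that asyncio.run raises a RuntimeError when an event loop is already running, which is the case inside a Jupyter kernel. There you can either await the coroutine directly in a cell or patch the loop with the nest_asyncio package.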
Score traces
You can score traces by using the update_current_trace function.
The advantage of this approach is that the scoring span is added to the trace, which allows for a more fine-grained analysis of the RAG pipeline. However, it runs the Ragas metric calculation synchronously, so it might not be suitable for production use cases.
Evaluating datasets using the Opik evaluate function
You can use Ragas metrics with the Opik evaluate function. This will compute the metrics on all the rows of the dataset and return a summary of the results.
As Ragas metrics are asynchronous only, we will need to create a wrapper to be able to use them with the Opik evaluate function.
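The idea of such a wrapper can be sketched as follows, using only the standard library. Everything here is hypothetical and for illustration: ScoreResult, SyncMetricWrapper and toy_length_metric are stand-ins, and the base class and result type that Opik actually expects may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, Coroutine


@dataclass
class ScoreResult:
    """Hypothetical result container; the real Opik type may differ."""
    name: str
    value: float


class SyncMetricWrapper:
    """Expose an async-only scoring coroutine through a blocking score() method."""

    def __init__(
        self, name: str, ascore: Callable[[dict], Coroutine[Any, Any, float]]
    ):
        self.name = name
        self._ascore = ascore

    def score(self, **row) -> ScoreResult:
        # Each call runs the coroutine on a fresh event loop and blocks until done.
        value = asyncio.run(self._ascore(row))
        return ScoreResult(name=self.name, value=value)


async def toy_length_metric(row: dict) -> float:
    """Hypothetical async metric: 1.0 if the response is non-empty, else 0.0."""
    await asyncio.sleep(0)
    return 1.0 if row.get("response") else 0.0


metric = SyncMetricWrapper("toy_length", toy_length_metric)
results = [metric.score(**row) for row in [{"response": "Paris"}, {"response": ""}]]
```

The wrapper keeps the async metric untouched and only adapts its calling convention, so the same metric object can still be awaited directly elsewhere.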
Evaluating datasets using the Ragas evaluate function
If you are looking to evaluate a dataset, you can use the Ragas evaluate function. When using this function, the Ragas library will compute the metrics on all the rows of the dataset and return a summary of the results.
You can use the OpikTracer callback to log the results of the evaluation to the Opik platform: