Using Opik with Langchain
For this guide, we will be performing a text-to-SQL query generation task using LangChain. We will be using the Chinook database, a SQLite database for a music store that contains employee, customer, and invoice data.
We will highlight three different parts of the workflow:
- Creating a synthetic dataset of questions
- Creating a LangChain chain to generate SQL queries
- Automating the evaluation of the SQL queries on the synthetic dataset
Creating an account on Comet.com
Comet provides a hosted version of the Opik platform: simply create an account and grab your API key.
You can also run the Opik platform locally, see the installation guide for more information.
Preparing our environment
First, we will download the Chinook database and set up our different API keys.
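Below is a minimal sketch of what this setup might look like. The download URL for the Chinook SQLite file and the `Chinook.db` file name are assumptions for illustration:

```python
import os
import getpass

import requests

# Assumed source for the Chinook SQLite database
CHINOOK_URL = (
    "https://github.com/lerocha/chinook-database/raw/master/"
    "ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"
)

# Download the database if it is not already present locally
if not os.path.exists("Chinook.db"):
    response = requests.get(CHINOOK_URL)
    response.raise_for_status()
    with open("Chinook.db", "wb") as f:
        f.write(response.content)

# Prompt for API keys that are not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
if "OPIK_API_KEY" not in os.environ:
    os.environ["OPIK_API_KEY"] = getpass.getpass("Opik (Comet) API key: ")
```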
Creating a synthetic dataset
To create our synthetic dataset, we will use the OpenAI API to generate 20 different questions that a user might ask about the Chinook database.
To ensure that the OpenAI API calls are tracked, we will use the `track_openai` function from the `opik` library.
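Here is a sketch of what this step might look like. The prompt wording and model name are illustrative, and we assume the model returns a bare JSON list of strings:

```python
import json

from openai import OpenAI
from opik.integrations.openai import track_openai

# Wrap the OpenAI client so every call is logged as a trace in Opik
openai_client = track_openai(OpenAI())

prompt = (
    "Create a list of 20 different example questions a user might ask "
    "based on the Chinook database, which contains employee, customer "
    "and invoice data for a music store. Return the questions as a JSON "
    "list of strings."
)

response = openai_client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)

# Assumes the model returns bare JSON; a robust version would validate this
synthetic_questions = json.loads(response.choices[0].message.content)
```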
Now that we have our synthetic dataset, we can create a dataset in Comet and insert the questions into it.
Since the insert methods in the SDK deduplicate items, we can safely insert the 20 questions more than once: Opik will automatically skip any items that already exist in the dataset.
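A sketch of the dataset creation, assuming the `synthetic_questions` list from the previous step; the dataset name is chosen for illustration:

```python
import opik

client = opik.Opik()

# Returns the existing dataset if it was already created on a previous run
dataset = client.get_or_create_dataset(name="synthetic_questions")

# Insert one item per generated question; re-running this is safe
# because inserts are deduplicated
dataset.insert([{"question": question} for question in synthetic_questions])
```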
Creating a LangChain chain
We will be using the `create_sql_query_chain` function from the `langchain` library to generate a SQL query that answers the question.
We will be using the `OpikTracer` class from the `opik` library to ensure that the LangChain traces are logged to Comet.
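The following sketch wires these pieces together. The model choice, tags, and example question are illustrative:

```python
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from opik.integrations.langchain import OpikTracer

# Point LangChain at the local Chinook database downloaded earlier
db = SQLDatabase.from_uri("sqlite:///Chinook.db")

llm = ChatOpenAI(model="gpt-4o")  # illustrative model choice
chain = create_sql_query_chain(llm, db)

# Pass the OpikTracer as a callback so each chain run is logged to Comet
opik_tracer = OpikTracer(tags=["text-to-sql"])
query = chain.invoke(
    {"question": "Which customer spent the most on invoices?"},
    config={"callbacks": [opik_tracer]},
)
print(query)
```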
Automating the evaluation
To ensure our LLM application is working correctly, we will test it on our synthetic dataset.
For this we will be using the `evaluate` function from the `opik` library. We will evaluate the application using a custom metric that checks if the SQL query is valid.
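Here is one way this could look, assuming the `chain` and `dataset` objects from the previous steps. The `ValidSQLQuery` metric name and its simple execute-and-check logic are illustrative:

```python
import sqlite3
from typing import Any

from opik.evaluation import evaluate
from opik.evaluation.metrics import base_metric, score_result


class ValidSQLQuery(base_metric.BaseMetric):
    """Scores 1.0 if the generated SQL executes against Chinook, else 0.0."""

    def __init__(self, name: str = "valid_sql_query", db_path: str = "Chinook.db"):
        super().__init__(name=name)
        self.db_path = db_path

    def score(self, output: str, **ignored_kwargs: Any) -> score_result.ScoreResult:
        connection = sqlite3.connect(self.db_path)
        try:
            # If the query runs without raising, we consider it valid
            connection.execute(output)
            return score_result.ScoreResult(name=self.name, value=1.0)
        except sqlite3.Error as exc:
            return score_result.ScoreResult(name=self.name, value=0.0, reason=str(exc))
        finally:
            connection.close()


def evaluation_task(item: dict) -> dict:
    # Run the chain defined above on each question in the dataset
    query = chain.invoke({"question": item["question"]})
    return {"output": query}


evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[ValidSQLQuery()],
)
```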
The evaluation results are now uploaded to the Opik platform and can be viewed in the UI.