Using Opik with Langchain
For this guide, we will be performing a text to sql query generation task using LangChain. We will be using the Chinook database which contains the SQLite database of a music store with both employee, customer and invoice data.
We will highlight three different parts of the workflow:
- Creating a synthetic dataset of questions
- Creating a LangChain chain to generate SQL queries
- Automating the evaluation of the SQL queries on the synthetic dataset
Creating an account on Comet.com
Comet provides a hosted version of the Opik platform, simply create an account and grab you API Key.
You can also run the Opik platform locally, see the installation guide for more information.
%pip install --upgrade --quiet opik langchain langchain-community langchain-openai
import opik
opik.configure(use_local=False)
Preparing our environment
First, we will download the Chinook database and set up our different API keys.
import os
import getpass
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
# Download the relevant data
import os
from langchain_community.utilities import SQLDatabase
import requests
import os
url = "https://github.com/lerocha/chinook-database/raw/master/ChinookDatabase/DataSources/Chinook_Sqlite.sqlite"
filename = "./data/chinook/Chinook_Sqlite.sqlite"
folder = os.path.dirname(filename)
if not os.path.exists(folder):
os.makedirs(folder)
if not os.path.exists(filename):
response = requests.get(url)
with open(filename, "wb") as file:
file.write(response.content)
print("Chinook database downloaded")
db = SQLDatabase.from_uri(f"sqlite:///{filename}")
Creating a synthetic dataset
In order to create our synthetic dataset, we will be using the OpenAI API to generate 20 different questions that a user might ask based on the Chinook database.
In order to ensure that the OpenAI API calls are being tracked, we will be using the track_openai
function from the opik
library.
from opik.integrations.openai import track_openai
from openai import OpenAI
import json
os.environ["OPIK_PROJECT_NAME"] = "langchain-integration-demo"
client = OpenAI()
openai_client = track_openai(client)
prompt = """
Create 20 different example questions a user might ask based on the Chinook Database.
These questions should be complex and require the model to think. They should include complex joins and window functions to answer.
Return the response as a json object with a "result" key and an array of strings with the question.
"""
completion = openai_client.chat.completions.create(
model="gpt-3.5-turbo", messages=[{"role": "user", "content": prompt}]
)
print(completion.choices[0].message.content)
Now that we have our synthetic dataset, we can create a dataset in Comet and insert the questions into it.
Since the insert methods in the SDK deduplicates items, we can insert 20 items and if the items already exist, Opik will automatically remove them.
# Create the synthetic dataset
import opik
synthetic_questions = json.loads(completion.choices[0].message.content)["result"]
client = opik.Opik()
dataset = client.get_or_create_dataset(name="synthetic_questions")
dataset.insert([{"question": question} for question in synthetic_questions])
Creating a LangChain chain
We will be using the create_sql_query_chain
function from the langchain
library to create a SQL query to answer the question.
We will be using the OpikTracer
class from the opik
library to ensure that the LangChan trace are being tracked in Comet.
# Use langchain to create a SQL query to answer the question
from langchain.chains import create_sql_query_chain
from langchain_openai import ChatOpenAI
from opik.integrations.langchain import OpikTracer
opik_tracer = OpikTracer(tags=["simple_chain"])
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = create_sql_query_chain(llm, db).with_config({"callbacks": [opik_tracer]})
response = chain.invoke({"question": "How many employees are there ?"})
response
print(response)
Automating the evaluation
In order to ensure our LLM application is working correctly, we will test it on our synthetic dataset.
For this we will be using the evaluate
function from the opik
library. We will evaluate the application using a custom metric that checks if the SQL query is valid.
from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import base_metric, score_result
from typing import Any
class ValidSQLQuery(base_metric.BaseMetric):
def __init__(self, name: str, db: Any):
self.name = name
self.db = db
def score(self, output: str, **ignored_kwargs: Any):
# Add you logic here
try:
db.run(output)
return score_result.ScoreResult(
name=self.name, value=1, reason="Query ran successfully"
)
except Exception as e:
return score_result.ScoreResult(name=self.name, value=0, reason=str(e))
valid_sql_query = ValidSQLQuery(name="valid_sql_query", db=db)
client = Opik()
dataset = client.get_dataset("synthetic_questions")
@track()
def llm_chain(input: str) -> str:
response = chain.invoke({"question": input})
return response
def evaluation_task(item):
response = llm_chain(item["question"])
return {"output": response}
res = evaluate(
experiment_name="SQL question answering",
dataset=dataset,
task=evaluation_task,
scoring_metrics=[valid_sql_query],
nb_samples=20,
)
The evaluation results are now uploaded to the Opik platform and can be viewed in the UI.