Update an existing experiment
Sometimes you may want to update an existing experiment with new scores, or update existing scores for an experiment. You can do this using the evaluate_experiment function.
This function will re-run the scoring metrics on the existing experiment items and update the scores:
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Hallucination
hallucination_metric = Hallucination()
# Replace "my-experiment" with the name of your experiment which can be found in the Opik UI
evaluate_experiment(experiment_name="my-experiment", scoring_metrics=[hallucination_metric])
The evaluate_experiment function can be used to update existing scores for an experiment. If you use a scoring metric with the same name as an existing score, the scores will be updated with the new values.
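For example, if an experiment already has hallucination scores and you want to re-compute them with a different judge model, re-running a metric with the same name overwrites the existing values. A minimal sketch (the experiment name and the judge model are placeholders, and it assumes the metric accepts a model parameter, as Opik's LLM-as-a-judge metrics do):
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Hallucination

# Same metric name as before, so the existing hallucination scores are overwritten
# The judge model below is illustrative - use whichever model you have access to
hallucination_metric = Hallucination(model="gpt-4o")

evaluate_experiment(
    experiment_name="my-experiment",  # replace with your experiment name
    scoring_metrics=[hallucination_metric],
)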
Example
Create an experiment
Suppose you are building a chatbot and want to compute the hallucination scores for a set of example conversations. For this you would create a first experiment with the evaluate function:
from opik import Opik, track
from opik.evaluation import evaluate
from opik.evaluation.metrics import Equals, Hallucination
from opik.integrations.openai import track_openai
import openai
# Define the task to evaluate
openai_client = track_openai(openai.OpenAI())
MODEL = "gpt-3.5-turbo"
@track
def your_llm_application(input: str) -> str:
    response = openai_client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": input}],
    )
    return response.choices[0].message.content

# Define the evaluation task
def evaluation_task(x):
    return {
        "input": x['input']['user_question'],
        "output": your_llm_application(x['input']['user_question'])
    }
# Create a simple dataset
client = Opik()
try:
    dataset = client.create_dataset(name="your-dataset-name")
    dataset.insert([
        {"input": {"user_question": "What is the capital of France?"}},
        {"input": {"user_question": "What is the capital of Germany?"}},
    ])
except Exception:
    # The dataset already exists, reuse it
    dataset = client.get_dataset(name="your-dataset-name")
# Define the metrics
hallucination_metric = Hallucination()
evaluation = evaluate(
    experiment_name="My experiment",
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[hallucination_metric],
    experiment_config={
        "model": MODEL
    }
)
experiment_name = evaluation.experiment_name
print(f"Experiment name: {experiment_name}")
Learn more about the evaluate function in our LLM evaluation guide.
Update the experiment
Once the first experiment is created, you realise that you also want to compute a moderation score for each example. You could re-run the experiment with the new scoring metric, but that would mean re-running the LLM application and generating the outputs again. Instead, you can simply update the existing experiment with the new scoring metric:
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Moderation

moderation_metric = Moderation()

evaluate_experiment(experiment_name=experiment_name, scoring_metrics=[moderation_metric])
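If you also want to refresh the existing hallucination scores at the same time, you can pass both metrics in a single call; as noted above, a metric with the same name as an existing score overwrites it. A minimal sketch:
from opik.evaluation import evaluate_experiment
from opik.evaluation.metrics import Hallucination, Moderation

# Moderation scores are added, and the existing hallucination scores are re-computed
evaluate_experiment(
    experiment_name=experiment_name,
    scoring_metrics=[Hallucination(), Moderation()],
)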