Haystack

Haystack is an open-source framework for building production-ready LLM applications, retrieval-augmented generative pipelines and state-of-the-art search systems that work intelligently over large document collections.

Opik integrates with Haystack to log traces for all Haystack pipelines.

Getting started

First, ensure you have both opik and haystack-ai installed:

$ pip install opik haystack-ai

You can also configure Opik with the opik configure command, which prompts you for the local server address or, if you are using the Cloud platform, your API key:

$ opik configure

Logging Haystack pipeline runs

To log a Haystack pipeline run, you can use the OpikConnector. This connector will log the pipeline run to the Opik platform and add a tracer key to the pipeline run response with the trace ID:

import os

os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

from opik.integrations.haystack import OpikConnector

pipe = Pipeline()

# Add the OpikConnector component to the pipeline
pipe.add_component(
    "tracer", OpikConnector("Chat example")
)

# Continue building the pipeline
pipe.add_component("prompt_builder", ChatPromptBuilder())
pipe.add_component("llm", OpenAIChatGenerator(model="gpt-3.5-turbo"))

pipe.connect("prompt_builder.prompt", "llm.messages")

messages = [
    ChatMessage.from_system(
        "Always respond in German even if some input data is in other languages."
    ),
    ChatMessage.from_user("Tell me about {{location}}"),
]

response = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {"location": "Berlin"},
            "template": messages,
        }
    }
)

print(response["llm"]["replies"][0])

Each pipeline run will now be logged to the Opik platform.

To ensure traces are logged correctly, set the environment variable HAYSTACK_CONTENT_TRACING_ENABLED to true before running the pipeline.

Advanced usage

Disabling automatic flushing of traces

By default, the OpikConnector flushes the trace to the Opik platform after each component, blocking the calling thread while it does so. If this slows your pipeline down, you can disable flushing after each component by setting the HAYSTACK_OPIK_ENFORCE_FLUSH environment variable to false.
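
As a minimal sketch, the variable can be set from Python in the same way as HAYSTACK_CONTENT_TRACING_ENABLED in the earlier example. Placing the assignment before the Haystack and Opik imports is an assumption made here for safety; the text above only states that the variable must be set to false.

```python
import os

# Keep content tracing enabled, as in the getting-started example.
os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

# Disable the blocking flush after each component.
# Assumption: this is set before the pipeline is built and run.
os.environ["HAYSTACK_OPIK_ENFORCE_FLUSH"] = "false"

print(os.environ["HAYSTACK_OPIK_ENFORCE_FLUSH"])
```

The same effect can be achieved by exporting the variable in the shell that launches the script.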

To make sure all traces are logged to the Opik platform before your script exits, call the flush method:

from opik.integrations.haystack import OpikConnector
from haystack.tracing import tracer
from haystack import Pipeline

pipe = Pipeline()

# Add the OpikConnector component to the pipeline
pipe.add_component(
    "tracer", OpikConnector("Chat example")
)

# Define and run the rest of the pipeline here

# Flush any pending traces before the script exits
tracer.actual_tracer.flush()

Disabling this feature may result in data loss if the program crashes before the data is sent to Opik. Make sure you call the flush() method explicitly before the program exits.
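
One way to reduce the risk of losing traces when the pipeline raises an exception is to wrap the run in try/finally so that the flush always happens. The sketch below is illustrative only: run_then_flush is a hypothetical helper, not part of Opik or Haystack, and it takes plain callables so the pattern can be shown without a live pipeline.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_then_flush(run: Callable[[], T], flush: Callable[[], None]) -> T:
    """Call run(), then always call flush(), even if run() raises.

    Hypothetical helper for illustration; not an Opik or Haystack API.
    """
    try:
        return run()
    finally:
        flush()


# Stand-ins so the sketch runs on its own.
calls = []
result = run_then_flush(
    run=lambda: calls.append("run") or "done",
    flush=lambda: calls.append("flush"),
)
print(result, calls)
```

With the pipeline from this page, the call would look like run_then_flush(lambda: pipe.run(data=...), tracer.actual_tracer.flush). This does not help if the process is killed outright, so the data-loss caveat above still applies.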

Updating logged traces

The OpikConnector returns the logged trace ID in the pipeline run response. You can use this ID to update the trace with feedback scores or other metadata:

import opik

response = pipe.run(
    data={
        "prompt_builder": {
            "template_variables": {"location": "Berlin"},
            "template": messages,
        }
    }
)

# Get the trace ID from the pipeline run response
trace_id = response["tracer"]["trace_id"]

# Log the feedback score
opik_client = opik.Opik()
opik_client.log_traces_feedback_scores([
    {"id": trace_id, "name": "user-feedback", "value": 0.5}
])
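
If you log several scores per run, it can help to build the score dictionaries in one place. The following is a sketch under assumptions: build_feedback_scores is a hypothetical helper, and it relies only on the response layout (response["tracer"]["trace_id"]) and the score fields (id, name, value) shown in the example above.

```python
from typing import Any, Dict, List


def build_feedback_scores(
    response: Dict[str, Any], scores: Dict[str, float]
) -> List[Dict[str, Any]]:
    """Build one score dict per metric for the trace in a pipeline response.

    Hypothetical helper; assumes the connector was added under the name "tracer".
    """
    trace_id = response["tracer"]["trace_id"]
    return [
        {"id": trace_id, "name": name, "value": value}
        for name, value in scores.items()
    ]


# Stand-in response with the same shape as the pipeline output above.
fake_response = {"tracer": {"trace_id": "example-trace-id"}}
payload = build_feedback_scores(
    fake_response, {"user-feedback": 0.5, "relevance": 1.0}
)
print(payload)
```

The resulting list can be passed to opik_client.log_traces_feedback_scores as in the example above.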