Export data

When working with Opik, it is important to be able to export traces and spans so that you can use them to fine-tune your models or run deeper analysis.

You can export the traces you have logged to the Opik platform in three ways:

  1. Using the Opik SDK: You can use the Opik.search_traces and Opik.search_spans methods to export traces and spans.
  2. Using the Opik REST API: You can use the /traces and /spans endpoints to export traces and spans.
  3. Using the UI: Once you have selected the traces or spans you want to export, you can click on the Export CSV button in the Actions dropdown.

The recommended way to export traces is to use the Opik.search_traces and Opik.search_spans methods in the Opik SDK.

Using the Opik SDK

Exporting traces

The Opik.search_traces method allows you to either export all the traces in a project or search for specific traces and export them.

Exporting all traces

To export all traces, you will need to specify a max_results value that is higher than the total number of traces in your project:

import opik

client = opik.Opik()

traces = client.search_traces(project_name="Default project", max_results=1000000)

Search for specific traces

You can use the filter_string parameter to search for specific traces:

import opik

client = opik.Opik()

traces = client.search_traces(
    project_name="Default project",
    filter_string='input contains "Opik"'
)

# Convert to dictionaries if required
traces = [trace.dict() for trace in traces]
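Once the traces are plain dictionaries, they can be written to disk for fine-tuning or later analysis. The following is a minimal sketch rather than part of the Opik SDK: write_jsonl is a hypothetical helper, and the use of default=str to serialize values such as datetimes is an assumption about what the trace dictionaries may contain.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable


def write_jsonl(records: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for record in records:
            # Assumption: default=str turns datetimes and other non-JSON
            # values into strings instead of raising TypeError.
            handle.write(json.dumps(record, default=str) + "\n")
            count += 1
    return count
```

With the list built above, a call such as write_jsonl(traces, "traces.jsonl") would produce one line per trace.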

The filter_string parameter should be a string in the following format:

"<COLUMN> <OPERATOR> <VALUE> [and <COLUMN> <OPERATOR> <VALUE>]*"

where:

  1. <COLUMN>: The column name to filter on. This can be one of:
    • name
    • input
    • output
    • start_time
    • end_time
    • metadata
    • feedback_scores
    • tags
    • usage.total_tokens
    • usage.prompt_tokens
    • usage.completion_tokens
  2. <OPERATOR>: The operator to use for the filter. This can be =, !=, >, >=, <, <=, contains, not_contains. Note that not all operators are supported for all columns.
  3. <VALUE>: The value to use in the comparison to <COLUMN>. If the value is a string, you will need to wrap it in double quotes.

You can add as many and clauses as required.
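When filters are assembled programmatically, the format above can be generated from a list of clauses. This is a minimal sketch under stated assumptions: build_filter_string and format_value are hypothetical helpers, not SDK functions, and the sketch does not validate column names or operators, nor does it escape double quotes inside string values.

```python
from typing import Iterable, Tuple, Union

Value = Union[str, int, float]


def format_value(value: Value) -> str:
    """Wrap strings in double quotes; leave numbers as they are."""
    if isinstance(value, str):
        return f'"{value}"'
    return str(value)


def build_filter_string(clauses: Iterable[Tuple[str, str, Value]]) -> str:
    """Join (column, operator, value) triples with 'and'."""
    return " and ".join(
        f"{column} {operator} {format_value(value)}"
        for column, operator, value in clauses
    )


filter_string = build_filter_string(
    [
        ("input", "contains", "Opik"),
        ("usage.total_tokens", ">", 1000),
    ]
)
print(filter_string)  # input contains "Opik" and usage.total_tokens > 1000
```

The resulting string can then be passed as the filter_string argument.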

If a <COLUMN> item refers to a nested object, you can use dot notation to access a contained value by its key. For example, you could use:

"feedback_scores.accuracy > 0.5"

Here are some full examples of using filter_string values in searches:

import opik

client = opik.Opik(
    project_name="Default project"
)

# Search for traces where the input contains text
traces = client.search_traces(
    filter_string='input contains "Opik"'
)

# Search for traces that were logged after a specific date
traces = client.search_traces(filter_string='start_time >= "2024-01-01T00:00:00Z"')

# Search for traces that have a specific tag
traces = client.search_traces(filter_string='tags contains "production"')

# Search for traces based on the number of tokens used
traces = client.search_traces(filter_string='usage.total_tokens > 1000')

# Search for traces based on the model used
traces = client.search_traces(filter_string='metadata.model = "gpt-4o"')

If your feedback_scores key contains spaces, you will need to wrap it in double quotes:

'feedback_scores."My Score" > 0'

If the feedback_scores key contains both spaces and double quotes, you will need to escape the double quotes as "":

'feedback_scores."Score ""with"" Quotes" > 0'

Alternatively, you can wrap the key in single quotes and surround the whole filter string in triple quotes, like this:

'''feedback_scores.'Accuracy "Happy Index"' < 0.8'''
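The double-quote escaping rule can be applied mechanically when score names come from user input. The sketch below is illustrative only: quote_key and feedback_score_filter are hypothetical helpers, and the column prefix feedback_scores is taken from the column list earlier in this section.

```python
def quote_key(key: str) -> str:
    """Wrap a key in double quotes, doubling any double quotes it contains."""
    return '"' + key.replace('"', '""') + '"'


def feedback_score_filter(key: str, operator: str, value: float) -> str:
    """Build one clause on a feedback score, quoting the key only when needed."""
    needs_quotes = " " in key or '"' in key
    column = quote_key(key) if needs_quotes else key
    return f"feedback_scores.{column} {operator} {value}"


print(feedback_score_filter("My Score", ">", 0))
print(feedback_score_filter('Score "with" Quotes', ">", 0))
```

Keys without spaces or quotes, such as accuracy, are left unquoted so that the output matches the plain dot notation shown earlier.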

Exporting spans

You can export spans using the Opik.search_spans method. This method allows you to search for spans based on trace_id or based on a filter string.

Exporting spans based on trace_id

To export all the spans associated with a specific trace, you can use the trace_id parameter:

import opik

client = opik.Opik()

spans = client.search_spans(
    project_name="Default project",
    trace_id="067092dc-e639-73ff-8000-e1c40172450f"
)
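To export the spans of several traces, the same call can be repeated per trace ID. This is a minimal sketch: collect_spans_by_trace is a hypothetical helper, and it assumes only that the client object exposes search_spans with the project_name and trace_id parameters shown above.

```python
from typing import Any, Dict, Iterable, List


def collect_spans_by_trace(
    client: Any, project_name: str, trace_ids: Iterable[str]
) -> Dict[str, List[Any]]:
    """Return a mapping of trace ID to the spans found for that trace."""
    spans_by_trace: Dict[str, List[Any]] = {}
    for trace_id in trace_ids:
        # One search_spans call per trace, using the parameters shown above.
        spans_by_trace[trace_id] = list(
            client.search_spans(project_name=project_name, trace_id=trace_id)
        )
    return spans_by_trace
```

In practice, client would be an opik.Opik() instance and trace_ids would come from a previous trace search. Because this issues one request per trace, it may be slow for large exports.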

Search for specific spans

You can use the filter_string parameter to search for specific spans:

import opik

client = opik.Opik()

spans = client.search_spans(
    project_name="Default project",
    filter_string='input contains "Opik"'
)

The filter_string parameter should follow the same format as the filter_string parameter in the Opik.search_traces method as defined above.

Using the Opik REST API

To export traces or spans using the Opik REST API, you can use the /traces and /spans endpoints. These endpoints are paginated, so you will need to make multiple requests to retrieve all the traces or spans you want.
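The pagination loop itself can be sketched independently of the HTTP client. In the example below, fetch_page is a hypothetical callable that you would implement to request a single page from /traces or /spans. The 1-based page numbering and the convention that a short or empty page marks the end are assumptions, so check them against the REST API reference before relying on them.

```python
from typing import Any, Callable, List


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[Any]], page_size: int = 100
) -> List[Any]:
    """Call fetch_page(page, size) until a page comes back short or empty."""
    items: List[Any] = []
    page = 1  # Assumption: pages are numbered from 1.
    while True:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            # A short (or empty) page is treated as the last one.
            break
        page += 1
    return items
```

Keeping the HTTP call behind a callable makes the loop easy to test with fake data and leaves authentication and URL construction to your own code.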

To search for specific traces or spans, you can use the filter parameter. While this is a string parameter, it does not follow the same format as the filter_string parameter in the Opik SDK. Instead, it is a JSON-encoded list of objects with the following format:

[
  {
    "field": "name",
    "type": "string",
    "operator": "=",
    "value": "Opik"
  }
]
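Because the parameter is a string, the list has to be serialized to JSON and URL-encoded before it is sent. The sketch below shows one way to do that with the standard library. build_query is a hypothetical helper, and the query key is passed in explicitly because its exact name should be confirmed against the REST API reference; the value "filter" used here simply follows the wording above.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode

filters: List[Dict[str, Any]] = [
    {"field": "name", "type": "string", "operator": "=", "value": "Opik"},
]


def build_query(filters: List[Dict[str, Any]], param_name: str) -> str:
    """Serialize the filter list to JSON and URL-encode it as a query string."""
    return urlencode({param_name: json.dumps(filters)})


# Assumption: "filter" is the query key; verify it in the API reference.
query = build_query(filters, "filter")
print(query)
```

The resulting string can be appended to the endpoint URL after a question mark, alongside any pagination parameters.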

The filter parameter was designed to be used with the Opik UI and therefore has limited flexibility. If you need more flexibility, please raise an issue on GitHub so we can help.

Using the UI

To export traces or spans as a CSV file from the UI, select the traces or spans you wish to export and click on Export CSV in the Actions dropdown.

The UI only allows you to export up to 100 traces or spans at a time as it is linked to the page size of the traces table. If you need to export more traces or spans, we recommend using the Opik SDK.
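If you need a CSV file for more rows than the UI allows, one option is to build it from records exported with the SDK. The following is a minimal sketch using only the Python standard library: records_to_csv is a hypothetical helper, and it assumes the records have already been converted to dictionaries. The column names in the usage note are examples, not a fixed schema.

```python
import csv
import io
from typing import Any, Dict, List, Sequence


def records_to_csv(records: List[Dict[str, Any]], columns: Sequence[str]) -> str:
    """Render the selected columns of each record as CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in columns.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Keys missing from a record are written as empty cells.
        writer.writerow(record)
    return buffer.getvalue()
```

For example, records_to_csv(trace_dicts, ["id", "name", "input", "output"]) returns text that can be written to a file opened with newline="". Nested values such as input and output are written using their string representation, so you may prefer JSON for deeply nested data.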