Dataset

class opik.Dataset(name: str, description: str | None, rest_client: OpikApi)

Bases: object

__init__(name: str, description: str | None, rest_client: OpikApi) → None

A Dataset object. This object should not be created directly; instead, use opik.Opik.create_dataset() or opik.Opik.get_dataset().

property name: str

The name of the dataset.

property description: str | None

The description of the dataset.

insert(items: Sequence[Dict[str, Any]]) → None

Insert new items into the dataset.

Parameters:

items – List of dicts (which will be converted to dataset items) to add to the dataset.
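
The sketch below shows one way to prepare the list of dicts that insert() expects. The helper make_items is hypothetical. The field names input and expected_output are assumptions, so use whatever fields your evaluation reads. The insert() call is left as a comment because it needs a dataset obtained from a configured opik.Opik client.

```python
from typing import Any, Dict, Iterable, List, Tuple


def make_items(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn (question, answer) pairs into item dicts.

    The keys "input" and "expected_output" are assumed field names.
    """
    return [{"input": question, "expected_output": answer} for question, answer in pairs]


items = make_items([
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
])

# With a dataset from opik.Opik().create_dataset(...) or get_dataset(...):
# dataset.insert(items)
```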

update(items: List[Dict[str, Any]]) → None

Update existing items in the dataset.

Parameters:

items – List of dataset items (as dicts) to update in the dataset. Provide the full item, because it overrides what was supplied previously.

Raises:

DatasetItemUpdateOperationRequiresItemId – If any item in the list is missing an id.
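
update() replaces each stored item wholesale. For that reason, a safe pattern is to start from the full current item and merge only the changed fields into it. The sketch below assumes that the dicts returned by get_items() carry each item's "id". The helper build_update_payload and the ids "item-1" and "item-2" are hypothetical, since real ids are assigned by the backend.

```python
from typing import Any, Dict, List, Sequence


def build_update_payload(
    current_items: Sequence[Dict[str, Any]],
    changes: Dict[str, Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: merge per-id changes into full item dicts.

    `current_items` is assumed to look like the output of Dataset.get_items(),
    with an "id" key on every dict. Each payload entry starts from the full
    current item so that update() does not drop unchanged fields.
    """
    payload = []
    for item in current_items:
        item_id = item.get("id")
        if item_id is None:
            # An item without an id cannot be updated; Dataset.update() raises
            # DatasetItemUpdateOperationRequiresItemId in that case.
            raise ValueError("every item needs an 'id' to be updated")
        if item_id in changes:
            # Build a new dict; the input items are left untouched.
            payload.append({**item, **changes[item_id]})
    return payload


current = [
    {"id": "item-1", "input": "2 + 2", "expected_output": "5"},
    {"id": "item-2", "input": "3 + 3", "expected_output": "6"},
]
payload = build_update_payload(current, {"item-1": {"expected_output": "4"}})

# dataset.update(payload)
```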

delete(items_ids: List[str]) → None

Delete items from the dataset.

Parameters:

items_ids – List of item ids to delete.
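
delete() takes item ids rather than item dicts, so the ids have to be selected first. The following sketch collects the ids of items whose content repeats an earlier item, keeping the first occurrence. It assumes item dicts shaped like the output of get_items(), each with an "id" key. The helper duplicate_item_ids is hypothetical.

```python
import json
from typing import Any, Dict, List, Sequence


def duplicate_item_ids(items: Sequence[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: ids of items whose content repeats an earlier item.

    Content is compared on every field except "id"; the first occurrence is kept.
    """
    seen = set()
    duplicates = []
    for item in items:
        content = {key: value for key, value in item.items() if key != "id"}
        # A canonical JSON string gives a hashable, order-independent key.
        fingerprint = json.dumps(content, sort_keys=True, default=str)
        if fingerprint in seen:
            duplicates.append(item["id"])
        else:
            seen.add(fingerprint)
    return duplicates


items = [
    {"id": "a", "input": "hi", "expected_output": "hello"},
    {"id": "b", "input": "hi", "expected_output": "hello"},
    {"id": "c", "input": "bye", "expected_output": "goodbye"},
]
to_delete = duplicate_item_ids(items)

# dataset.delete(to_delete)
```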

clear() → None

Delete all items from the dataset.

to_pandas() → pd.DataFrame

Requires: pandas library to be installed.

Convert the dataset to a pandas DataFrame.

Returns:

A pandas DataFrame containing all items in the dataset.

to_json() → str

Convert the dataset to a JSON string.

Returns:

A JSON string representation of all items in the dataset.

get_items(nb_samples: int | None = None) → List[Dict[str, Any]]

Retrieve a fixed number of dataset items as dictionaries.

Parameters:

nb_samples – The number of samples to retrieve. If not set, all items are returned.

Returns:

A list of dictionaries representing the samples.

insert_from_json(json_array: str, keys_mapping: Dict[str, str] | None = None, ignore_keys: List[str] | None = None) → None

Insert items from a JSON array string into the dataset.

Parameters:
  • json_array – JSON string of the form "[{…}, {…}, {…}]", where each dictionary is transformed into a dataset item

  • keys_mapping – Dictionary that maps JSON keys to dataset item field names. Example: {'Expected output': 'expected_output'}

  • ignore_keys – Keys to ignore. If your JSON dictionaries contain keys that are not needed to construct a DatasetItem, pass them here.
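
The sketch below shows how a json_array string can be built with json.dumps. It also shows the per-record effect that keys_mapping and ignore_keys are assumed to have: ignored keys are dropped, mapped keys are renamed, and all other keys pass through unchanged. The helper apply_mapping is hypothetical and is not the library's implementation. In particular, this sketch assumes ignore_keys is matched against the original JSON key names, before renaming. It is useful only for previewing what you intend to send.

```python
import json
from typing import Any, Dict, List, Optional


def apply_mapping(
    record: Dict[str, Any],
    keys_mapping: Optional[Dict[str, str]] = None,
    ignore_keys: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical preview of the assumed keys_mapping / ignore_keys effect.

    Keys listed in ignore_keys (original names) are dropped; keys found in
    keys_mapping are renamed; every other key is kept as-is.
    """
    keys_mapping = keys_mapping or {}
    ignored = set(ignore_keys or [])
    return {
        keys_mapping.get(key, key): value
        for key, value in record.items()
        if key not in ignored
    }


json_array = json.dumps([
    {"input": "What is 2 + 2?", "Expected output": "4", "annotator": "alice"},
])
keys_mapping = {"Expected output": "expected_output"}
ignore_keys = ["annotator"]

preview = [apply_mapping(record, keys_mapping, ignore_keys) for record in json.loads(json_array)]

# dataset.insert_from_json(json_array, keys_mapping=keys_mapping, ignore_keys=ignore_keys)
```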

read_jsonl_from_file(file_path: str, keys_mapping: Dict[str, str] | None = None, ignore_keys: List[str] | None = None) → None

Read JSONL from a file and insert it into the dataset.

Parameters:
  • file_path – Path to the JSONL file.

  • keys_mapping – Dictionary that maps JSON keys to dataset item field names. Example: {'Expected output': 'expected_output'}

  • ignore_keys – Keys to ignore. If your JSON dictionaries contain keys that are not needed to construct a DatasetItem, pass them here.
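
A JSONL file holds one JSON object per line. The following sketch writes such a file with the standard library so that it can be passed as file_path. The helper write_jsonl, the file name qa.jsonl, and the record fields are illustrative assumptions. The read_jsonl_from_file() call is left as a comment because it needs a dataset from a configured client.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable


def write_jsonl(records: Iterable[Dict[str, Any]], file_path: str) -> int:
    """Hypothetical helper: write one JSON object per line; return the line count."""
    count = 0
    with open(file_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


records = [
    {"input": "What is 2 + 2?", "Expected output": "4"},
    {"input": "Capital of Japan?", "Expected output": "Tokyo"},
]
file_path = os.path.join(tempfile.mkdtemp(), "qa.jsonl")
n_lines = write_jsonl(records, file_path)

# dataset.read_jsonl_from_file(file_path, keys_mapping={"Expected output": "expected_output"})
```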

insert_from_pandas(dataframe: pd.DataFrame, keys_mapping: Dict[str, str] | None = None, ignore_keys: List[str] | None = None) → None

Requires: pandas library to be installed.

Insert items from a pandas DataFrame into the dataset.

Parameters:
  • dataframe – The pandas DataFrame to insert.

  • keys_mapping – Dictionary that maps DataFrame column names to dataset item field names. Example: {'Expected output': 'expected_output'}

  • ignore_keys – Columns to ignore. If your DataFrame contains columns that are not needed to construct a DatasetItem, pass them here.