Dataset¶
- class opik.Dataset(name: str, description: str | None, rest_client: OpikApi)¶
Bases:
object
- __init__(name: str, description: str | None, rest_client: OpikApi) None ¶
A Dataset object. This object should not be created directly; instead, use opik.Opik.create_dataset() or opik.Opik.get_dataset().
- property name: str¶
The name of the dataset.
- property description: str | None¶
The description of the dataset.
- insert(items: Sequence[Dict[str, Any]]) None ¶
Insert new items into the dataset.
- Parameters:
items – List of dicts (which will be converted to dataset items) to add to the dataset.
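As a rough illustration of the intended flow (obtain the dataset from the client, then pass plain dicts to insert()), the sketch below uses two hypothetical in-memory stand-ins, FakeClient and FakeDataset, so that it runs without an Opik backend. The helper name seed_dataset and the item keys "input" and "expected_output" are assumptions chosen for illustration; with the real SDK you would pass an opik.Opik client instead of FakeClient.

```python
from typing import Any, Dict, List, Optional, Sequence


class FakeDataset:
    """Hypothetical in-memory stand-in for opik.Dataset (insert only)."""

    def __init__(self, name: str, description: Optional[str]) -> None:
        self.name = name
        self.description = description
        self.items: List[Dict[str, Any]] = []

    def insert(self, items: Sequence[Dict[str, Any]]) -> None:
        # Store shallow copies so later changes to the caller's dicts do not leak in.
        self.items.extend(dict(item) for item in items)


class FakeClient:
    """Hypothetical stand-in for opik.Opik so the sketch runs offline."""

    def create_dataset(self, name: str, description: Optional[str] = None) -> FakeDataset:
        return FakeDataset(name, description)


def seed_dataset(client: Any, name: str, rows: Sequence[Dict[str, Any]]) -> Any:
    """Obtain the dataset from the client (never Dataset(...)) and insert rows."""
    dataset = client.create_dataset(name=name, description="seeded by example")
    dataset.insert(rows)
    return dataset


rows = [
    {"input": "What is 2 + 2?", "expected_output": "4"},
    {"input": "Capital of France?", "expected_output": "Paris"},
]
dataset = seed_dataset(FakeClient(), "demo-dataset", rows)
print(dataset.name, len(dataset.items))
```

The stand-ins only mimic the two calls used here; they say nothing about how the real client stores or validates items.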
- update(items: List[Dict[str, Any]]) None ¶
Update existing items in the dataset.
- Parameters:
items – List of dataset items (as dicts) to update in the dataset. Each item must include its id, and you need to provide the full item, as it overrides what was supplied previously.
- Raises:
DatasetItemUpdateOperationRequiresItemId – If any item in the list is missing an id.
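Because update() raises when any item lacks an id, it can help to check a batch before sending it. The following is a minimal sketch of such a pre-flight check, not the library's own validation; the helper name split_by_id is hypothetical, and it assumes the id is stored under the dict key "id".

```python
from typing import Any, Dict, List, Tuple


def split_by_id(items: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[int]]:
    """Return (items that carry a non-empty id, positions of items that do not)."""
    updatable: List[Dict[str, Any]] = []
    missing: List[int] = []
    for position, item in enumerate(items):
        if item.get("id"):  # assumption: the id is stored under the key "id"
            updatable.append(item)
        else:
            missing.append(position)
    return updatable, missing


items = [
    {"id": "item-1", "input": "hello", "expected_output": "world"},
    {"input": "no id here", "expected_output": "?"},
]
updatable, missing = split_by_id(items)
print(len(updatable), missing)  # 1 [1]
```

Only the items in updatable would then be passed to update(); the positions in missing point at entries that need an id first.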
- delete(items_ids: List[str]) None ¶
Delete items from the dataset.
- Parameters:
items_ids – List of item ids to delete.
- clear() None ¶
Delete all items from the given dataset.
- to_pandas() pd.DataFrame ¶
Requires: pandas library to be installed.
Convert the dataset to a pandas DataFrame.
- Returns:
A pandas DataFrame containing all items in the dataset.
- to_json() str ¶
Convert the dataset to a JSON string.
- Returns:
A JSON string representation of all items in the dataset.
- get_items(nb_samples: int | None = None) List[Dict[str, Any]] ¶
Retrieve a fixed number of dataset items as dictionaries.
- Parameters:
nb_samples – The number of samples to retrieve. If not set, all items are returned.
- Returns:
A list of dictionaries representing the samples.
- insert_from_json(json_array: str, keys_mapping: Dict[str, str] | None = None, ignore_keys: List[str] | None = None) None ¶
Insert items from a JSON array string into the dataset.
- Parameters:
json_array – JSON string of the format "[{…}, {…}, {…}]", where every dictionary is transformed into a dataset item.
keys_mapping – Dictionary that maps JSON keys to dataset item field names. Example: {'Expected output': 'expected_output'}
ignore_keys – If your JSON dicts contain keys that are not needed for DatasetItem construction, pass them as the ignore_keys argument.
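To make the roles of keys_mapping and ignore_keys concrete, here is a small standard-library sketch of the transformation the parameters describe: renaming keys and dropping unneeded ones. It is an illustration under assumptions, not the SDK's implementation; the helper name remap_json_array is hypothetical, and it assumes ignore_keys refers to the original (pre-mapping) key names.

```python
import json
from typing import Any, Dict, List, Optional


def remap_json_array(
    json_array: str,
    keys_mapping: Optional[Dict[str, str]] = None,
    ignore_keys: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Parse a JSON array of objects, drop ignored keys, and rename mapped keys."""
    mapping = keys_mapping or {}
    ignored = set(ignore_keys or [])
    result: List[Dict[str, Any]] = []
    for record in json.loads(json_array):
        item: Dict[str, Any] = {}
        for key, value in record.items():
            if key in ignored:  # assumption: matching is done on the original key
                continue
            item[mapping.get(key, key)] = value  # unmapped keys keep their name
        result.append(item)
    return result


raw = json.dumps([{"Question": "2 + 2?", "Expected output": "4", "row_number": 1}])
items = remap_json_array(
    raw,
    keys_mapping={"Question": "input", "Expected output": "expected_output"},
    ignore_keys=["row_number"],
)
print(items)  # [{'input': '2 + 2?', 'expected_output': '4'}]
```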
- read_jsonl_from_file(file_path: str, keys_mapping: Dict[str, str] | None = None, ignore_keys: List[str] | None = None) None ¶
Read JSONL from a file and insert it into the dataset.
- Parameters:
file_path – Path to the JSONL file
keys_mapping – Dictionary that maps JSON keys to dataset item field names. Example: {'Expected output': 'expected_output'}
ignore_keys – If your JSON dicts contain keys that are not needed for DatasetItem construction, pass them as the ignore_keys argument.
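A JSONL file holds one JSON object per line. The sketch below, using only the standard library, writes such a file and reads it back to confirm the layout; the resulting path is the kind of value that would be passed as file_path. The helper names write_jsonl and read_jsonl and the record keys are hypothetical, and the reader shown is not the SDK's parser.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def write_jsonl(path: str, records: List[Dict[str, Any]]) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    records: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


records = [
    {"Question": "2 + 2?", "Expected output": "4"},
    {"Question": "Capital of France?", "Expected output": "Paris"},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "items.jsonl")
    write_jsonl(path, records)
    round_tripped = read_jsonl(path)

print(round_tripped == records)
```

With keys like these, a keys_mapping such as {'Expected output': 'expected_output'} would rename the second field on insertion.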
- insert_from_pandas(dataframe: pd.DataFrame, keys_mapping: Dict[str, str] | None = None, ignore_keys: List[str] | None = None) None ¶
Requires: pandas library to be installed.
Insert items from a pandas DataFrame into the dataset.
- Parameters:
dataframe – The pandas DataFrame to insert into the dataset.
keys_mapping – Dictionary that maps dataframe column names to dataset item field names. Example: {'Expected output': 'expected_output'}
ignore_keys – If your dataframe contains columns that are not needed for DatasetItem construction, pass them as the ignore_keys argument.