October 8, 2024
Comet Artifacts is a new set of tools that gives ML teams a convenient way to log, version, and browse data from every part of their experimentation pipelines.
Machine learning typically involves experimenting with different models, hyperparameters, and dataset versions.
Beyond the metrics and parameters being measured and tested, machine learning also requires keeping track of the inputs an experiment consumes and the outputs it produces. An experiment run can produce all sorts of useful output data. These data artifacts can be files containing model predictions, model weights, and much more.
Often, the outputs of one experiment become the inputs of other experiments, and without the right structure or a single source of truth, that chain quickly becomes hard to track.
We built Comet Artifacts to solve these specific challenges.
An Artifact is a versioned object, where each version is an immutable snapshot of files and assets arranged in a folder-like logical structure. Each snapshot can be tracked using metadata, a version number, tags, and aliases, and each version records which experiment produced it and which experiments consumed it.
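To make that structure concrete, here is a minimal sketch of the data model in plain Python, assuming a simplified world where versions are numbered 1, 2, 3 and file contents are held in memory. The `ArtifactVersion` and `VersionedArtifact` classes and their methods are hypothetical illustrations, not Comet's actual classes: a frozen dataclass stands in for the immutable snapshot, while aliases and the list of consuming experiments live on the mutable container.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArtifactVersion:
    """One immutable snapshot of an artifact (hypothetical model, not Comet's API)."""
    version: int                           # simplified numbering: 1, 2, 3, ...
    files: tuple[tuple[str, bytes], ...]   # (logical_path, content) pairs, sorted
    metadata: tuple[tuple[str, str], ...]  # frozen key/value pairs
    tags: frozenset[str]
    produced_by: str                       # key of the experiment that logged it


@dataclass
class VersionedArtifact:
    """A named series of versions; existing versions are never modified."""
    name: str
    versions: list[ArtifactVersion] = field(default_factory=list)
    aliases: dict[str, int] = field(default_factory=dict)         # alias -> version number
    consumers: dict[int, set[str]] = field(default_factory=dict)  # version -> experiment keys

    def log_version(self, files: dict[str, bytes], experiment: str,
                    metadata: dict[str, str] | None = None,
                    tags: tuple[str, ...] = (),
                    aliases: tuple[str, ...] = ()) -> ArtifactVersion:
        """Append a new snapshot; copying into tuples decouples it from the caller's dicts."""
        snapshot = ArtifactVersion(
            version=len(self.versions) + 1,
            files=tuple(sorted(files.items())),
            metadata=tuple(sorted((metadata or {}).items())),
            tags=frozenset(tags),
            produced_by=experiment,
        )
        self.versions.append(snapshot)
        self.consumers[snapshot.version] = set()
        for alias in aliases:
            self.aliases[alias] = snapshot.version  # aliases move; versions do not
        return snapshot

    def get(self, version_or_alias: int | str, experiment: str) -> ArtifactVersion:
        """Resolve a version number or alias and record the consuming experiment."""
        number = (self.aliases[version_or_alias] if isinstance(version_or_alias, str)
                  else version_or_alias)
        if not 1 <= number <= len(self.versions):
            raise KeyError(f"{self.name} has no version {number}")
        self.consumers[number].add(experiment)
        return self.versions[number - 1]
```

For example, logging two versions with `aliases=("latest",)` and then calling `get("latest", experiment="exp-C")` returns version 2 and records `"exp-C"` in `consumers[2]`. Keeping lineage on the container rather than on the frozen snapshot is a deliberate choice in this sketch: consumers are only known after a version is logged, so that record must stay mutable while the files cannot.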
This means that with Artifacts, you can structure your experiments as multi-stage pipelines or DAGs (Directed Acyclic Graphs), and ensure centralized, managed and versioned access to any of the intermediate data produced in the process.
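As a minimal sketch of the DAG idea, assume lineage is available as two plain mappings from experiment keys to the artifact versions each experiment logged and downloaded. The `pipeline_order` helper, the experiment keys, and the `"name:version"` strings are hypothetical illustrations, not part of Comet's API; the sketch uses Python's standard `graphlib` module to turn producer/consumer records into a valid run order.

```python
from __future__ import annotations

from graphlib import TopologicalSorter  # Python 3.9+


def pipeline_order(produced: dict[str, set[str]],
                   consumed: dict[str, set[str]]) -> list[str]:
    """Order experiments so each one runs after the producers of what it consumes.

    Both arguments map a (hypothetical) experiment key to artifact versions
    written as "name:version". Raises graphlib.CycleError if lineage is not a DAG.
    """
    producer_of: dict[str, str] = {}
    for experiment, versions in produced.items():
        for version in versions:
            producer_of[version] = experiment  # an immutable version has one producer

    # Map each experiment to the set of upstream experiments it depends on.
    graph: dict[str, set[str]] = {e: set() for e in produced.keys() | consumed.keys()}
    for experiment, versions in consumed.items():
        for version in versions:
            if version in producer_of:  # versions logged outside this pipeline add no edge
                graph[experiment].add(producer_of[version])

    return list(TopologicalSorter(graph).static_order())


# Example: prep -> train -> eval, where eval also reads the prepared dataset.
produced = {"prep": {"dataset:1"}, "train": {"model:1"}, "eval": {"report:1"}}
consumed = {"train": {"dataset:1"}, "eval": {"dataset:1", "model:1"}}
print(pipeline_order(produced, consumed))  # ['prep', 'train', 'eval']
```

Because every version is an immutable snapshot with a single producer, real lineage should never contain a cycle; the `CycleError` raised by `graphlib` simply makes that invariant explicit.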
Specifically, Artifacts enable you and your team to:
It takes only 3 lines of code to register an Artifact of any size in Comet:
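from comet_ml import Artifact, Experiment

experiment = Experiment()  # reads the API key from your Comet config or COMET_API_KEY
artifact = Artifact("artifact-name", "dataset")  # name and artifact type
artifact.add("path/to/my/file.csv")
experiment.log_artifact(artifact)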
And then just 2 lines of code to download and use a logged Artifact in an Experiment:
# Fetch the logged Artifact by name, then download its files locally
logged_artifact = experiment.get_artifact("artifact-name")
local_artifact = logged_artifact.download()
For a deeper dive into working with Artifacts, check out these additional resources: