Python SDK for Artifacts Overview¶
Artifacts live in a Comet Workspace and are identified by name. Each artifact can have multiple versions, each identified by a version string.
How to add an asset to an Artifact¶
To log an artifact, first create an Artifact() instance. If you create an Artifact instance without providing a version string, a new version is created for you automatically. The first time you log an Artifact with a given name in a particular Workspace, it receives the version string "1.0.0". Otherwise, it receives the next major version number. For example, if the latest version of an artifact is "2.5.14", the new version will be "3.0.0".
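The default versioning rule described above can be illustrated with a small sketch. The helper name `default_next_version` is hypothetical and not part of the Comet SDK; it only mirrors the rule stated in this section and assumes a plain "MAJOR.MINOR.PATCH" version string.

```python
from typing import Optional


def default_next_version(latest: Optional[str]) -> str:
    """Mirror the documented default: "1.0.0" first, then the next major version.

    Hypothetical helper for illustration only; Comet assigns the version
    for you when none is given.
    """
    if latest is None:
        # No artifact with this name has been logged in the workspace yet.
        return "1.0.0"
    major, _minor, _patch = (int(part) for part in latest.split("."))
    # A major bump resets the minor and patch components to zero.
    return f"{major + 1}.0.0"


print(default_next_version(None))      # 1.0.0
print(default_next_version("2.5.14"))  # 3.0.0
```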
After creating an Artifact instance, you can add asset files or a remote URL to the Artifact. When you are ready to send the Artifact to the cloud, log it with Experiment.log_artifact(ARTIFACT). You can also add aliases when creating a new Artifact() with the aliases=["alias1", "alias2"] argument.
Let's take a look at a specific example.
NOTE: All of these examples assume that you have set your Comet API key using one of the supported configuration methods. See Python Configuration for more information.
```python
from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "dataset")
artifact.add("./local-file")

experiment.log_artifact(artifact)
experiment.end()
```
In the above example, we create an Artifact with the name "artifact-name" and type "dataset". Both are arbitrary strings, but it helps to choose names that are meaningful to you. Typical artifact types include "dataset", "image", "training-data", "validation-data", and "testing-data".
You can update all the Artifact attributes before logging the artifact object:
```python
import datetime

from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "dataset")

artifact.name = "my-specific-artifact-name"
artifact.artifact_type = "training-dataset"
artifact.metadata.update({"current_date": datetime.datetime.utcnow().isoformat()})
artifact.version = "1.4.5"
artifact.aliases |= {"staging"}  # Aliases are stored as a set
artifact.tags |= {"customer:1"}  # Tags are stored as a set
```
How to add a remote asset to an Artifact¶
Sometimes you might want to log a reference to an asset rather than the asset itself. For example, consider that you have a very large dataset (say, hundreds of gigabytes) that lives in an S3 storage bucket. In this case, it would make sense to log this as a "remote" asset. A remote asset URI can be any string; no particular format is expected.
```python
from comet_ml import Artifact, Experiment

experiment = Experiment()
artifact = Artifact("artifact-name", "artifact-type")
artifact.add_remote(
    "s3://bucket/dir/train.csv",
)

experiment.log_artifact(artifact)
experiment.end()
```
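When an artifact mixes local files and remote references, one possible pattern is to route each source to add() or add_remote() in a single helper. The sketch below is illustrative only: `add_sources` is a hypothetical name, and the rule "a URI scheme longer than one character means remote" is an assumption of this example, not something the SDK enforces.

```python
from urllib.parse import urlparse


def add_sources(artifact, sources):
    """Add each source to the artifact, using add_remote() for URIs with a scheme.

    `artifact` only needs to provide add() and add_remote().
    The scheme check is an assumption made for this sketch, not an SDK rule.
    """
    for source in sources:
        scheme = urlparse(source).scheme
        # "s3", "gs", "https", ... count as remote; a one-letter "scheme"
        # is treated as a Windows drive letter and therefore as a local path.
        if len(scheme) > 1:
            artifact.add_remote(source)
        else:
            artifact.add(source)
    return artifact
```

With a real Artifact instance, this could be called as add_sources(artifact, ["./local-file", "s3://bucket/dir/train.csv"]) before passing the artifact to Experiment.log_artifact().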
How to get a Logged Artifact Version¶
You can retrieve a logged artifact from any workspace that you have permission to access by passing the artifact name and the workspace name to the Experiment.get_artifact() method:
```python
logged_artifact = experiment.get_artifact(NAME, WORKSPACE, version_or_alias=VERSION_OR_ALIAS)
```
You can retrieve a logged artifact in three ways in the Python SDK:
- Get the latest artifact version by leaving out the version_or_alias argument
- Get a specific artifact version by passing a version string as the version_or_alias argument
- Get an aliased artifact version by passing an alias as the version_or_alias argument
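The three retrieval modes listed above can be sketched as one wrapper around Experiment.get_artifact(), using the version_or_alias keyword from the call shown earlier. The function name `fetch_artifact` and its separate `version` and `alias` parameters are hypothetical conveniences for this example.

```python
def fetch_artifact(experiment, name, workspace, version=None, alias=None):
    """Retrieve the latest, a specific, or an aliased artifact version.

    Hypothetical wrapper: `experiment` is expected to expose
    get_artifact(name, workspace, version_or_alias=...).
    """
    if version is not None and alias is not None:
        raise ValueError("Pass either version or alias, not both")

    selector = version if version is not None else alias
    if selector is None:
        # No selector: ask for the latest version.
        return experiment.get_artifact(name, workspace)
    # A version string such as "1.4.5" or an alias such as "staging".
    return experiment.get_artifact(name, workspace, version_or_alias=selector)
```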
The LoggedArtifact.assets attribute contains all the logged assets for a given artifact version. You can distinguish between remote and non-remote assets using the remote attribute of each asset.
```python
from comet_ml import Experiment

experiment = Experiment()
logged_artifact = experiment.get_artifact(
    "artifact-name",
    WORKSPACE,
)

for asset in logged_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)
```
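Before downloading, it can be useful to see how an artifact version splits into remote references and local files. The following is a minimal sketch built only on the remote, link, logical_path, and size attributes used above; the helper name `split_assets` is hypothetical, and treating a missing size as zero is an assumption of this example.

```python
def split_assets(assets):
    """Return (remote_links, local_paths, local_bytes) for an iterable of assets.

    Each asset is expected to expose `remote`, `link`, `logical_path`
    and `size`, as in the loop shown above.
    """
    remote_links = []
    local_paths = []
    local_bytes = 0
    for asset in assets:
        if asset.remote:
            # Remote assets are references only; keep the link for later use.
            remote_links.append(asset.link)
        else:
            local_paths.append(asset.logical_path)
            # Assumption: a missing size counts as zero bytes.
            local_bytes += asset.size or 0
    return remote_links, local_paths, local_bytes
```

With a retrieved artifact, this could be called as split_assets(logged_artifact.assets).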
How to download a Logged Artifact¶
Downloading a logged artifact copies all of its non-remote assets to your local disk. It also records that the current experiment has accessed the artifact, so that you can track the data flow in your pipeline.
```python
from comet_ml import Experiment

experiment = Experiment()
logged_artifact = experiment.get_artifact(
    "artifact-name",
    WORKSPACE,
)

# Download the artifact:
local_artifact = logged_artifact.download("/data/input")

for asset in local_artifact.assets:
    if asset.remote:
        print(asset.link)
    else:
        print(asset.logical_path)
        print(asset.size)
    print(asset.metadata)
    print(asset.asset_type)
    print(asset.id)
    print(asset.artifact_version_id)
    print(asset.artifact_id)
```
This will download only non-remote assets. You can access remote assets through the assets attribute of the logged artifact object and retrieve a remote asset link through the link attribute.
Update an Artifact Version¶
Here is how you can retrieve an existing artifact version, add a new file, compute the new version and log it:
```python
from comet_ml import Experiment

experiment = Experiment()

logged_artifact = experiment.get_artifact("artifact-name", WORKSPACE)

local_artifact = logged_artifact.download("/data/input")

local_artifact.add("./new-file")
local_artifact.version = logged_artifact.version.next_minor()

experiment.log_artifact(local_artifact)
```
See Also¶
Some related topics:
- Artifacts User Interface
- Artifact: the class to use when assembling artifacts to log
- ArtifactAsset: the artifact asset class when logging assets
- LoggedArtifact: the type of artifact returned from Experiment.log_artifact()
- LoggedArtifactAsset: the logged artifact asset class when accessing logged assets