October 8, 2024
Whether you're competing in an ML hackathon or building machine learning (ML) solutions for an organization, you will want to keep track of the building blocks of each experiment: the dataset used, the model type, and so on. You may also want to track more granular details, such as the hyperparameters or the random state value that produced a particular metric.
Instead of tracking this information manually in Excel, you can use Comet, which automates and simplifies the process. This way, you can focus on improving your model's performance rather than on bookkeeping.
In this post, we will discuss how to keep track of one of the precious assets of an ML experiment: a dataset (which Comet classifies as Artifacts). Without further ado, let’s get started!
pip install comet-ml
In Comet, an Artifact is any data file or dataset you use in your ML experiments. If you're building a model that requires multiple datasets, or you're using different versions of a specific dataset, it's important to keep track of them so you know which ones were used to train which models. Comet offers functionality to help you do this.
Now that we understand what Artifacts are, the next step is to learn how to log them to the Comet platform. You can log Artifacts either to a new experiment or to an experiment that already exists on the platform.
The Artifact we want to log to a new experiment is a dataset on my local machine called bankerchurners.csv. To log the Artifact, we will use the following code:
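A minimal sketch of this step, assuming the Comet API key is already configured (for example through the COMET_API_KEY environment variable or a Comet config file). The function name, the project name default, the artifact name, and the "dataset" artifact type are illustrative choices, not values confirmed by this post:

```python
from pathlib import Path


def log_dataset_artifact(csv_path, project_name="customer_churn",
                         artifact_name="bankerchurners"):
    """Log a local dataset file to a new Comet experiment as an Artifact.

    project_name and artifact_name are illustrative defaults (assumptions).
    """
    path = Path(csv_path)
    # Fail early, before an experiment is created, if the file is missing.
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found: {path}")

    # Imported here so the file check above does not depend on comet_ml.
    from comet_ml import Artifact, Experiment

    # Create a new experiment; the API key is read from the Comet configuration.
    experiment = Experiment(project_name=project_name)

    # An Artifact is a named, typed container for one or more files.
    artifact = Artifact(name=artifact_name, artifact_type="dataset")
    artifact.add(str(path))

    # Upload the Artifact, then close the experiment.
    experiment.log_artifact(artifact)
    experiment.end()


# Example call (needs a Comet account and a local bankerchurners.csv):
# log_dataset_artifact("bankerchurners.csv")
```

Checking that the file exists before creating the experiment avoids leaving an empty experiment behind when the path is wrong.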
Let’s go over the above code:
We used artifact.add() to add the dataset to the Artifact instance. Finally, we logged the artifact to the Comet platform and ended the experiment.
Now say an experiment already exists on the Comet platform and we want to log Artifacts to it. We can use the ExistingExperiment() object in Comet to do that. Check out the code below for details.
Let’s go over what we have in the above code:
The code is largely the same as the previous one, except for line 9:
# Initialize the ExistingExperiment object with the name of the project the experiment belongs to and the experiment's key
experiment = ExistingExperiment(project_name="customer_churn", experiment_key="85d50008eb9042788a0ea9037737df79")
Here we use the ExistingExperiment object in Comet and pass in the name of the existing project, which in our case is "customer_churn". A project can contain many experiments, and each experiment has its own key, so you need to specify which experiment the Artifact should be logged to. First, click on the experiment you want to log to and take its key; this is the value passed as experiment_key.
Once that's done, you can log the Artifact to the existing experiment.
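Putting the pieces together, logging to an existing experiment could look like the sketch below. It reuses the ExistingExperiment call shown above and assumes the API key is configured outside the code. The function name, the artifact name, and the "dataset" artifact type are hypothetical, chosen only for illustration:

```python
from pathlib import Path


def log_to_existing_experiment(experiment_key, csv_path,
                               project_name="customer_churn",
                               artifact_name="bankerchurners"):
    """Attach a dataset Artifact to an experiment that already exists.

    artifact_name and the "dataset" type are illustrative assumptions.
    """
    # The key identifies which experiment in the project receives the Artifact.
    if not experiment_key:
        raise ValueError("An experiment key is required")

    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found: {path}")

    # Imported here so the checks above do not depend on comet_ml.
    from comet_ml import Artifact, ExistingExperiment

    # Reopen the experiment instead of creating a new one.
    experiment = ExistingExperiment(project_name=project_name,
                                    experiment_key=experiment_key)

    artifact = Artifact(name=artifact_name, artifact_type="dataset")
    artifact.add(str(path))

    experiment.log_artifact(artifact)
    experiment.end()


# Example call with the key used in this post:
# log_to_existing_experiment("85d50008eb9042788a0ea9037737df79",
#                            "bankerchurners.csv")
```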
There may be cases when you need to download Artifacts from the Comet platform to your local machine. For example, you may be collaborating on the Comet platform and a colleague has pushed a new version of a dataset that you don't yet have locally, so you will want to download it to your machine.
To download Artifacts from the Comet platform to your machine, use the following code:
Let’s go over the above code:
We initialized the experiment object. Then we used the experiment.get_artifact() method to get the Artifact from our workspace. This looks up the Artifact by name in the workspace (Artifacts are stored at the workspace level, not inside a single experiment) and assigns it to a variable. After that, we can call the .download() method on it, specifying the path where the Artifact should be saved.
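The download steps described above can be sketched as follows, again assuming the API key is configured outside the code. The helper names, the artifact name, and the target directory are hypothetical. The optional version_or_alias argument is passed through to get_artifact() for cases where a specific version of the dataset is needed:

```python
from pathlib import Path


def ensure_download_dir(target_dir):
    """Create the target directory if needed and return it as a Path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return target


def download_dataset_artifact(artifact_name, target_dir, version_or_alias=None):
    """Download a logged Artifact from the workspace into target_dir."""
    target = ensure_download_dir(target_dir)

    # Imported here so ensure_download_dir works without comet_ml installed.
    from comet_ml import Experiment

    # An experiment object is needed to request the Artifact.
    experiment = Experiment()

    # Look up the Artifact by name; None means no specific version is requested.
    logged_artifact = experiment.get_artifact(
        artifact_name, version_or_alias=version_or_alias
    )

    # Copy the Artifact's files to the local directory.
    logged_artifact.download(str(target))
    experiment.end()
    return target


# Example call (artifact name and path are illustrative):
# download_dataset_artifact("bankerchurners", "./data")
```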
In this tutorial, you've learned how to log an Artifact to the Comet platform, both to a new experiment and to an existing one. You also learned how to download an Artifact that exists on the Comet platform. Comet supports several other operations on Artifacts as well.
Artifacts make it easy to spend more time working on improving your experiments with automatic tracking and logging. You can access the GitHub link and also learn more about Artifacts.
The above code is a modified version of the code available on the Comet docs.