August 30, 2024
This post was written in collaboration with Aleksey Bilogur from the Quilt Data team. Follow Aleksey on Twitter and on his personal website here. Follow Quilt here.
The term machine learning ‘pipeline’ can suggest a one-way flow of data and transformations, but in reality, machine learning pipelines are cyclical and iterative. For a given project, a data scientist can try hundreds or even thousands of experiments before arriving at a champion model to put into production.
With each iteration, it becomes harder to manage subsets and variations of your data and models. Keeping track of which model iteration ran on which dataset is key to reproducibility.
In this article, we’ll show you how to build a simple and reproducible end-to-end machine learning pipeline using a Keras multi-class image classification model and a custom dataset crafted from Google Open Images, with Quilt T4 and comet.ml.
You can access the full tutorial in this GitHub repository. For a walk-through of the tutorial, continue reading below ⬇️.
The Open Images Dataset is an attractive target for building image recognition algorithms because it is one of the largest, most accurate, and most easily accessible image recognition datasets. For image recognition tasks, Open Images contains 15 million bounding boxes for 600 categories of objects on 1.75 million images. Image labeling tasks meanwhile enjoy 30 million labels across almost 20,000 categories.
The images come from Flickr and are of highly variable quality, as would be realistic in an applied machine learning setting.
Downloading the entire Google Open Images corpus is possible, and potentially necessary if you want to build a general-purpose image classifier or bounding box algorithm. However, downloading everything is wasteful if you just want a small categorical subset of the data in the corpus. For this tutorial, we are only interested in downloading and working with fruit images.
The src/openimager subfolder in the GitHub repository provided contains a small module that handles downloading a categorical subset of the Open Images corpus: just the images corresponding to a user-selected group of labels, and just from the set of images with bounding box information attached. Instead of using the zipped blob files, it downloads the source images from Flickr directly.
This script will allow you to download any subset of the 600 labels that have bounding box data. Here’s a taste of what’s possible:
football, toy, bird, cat, vase, lemon, dog, elephant, shark, flower, furniture, airplane, spoon, bench, swan, peanut, camera, flute, helmet, pomegranate, crown…
For the purposes of this article, we’ll limit ourselves to just the fruit classes:
apple, banana, cantaloupe, common_fig, grape, lemon, mango, orange, peach, pear, pineapple, pomegranate, strawberry, tomato, watermelon
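As a rough sketch of how that download step might look (the exact function name and signature live in src/openimager in the tutorial repository, so treat this call as illustrative):

# Hypothetical invocation of the downloader module; check src/openimager
# in the repo for the actual interface.
from openimager import download

fruit_labels = [
    'apple', 'banana', 'cantaloupe', 'common_fig', 'grape', 'lemon',
    'mango', 'orange', 'peach', 'pear', 'pineapple', 'pomegranate',
    'strawberry', 'tomato', 'watermelon',
]

# Downloads only the Flickr source images whose bounding-box annotations
# include at least one of the requested labels.
download(fruit_labels)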
For more information on Open Images, check out the article ‘How to classify photos in 600 classes using nine million Open Images’.
This annotated Jupyter notebook in the demo GitHub repository handles this data preparation work. After running the notebook code, we will have an images_cropped folder on disk containing all of the cropped images.
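If you’re curious what the cropping step looks like, here is a minimal sketch, assuming the bounding boxes have been loaded from the standard Open Images annotations CSV and the downloaded images live in an images/ folder (file and folder names here are illustrative; the notebook is the authoritative version):

import os
import pandas as pd
from PIL import Image

boxes = pd.read_csv('train-annotations-bbox.csv')  # Open Images bounding-box annotations
os.makedirs('images_cropped', exist_ok=True)

for i, (_, box) in enumerate(boxes.iterrows()):
    path = f'images/{box.ImageID}.jpg'
    if not os.path.exists(path):
        continue  # skip annotations for images we did not download
    img = Image.open(path)
    w, h = img.size
    # Open Images bounding-box coordinates are normalized to [0, 1]
    crop = img.crop((int(box.XMin * w), int(box.YMin * h),
                     int(box.XMax * w), int(box.YMax * h)))
    crop.save(f'images_cropped/{box.ImageID}_{i}.jpg')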
The easiest way to access the fruit class data, along with the pre-processed images, is via the Quilt T4 package. To access the data, simply run this command:
!pip install t4
import t4
t4.Package.install('quilt/open_fruit', registry='s3://quilt-example', dest='some/path/some/where')
Looking closely at the fruit data, we can see that there is a class imbalance: there are over 26,000 samples of bananas, but only a few hundred labelled common_fig or pear examples. This skew is important to note as we approach building our image classifier.
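A quick way to eyeball that imbalance yourself, assuming the cropped images are organized in one subfolder per class (an illustrative layout, not necessarily the one the notebook produces):

import os
from collections import Counter

# Count images per class folder and print them from most to least common
counts = Counter({
    label: len(os.listdir(os.path.join('images_cropped', label)))
    for label in os.listdir('images_cropped')
})
for label, n in counts.most_common():
    print(f'{label:15s} {n}')  # e.g. banana near the top, common_fig and pear near the bottom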
Now that we’ve downloaded our fruit data from Quilt, we can begin building our image classification model! As with any machine learning project, we’ll go through a few experiments to try to maximize our model’s validation accuracy.
The material for this tutorial was inspired by Francois Chollet’s excellent post ‘Building powerful image classification models using very little data’. We’ve expanded upon Chollet’s example and adjusted it to reflect our multi-class classification problem space.
Along with having proper data versioning from Quilt, we’ll also make sure to track our results, code, and environment for our different model iterations as this is critical to building a reproducible machine learning model pipeline.
Note: We’ll be using Jupyter notebooks for this tutorial, but comet.ml has native support for both Jupyter notebooks and scripts.
For our baseline model, we are using a small CNN with three convolution layers, each using a ReLU activation and followed by a max-pooling layer. We’ll include data augmentation and fairly aggressive dropout to prevent overfitting. Remember, we’re not expecting our best accuracy here, so if you’d like to skip this section and go straight to the pre-trained model, simply proceed to the next section below.
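A minimal Keras sketch of that baseline architecture might look like the following (filter counts, input size, and optimizer are illustrative; the tutorial notebook holds the exact configuration):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator

num_classes = 15  # our fruit labels

# Three conv/ReLU/max-pool blocks, then a small dense head with aggressive dropout
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Data augmentation keeps the small model from overfitting
train_datagen = ImageDataGenerator(rescale=1. / 255, shear_range=0.2,
                                   zoom_range=0.2, horizontal_flip=True)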
Here are the experiment details for our small CNN model:
Not surprisingly, our simple CNN model did not perform that well on the multi-class classification task (our output space is now 15 classes rather than 2). The architecture was originally meant to support a binary classification task, so with several times as many classes it trivially needs more capacity to reach the same performance. Here are the metrics for one run of our model (link here):
To log your experiment results from training, set up your comet.ml account here. For each run of the model, we initialize the Comet experiment object and provide our API Key and project name.
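In a notebook this is just a few lines (the project name below is illustrative; your API key comes from your comet.ml account settings):

# comet_ml must be imported before keras so auto-logging can hook in
from comet_ml import Experiment

experiment = Experiment(
    api_key='YOUR_API_KEY',
    project_name='fruit-classifier',  # illustrative project name
)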
Once you run model.fit(), you’ll be able to see your different model runs in comet.ml through the direct experiment URL. As an example for this tutorial, we have created a Comet project that you can view and interact with here.
Since we’re using Keras, Comet’s auto-logging for popular machine learning frameworks automatically captures details such as the accuracy and loss metrics, the model’s graph definition, and package dependencies. This significantly reduces the amount of manual logging we have to do on our end.
A popular starting point for building image classifiers these days is to use a pre-trained network and fine-tune it with new classes of data. Let’s use this approach to build our image classifier (just make sure to take note of these implementation details for pre-trained models).
There are several popular CNN architectures such as VGGNet, ResNet, and AlexNet, along with a wealth of resources for reading more about CNNs (see here and here). Keras gives users easy access to these pre-trained models (i.e., their weights pre-trained on ImageNet) through keras.applications.
We selected InceptionV3 since it’s both a smaller model compared to VGGNet and because it’s documented to provide a higher accuracy for benchmark datasets. Transfer learning with InceptionV3 essentially means that we re-use the feature extraction portion of the model that has been trained with the ImageNet dataset and re-train the classification portion on our fruit dataset.
Here’s the code plus experiment details for our fine-tuned InceptionV3 model:
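A minimal sketch of that transfer-learning setup, following the standard keras.applications pattern (the dense head size here is illustrative rather than taken verbatim from the tutorial notebook):

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

num_classes = 15

# Re-use the ImageNet-trained feature extractor; drop the original classification head
base_model = InceptionV3(weights='imagenet', include_top=False)

# New classification head for our fruit classes
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Freeze the convolutional base so only the new head trains at first
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])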
Once we begin training with model.fit(), we can use Comet to track how the model is performing in real time. We can also check that we’re properly using our GPUs in the System Metrics tab. The experiment charts in Comet update with our model’s accuracy and loss metrics:
We’ll make sure to log our model weights to Comet at the end of the training process so we can reproduce the model in the future if we need to.
# save locally
model.save_weights('inceptionv3_tuned.h5')
# save to Comet Asset Tab
# you can retrieve these weights later via the REST API
experiment.log_asset(file_path='./inceptionv3_tuned.h5', file_name='inceptionv3_tuned.h5')
If you want to retrieve the model code and have trained your model from a git directory, simply use the Reproduce button in the Comet experiment view.
The Reproduce dropdown will surface key pieces of information about your environment, git commit, and everything you need to reproduce your experiment, including the actual run commands or notebook file. If you have uncommitted changes, we also provide you with a patch for applying your changes later.
In order to evaluate our image classifier model, it’s useful to generate a few sample predictions and plot a confusion matrix so we can see where our model classified certain fruits correctly and incorrectly.
These images and figures are also useful to share with teammates, so we can log them to Comet, even after the experiment is complete, using the Experiment.log_figure() and Experiment.log_image() methods (see more here).
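As a sketch, a confusion matrix figure could be logged like this (y_true, y_pred, and class_names are assumed to hold your validation labels, predicted class indices, and fruit label names):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

# Compare validation labels against predicted class indices
cm = confusion_matrix(y_true, y_pred, labels=range(len(class_names)))

fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(cm, cmap='Blues')
ax.set_xticks(np.arange(len(class_names)))
ax.set_yticks(np.arange(len(class_names)))
ax.set_xticklabels(class_names, rotation=90)
ax.set_yticklabels(class_names)
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')

# Attach the figure to the experiment, even after training has finished
experiment.log_figure(figure_name='confusion_matrix', figure=fig)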
See this great resource on evaluating machine learning models from Jeremy Jordan: https://www.jeremyjordan.me/evaluating-a-machine-learning-model/
There are several ways we could approach improving our model. Here is a non-exhaustive list of things we could try adjusting:
As you try these different optimizations, comet.ml allows you to create visualizations like bar charts, line plots, and parallel coordinates charts to track your experiments. These experiment-level and project-level visualizations help you quickly identify your best-performing models and understand your parameter space.
If you had to share your model results or intermediate work with a fellow data scientist today, how would you do it?
The benefit of using Quilt for data versioning and Comet for model versioning is that, by combining these best-in-breed tools, you can simultaneously make your machine learning experiments easily accessible, trackable, and reproducible.
Sharing a model and the code used to generate it? Link your collaborator to the Comet experiment page. Sharing the data you used? Share a link to the Quilt T4 package.
Reproducing the result locally, or using an old experiment as the starting point for a new one? Get back to where you left off with this code:
# GET THE CODE
git clone https://github.com/comet-ml/keras-fruit-classifer
cd keras-fruit-classifer/
# GET THE DATA
python -c "import t4; t4.Package.install('quilt/open_fruit', 's3://quilt-example', dest='keras-fruit-classifier/')"
# GET THE ENVIRONMENT
# There are a *lot* of ways to do this: a pip requirements.txt, a
# conda environment.yml, a Docker container...
# Here's one cool way - cloning the Comet runtime
PY_VERSION=$(python -c "import comet_ml; print(comet_ml.API().get_experiment_system_details('01e427cedce145f8bc69f19ae9fb45bb')['python_version'])")
conda create -n my_test_env python=$PY_VERSION
conda activate my_test_env
python -c "import comet_ml; print('n'.join(comet_ml.API().get_experiment_installed_packages('01e427cedce145f8bc69f19ae9fb45bb')))" > requirements.txt
pip install -r requirements.txt
# You can also get this from comet.ml by clicking on the Download button
# GET DEVELOPING
jupyter notebook
Congratulations! You’ve gone beyond building a multi-class image classifier model to building a fully reproducible (and shareable) machine learning pipeline with data, code, and environment details ⭐️
Thanks to Gideon Mendels and Aleksey Bilogur.