August 30, 2024
A guest post from Fabrício Ceolin, DevOps Engineer at Comet. Inspired by the growing demand…
AI tools such as ChatGPT, DALL-E, and Midjourney are increasingly becoming a part of our daily lives. These tools were built with deep learning, a subfield of AI that aims to extract knowledge from data. Today, I’ll walk you through how to perform an end-to-end deep learning project using PyTorch, Comet ML, and Gradio.
By the end of this article, you’ll have walked step by step through the life cycle of a deep learning project, performing image classification on the cat vs. dog dataset. After finishing the project, we’ll have a working demo app.
Before diving into the project, let me explain the libraries I’m going to use in this analysis. Let’s start with PyTorch:
Two frameworks are generally used for deep learning: TensorFlow and PyTorch. TensorFlow has traditionally been more common in industry, while PyTorch dominates academic research. I’m going to use PyTorch for this project because of its user-friendliness, flexibility, and robust community support. Let’s move on and take a look at another library.
When implementing deep learning projects, you’ll need to track your hyperparameters, visualize performance metrics, monitor models, and share experiments with others. This is when Comet ML comes into play.
Comet ML is a machine learning platform that allows you to manage, visualize, compare, and optimize models. We’re going to use Comet ML to track our hyperparameters and monitor our model. Let’s move on and have a look at what Gradio is.
Deep learning projects that are not moved to production are dead projects. Gradio is an open-source Python library that helps you build easy-to-use demos for your ML model that you can share with other people.
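To give you a feel for how little code a demo takes, here’s a minimal Gradio sketch; the greeting function is just a stand-in for illustration, not part of our project:
import gradio as gr
# A toy function standing in for a real model's predict function
def greet(name):
    return f"Hello, {name}!"
# Wrapping the function in a web interface with one text input and one text output
gr.Interface(fn=greet, inputs="text", outputs="text").launch()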
Beautiful! We briefly talked about the libraries we’ll use. Let’s go ahead and start loading our dataset.
Beginners often start with clean datasets like the MNIST dataset to learn deep learning. It’s good to start with these datasets, but real-world datasets aren’t always clean. One of the challenges in deep learning is loading and working with a custom dataset. The dataset we’ll use is the cat and dog dataset, which contains images of cats and dogs.
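If you’re working outside Kaggle, you can grab the dataset with the Kaggle CLI. Note that the dataset slug below is my assumption based on the input paths used later in this notebook, so adjust it if needed:
# Downloading the cat-and-dog dataset (slug is an assumption; adjust if needed)
# !kaggle datasets download -d tongpython/cat-and-dog --unzip -p ./cat-and-dog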
Before loading this dataset, let’s launch an Experiment in Comet ML.
# Installing comet_ml
# !pip install comet_ml

# Importing the comet_ml library
import comet_ml
from comet_ml import Experiment

# Building an experiment with your API key
experiment = Experiment(
    api_key=my_api_key,  # Replace with your own Comet API key
    workspace="tirendaz-academy",
    project_name="experiment-tracking")

# Setting hyperparameters
hyper_params = {"seed": 42, "batch_size": 32, "num_epochs": 20,
                "learning_rate": 1e-3, "image_size": 224}

# Logging hyperparameters
experiment.log_parameters(hyper_params)
If you don’t have a Comet ML account yet, you can create a free account here. To follow along with the code in this blog, you can access the notebook I used in this project here.
Awesome! We started our experiment. Now let’s check our PyTorch version and whether we have access to CUDA (GPU). CUDA is a parallel computing platform developed by NVIDIA that makes calculations faster. You can use a GPU for free (in limited amounts) on Google Colab or Kaggle notebooks.
import torch
from torch import nn

# Make sure torch >= 1.10.0
print("The version of torch:", torch.__version__)

# Setting up CUDA
device = "cuda" if torch.cuda.is_available() else "cpu"
print("The type of device: ", device)

# Output:
# The version of torch: 1.11.0
# The type of device: cuda
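One thing to keep in mind: anything we want the GPU to accelerate has to be moved there explicitly with .to(device). Here’s a quick sketch of the pattern we’ll use throughout this post:
# Tensors and models are moved to the GPU with .to(device)
sample = torch.randn(3, 224, 224)   # A random image-shaped tensor
sample = sample.to(device)          # Now lives on the GPU (if available)
print(sample.device)                # e.g. cuda:0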
Now let’s create our training and test paths that we’ll use while loading data.
# Creating our paths
my_train_dir = "/kaggle/input/cat-and-dog/training_set/training_set"
my_test_dir = "/kaggle/input/cat-and-dog/test_set/test_set"
Excellent! Now let’s get a random image and look at the features of this image.
import random
from PIL import Image
import glob
from pathlib import Path

# Setting seed
random.seed(hyper_params["seed"])

# Collecting all image paths under the dataset root
data_dir = "/kaggle/input/cat-and-dog"
image_paths = glob.glob(f"{data_dir}/*/*/*/*.jpg")

# Getting a random path
random_image_path = random.choice(image_paths)

# Getting the class from the parent directory name
image_class = Path(random_image_path).parent.stem

# Let's open the image
image = Image.open(random_image_path)

# Let's print our metadata
print("Random image path: {}".format(random_image_path))
print("Image class: {}".format(image_class))
print("Image height: {}".format(image.height))
print("Image width: {}".format(image.width))
image
This is a cat image with a size of 217×179 pixels. What a cute cat, right? I love cats. Let’s go ahead and create the necessary functions to load the dataset.
So far, we created variables for the dataset paths and explored an image from the dataset. Note that images in a dataset may not all be the same size. In this case, we’ll need to do some data preprocessing to standardize the size, shape, and format of the pictures.
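To confirm that the sizes really do vary, here’s a quick sketch that samples a handful of images and prints their dimensions (it reuses the image_paths list we built above):
# Sampling a few images and printing their sizes
for path in random.sample(image_paths, 5):
    with Image.open(path) as img:
        print(f"{Path(path).parent.stem}: {img.width}x{img.height}")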
Transforming data, also known as preprocessing, helps you prepare quality data. With transforms, you can improve the performance of your model and reduce the risk of bias. Let’s transform our dataset with torchvision.transforms in PyTorch.
from torchvision import transforms

# Setting our image size
IMAGE_SIZE = (hyper_params["image_size"], hyper_params["image_size"])

# Creating a transform for training using TrivialAugment
my_train_transform = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.TrivialAugmentWide(),
    transforms.ToTensor()])

# Creating a transform for testing
my_test_transform = transforms.Compose([
    transforms.Resize(IMAGE_SIZE),
    transforms.ToTensor()])
Here, we used the TrivialAugmentWide transform, which is a data augmentation technique. Now let’s take a step back and talk about what data augmentation is. Data augmentation is a method used to artificially increase the diversity of your data by modifying your existing data. This technique is often utilized when the dataset is small. I encourage you to examine the examples of the various transforms here.
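Seeing augmentation is easier than reading about it. Here’s a small sketch that applies our training transform to the random image we opened earlier and plots a few augmented variants (assuming matplotlib is available):
import matplotlib.pyplot as plt

# Plotting four randomly augmented versions of the same image
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax in axes:
    augmented = my_train_transform(image)   # Returns a (C, H, W) tensor
    ax.imshow(augmented.permute(1, 2, 0))   # Matplotlib expects (H, W, C)
    ax.axis("off")
plt.show()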
Nice, we determined how to transform our dataset. Now we’re ready to create our own custom torch Dataset class.
You can find many ready-made datasets such as MNIST and CIFAR100 in the torchvision.datasets module. But in most cases, you need to handle real-world datasets. If you want, you can create your own class to load the dataset in PyTorch. But the good news is that you can use the ImageFolder function if your dataset follows the layout shown below:
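Here is a sketch of that layout; the class folder names (cats, dogs) are my assumption based on the Kaggle paths we defined earlier:
cat-and-dog/
├── training_set/training_set/
│   ├── cats/  (cat.1.jpg, cat.2.jpg, …)
│   └── dogs/  (dog.1.jpg, dog.2.jpg, …)
└── test_set/test_set/
    ├── cats/  (…)
    └── dogs/  (…)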
We can use this function because the format of our dataset matches the layout above. Let’s load the images from the train and test folders into Datasets with the ImageFolder function.
from torchvision import datasets

# Converting our image folders into Datasets
my_train_data = datasets.ImageFolder(my_train_dir, transform=my_train_transform)
my_test_data = datasets.ImageFolder(my_test_dir, transform=my_test_transform)
Note that PyTorch has two great abstractions for loading data: Dataset and DataLoader. The samples and their related labels are stored in a Dataset, and a DataLoader iteratively wraps the Dataset for easy access to samples. We built our custom Datasets. It’s time to turn them into DataLoaders. Show time!
import os
from torch.utils.data import DataLoader

# Setting some parameters
torch.manual_seed(hyper_params["seed"])
NUM_WORKERS = os.cpu_count()

# Creating a training DataLoader
my_train_dataloader = DataLoader(my_train_data,
                                 batch_size=hyper_params["batch_size"],
                                 shuffle=True,
                                 num_workers=NUM_WORKERS)

# Creating a test DataLoader
my_test_dataloader = DataLoader(my_test_data,
                                batch_size=hyper_params["batch_size"],
                                shuffle=False,
                                num_workers=NUM_WORKERS)
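As a sanity check, we can pull one batch out of the training DataLoader and confirm the shapes match our hyperparameters; here’s a quick sketch:
# Fetching a single batch to verify shapes
images, labels = next(iter(my_train_dataloader))
print(images.shape)            # Expected: torch.Size([32, 3, 224, 224])
print(labels.shape)            # Expected: torch.Size([32])
print(my_train_data.classes)   # The class names inferred from the folder names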
Awesome! We have prepared the necessary functions to load the dataset. We are now ready to build a CNN-based model.
A convolutional neural network (CNN) is a deep learning architecture often used to extract patterns from visual data. A CNN consists of at least three layer types: convolutional layers, pooling layers, and fully connected layers.
The convolutional layer is the fundamental building block of a CNN and is used to extract information such as edges from images. Pooling layers are added between successive convolutional layers to reduce the number of parameters. Images pass through the convolution and pooling layers, and classification then happens in the final fully connected layer.
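To make the shape arithmetic concrete, here’s a small sketch that pushes a dummy image through one convolution and one pooling layer and prints how the spatial size shrinks:
# A dummy batch of one RGB image, 224x224
dummy = torch.randn(1, 3, 224, 224)

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # padding=1 keeps the spatial size
pool = nn.MaxPool2d(2)                              # Halves height and width

print(conv(dummy).shape)        # torch.Size([1, 64, 224, 224])
print(pool(conv(dummy)).shape)  # torch.Size([1, 64, 112, 112])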
You can build a CNN-based model with transfer learning. But I’m going to create a CNN model from scratch:
# Creating a CNN-based image classifier.
class ImageClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Creating our first convolutional layer
        self.conv_layer_1 = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(2))
        # Creating our second convolutional layer
        self.conv_layer_2 = nn.Sequential(
            nn.Conv2d(64, 512, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(512),
            nn.MaxPool2d(2))
        # Creating our third convolutional layer
        self.conv_layer_3 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(512),
            nn.MaxPool2d(2))
        # Creating our classifier
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=512*3*3, out_features=2))

    # Defining the forward function to pass data
    def forward(self, x: torch.Tensor):
        x = self.conv_layer_1(x)   # 224 -> 112
        x = self.conv_layer_2(x)   # 112 -> 56
        x = self.conv_layer_3(x)   # 56  -> 28
        # Reapplying conv_layer_3 keeps the channel count at 512 while each pass
        # halves the spatial size: 28 -> 14 -> 7 -> 3, which is why the
        # classifier expects 512*3*3 input features.
        x = self.conv_layer_3(x)
        x = self.conv_layer_3(x)
        x = self.conv_layer_3(x)
        x = self.classifier(x)
        return x

# Instantiating an object
my_model = ImageClassifier().to(device)
Here, we first defined the convolutional layers and the classifier in the __init__ method and then chained them together in the forward method. After building the model architecture, we instantiated an object from the ImageClassifier class.
Nice, we created a CNN-based model. Note that we used some hyperparameters such as filter size, number of neurons, and activation function. You can fine-tune these hyperparameters to build better models.
Let’s see the architecture of the model with torchinfo and then pass a dummy input through it as a sanity check.
# Installing torchinfo
# !pip install torchinfo
from torchinfo import summary

# Testing with an example input size
summary(my_model, input_size=[1, 3, hyper_params["image_size"], hyper_params["image_size"]])
The output of torchinfo.summary shows all the information about our model, such as the input size, the total number of parameters, and the estimated total size.
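If you prefer a manual check, a one-off forward pass with a random tensor does the same job; here’s a sketch:
# Passing a random image-shaped tensor through the model
dummy_image = torch.randn(1, 3, 224, 224).to(device)
print(my_model(dummy_image).shape)   # Expected: torch.Size([1, 2]), one logit per class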
Awesome, our model worked without errors. Let’s move on to creating the train and test steps. Note that we build the model using the training data and evaluate the performance of the model on the test data. First, let’s create a function that we’ll use to train the model.
def my_train_step(model: torch.nn.Module,
                  dataloader: torch.utils.data.DataLoader,
                  loss_fn: torch.nn.Module,
                  optimizer: torch.optim.Optimizer,
                  epoch: int):
    # Setting train mode
    model.train()
    # Initializing train loss & train accuracy values
    train_loss = 0
    train_acc = 0
    # Looping through each batch in the dataloader
    for batch, (inp, out) in enumerate(dataloader):
        # Moving data to device
        inp, out = inp.to(device), out.to(device)
        # Predicting the input
        y_pred = model(inp)
        # Calculating & accumulating loss
        loss = loss_fn(y_pred, out)
        train_loss += loss.item()
        # Optimizer zero grad
        optimizer.zero_grad()
        # Loss backward
        loss.backward()
        # Optimizer step
        optimizer.step()
        # Calculating & accumulating the accuracy metric
        y_pred_label = torch.argmax(torch.softmax(y_pred, dim=1), dim=1)
        train_acc += (y_pred_label == out).sum().item()/len(y_pred)
    # Calculating metrics
    train_loss = train_loss / len(dataloader)
    train_acc = train_acc / len(dataloader)
    # Logging train metrics for the current epoch
    experiment.log_metrics({"train_accuracy": train_acc, "train_loss": train_loss}, epoch=epoch)
    return train_loss, train_acc
Here, we used the experiment.log_metrics function to track the model metrics, so we can monitor them on the Comet ML dashboard while the model is being trained. Note that if you haven’t instantiated a Comet Experiment (as we did at the very beginning of this article), this step will throw an error.
Beautiful, we created a function for the training step. Let’s go ahead and similarly create the function that we’ll use to test the model.
def my_test_step(model: torch.nn.Module,
                 dataloader: torch.utils.data.DataLoader,
                 loss_fn: torch.nn.Module,
                 epoch: int):
    # Setting eval mode
    model.eval()
    # Initializing test loss & test accuracy values
    test_loss = 0
    test_acc = 0
    # Starting the inference mode
    with torch.inference_mode():
        # Looping through each batch in the dataloader
        for batch, (inp, out) in enumerate(dataloader):
            # Moving data to device
            inp, out = inp.to(device), out.to(device)
            # Forward pass
            test_pred_logits = model(inp)
            # Calculating & accumulating loss
            loss = loss_fn(test_pred_logits, out)
            test_loss += loss.item()
            # Calculating & accumulating the accuracy metric
            test_pred_labels = test_pred_logits.argmax(dim=1)
            test_acc += ((test_pred_labels == out).sum().item()/len(test_pred_labels))
    # Calculating metrics
    test_loss = test_loss / len(dataloader)
    test_acc = test_acc / len(dataloader)
    # Logging test metrics for the current epoch
    experiment.log_metrics({"test_accuracy": test_acc, "test_loss": test_loss}, epoch=epoch)
    return test_loss, test_acc
Here, we switched the model into eval mode with model.eval(), as we’ll only use this function to evaluate the model. Now let’s define a function named my_train to combine the my_train_step and my_test_step functions.
from tqdm.auto import tqdm

# Combining the training and testing steps
def my_train(model: torch.nn.Module,
             train_dataloader: torch.utils.data.DataLoader,
             test_dataloader: torch.utils.data.DataLoader,
             optimizer: torch.optim.Optimizer,
             loss_fn: torch.nn.Module = nn.CrossEntropyLoss(),
             epochs: int = 5):
    # Creating a variable for metrics
    my_results = {"train_loss": [],
                  "train_acc": [],
                  "test_loss": [],
                  "test_acc": []}
    # Looping over the training and testing steps
    for epoch in tqdm(range(epochs)):
        train_loss, train_acc = my_train_step(model=model,
                                              dataloader=train_dataloader,
                                              loss_fn=loss_fn,
                                              optimizer=optimizer,
                                              epoch=epoch)
        test_loss, test_acc = my_test_step(model=model,
                                           dataloader=test_dataloader,
                                           loss_fn=loss_fn,
                                           epoch=epoch)
        # Printing results
        print(
            f"Epoch: {epoch+1} | "
            f"train_loss: {train_loss:.4f} | "
            f"train_acc: {train_acc:.4f} | "
            f"test_loss: {test_loss:.4f} | "
            f"test_acc: {test_acc:.4f}")
        # Updating results
        my_results["train_loss"].append(train_loss)
        my_results["train_acc"].append(train_acc)
        my_results["test_loss"].append(test_loss)
        my_results["test_acc"].append(test_acc)
    # Returning results at the end of the epochs
    return my_results
Awesome! We have created our training and testing steps and a my_train function that combines them. Now we’re ready to train the model. Show time!
# Setting seeds
torch.manual_seed(hyper_params["seed"])
torch.cuda.manual_seed(hyper_params["seed"])

# Creating loss function & optimizer
my_loss_fn = nn.CrossEntropyLoss()
my_optimizer = torch.optim.Adam(params=my_model.parameters(), lr=hyper_params["learning_rate"])

# Initializing the timer
from timeit import default_timer as timer
my_start_time = timer()

# Training our model
my_model_results = my_train(model=my_model,
                            train_dataloader=my_train_dataloader,
                            test_dataloader=my_test_dataloader,
                            optimizer=my_optimizer,
                            loss_fn=my_loss_fn,
                            epochs=hyper_params["num_epochs"])

# Ending the timer
my_end_time = timer()

# Printing the time
print(f"Total training time: {my_end_time-my_start_time:.3f} seconds")
Nice, our model was trained on the training data and evaluated on the test data. At the end of 20 epochs, the accuracy of our model is 0.91 on the training data and 0.91 on the test data. Note that we want training and test accuracy to be close to each other; when they are, the model is less likely to be overfitting. Now, let’s visualize the accuracy and loss metrics.
import matplotlib.pyplot as plt

def my_plot_loss_curves(results):
    # Getting the train & test loss values
    my_loss = results['train_loss']
    my_test_loss = results['test_loss']
    # Getting the train & test accuracy values
    my_accuracy = results['train_acc']
    my_test_accuracy = results['test_acc']
    # Calculating epochs
    my_epochs = range(len(results['train_loss']))
    # Let's set up a graph
    plt.figure(figsize=(15, 7))
    # Let's plot loss
    plt.subplot(1, 2, 1)
    plt.plot(my_epochs, my_loss, label='train_loss')
    plt.plot(my_epochs, my_test_loss, label='test_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()
    # Let's plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(my_epochs, my_accuracy, label='train_accuracy')
    plt.plot(my_epochs, my_test_accuracy, label='test_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();

# Let's plot the results
my_plot_loss_curves(my_model_results)
The accuracy values of the model on the training and test data are not bad. Keep in mind that you can achieve better scores by fine-tuning the hyperparameters.
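If you want to explore hyperparameters systematically, one simple pattern is a small manual sweep that creates a fresh Comet Experiment per configuration so each run is tracked separately. Here is a rough sketch under that assumption; the learning rates are arbitrary examples, and we rebind the global experiment so our step functions log to the current run:
# A minimal manual sweep over learning rates (values are illustrative)
for lr in [1e-2, 1e-3, 1e-4]:
    # Rebinding the global `experiment` so my_train_step/my_test_step log to this run
    experiment = Experiment(
        api_key=my_api_key,  # Replace with your own Comet API key
        workspace="tirendaz-academy",
        project_name="experiment-tracking")
    experiment.log_parameters({**hyper_params, "learning_rate": lr})
    sweep_model = ImageClassifier().to(device)
    sweep_optimizer = torch.optim.Adam(params=sweep_model.parameters(), lr=lr)
    my_train(model=sweep_model,
             train_dataloader=my_train_dataloader,
             test_dataloader=my_test_dataloader,
             optimizer=sweep_optimizer,
             loss_fn=my_loss_fn,
             epochs=hyper_params["num_epochs"])
    experiment.end()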
Since you can obtain different versions of models using different hyperparameters, it’s a good idea to track these with Comet. Let me show you.
# Saving our model
from comet_ml.integration.pytorch import log_model
log_model(experiment, my_model, model_name="My_Image_Classification_Model")
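As a side note, the same integration also offers a load_model helper for fetching a logged model back. The URI format below is my assumption, so check the Comet docs before relying on it:
# Loading the logged model back from Comet (URI format is an assumption; check the docs)
from comet_ml.integration.pytorch import load_model
loaded_model = load_model("experiment://<experiment_key>/My_Image_Classification_Model")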
Beautiful, we built a CNN-based model and saw the performance of this model. While performing these steps, we logged the hyperparameters and metrics with Comet ML. Finally, we saved the model for versioning and monitoring. Now let’s end our experiment with the following command:
# Ending our experiment
experiment.end()
Time to review the results in the Comet ML dashboard.
Projects that remain in notebooks are dead projects. The ML lifecycle is an ongoing process, from data preparation to deployment and monitoring of the model. With Gradio, we can create an app and deploy it on Hugging Face Spaces for free. Let’s get started:
import gradio as gr

# Creating a function for prediction
def predict(inp):
    image_transform = transforms.Compose([
        transforms.Resize(size=(224, 224)),
        transforms.ToTensor()])
    labels = ['cat', 'dog']
    inp = image_transform(inp).unsqueeze(dim=0).to(device)
    my_model.eval()
    with torch.no_grad():
        prediction = torch.nn.functional.softmax(my_model(inp), dim=1)
    confidences = {labels[i]: float(prediction.squeeze()[i]) for i in range(len(labels))}
    return confidences

# Setting the text fields of the interface (placeholder text)
title = "Cat vs. Dog Classifier"
description = "A CNN-based image classifier trained on the cat-and-dog dataset."
article = "Built with PyTorch, Comet ML, and Gradio."

# Building an interface
gr.Interface(fn=predict,
             inputs=gr.Image(type="pil"),
             outputs=gr.Label(num_top_classes=2),
             title=title,
             description=description,
             article=article,
             examples=['cat.jpg', 'dog.jpg']).launch()
Here, we first defined a function to predict the label of the image and then created an interface using this function. You can examine this app and access project files here.
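One practical detail when deploying to a Hugging Face Space: the Space can’t see your training notebook’s memory, so you’d typically save the trained weights and load them in the app script. Here’s a minimal sketch of that hand-off (the file name is arbitrary):
# In the training notebook: saving the trained weights (file name is arbitrary)
torch.save(my_model.state_dict(), "image_classifier.pth")

# In the Space's app script: rebuilding the model and loading the weights
my_model = ImageClassifier()
my_model.load_state_dict(torch.load("image_classifier.pth", map_location="cpu"))
my_model.eval()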
Congratulations! You learned how to perform an end-to-end deep learning project. Note that deep learning projects are a never-ending cycle. You collect the data, train the model with this data, then turn this model into an app, move this app to production, and finally monitor whether this app is working properly and iterate.
In this project, we used PyTorch for model building, Comet ML for experiment tracking, and Gradio to convert the model into an app. You can access the GitHub repo of this project here.
That’s it! Thanks for reading and I hope you enjoyed it. Please let me know if you have any feedback and feel free to connect with me on YouTube | Twitter | Instagram | Linkedin | Kaggle