How to Integrate Comet with Catboost Workflows

Words By Mwanikii Njagi

January 9, 2024

person writing a workflow strategy on a whiteboard — Photo by Campaign Creators on Unsplash

Catboost is one of the most versatile gradient-boosting libraries. Its standout capability is handling categorical data without requiring you to encode it numerically first: once you tell the model which columns are categorical, it takes care of the rest. An added benefit is that a data practitioner can quickly establish a baseline with minimal data transformations. Unfortunately, as you scroll through Comet’s supported libraries, you will notice that Catboost is missing. Fortunately, I have a simple workaround that leverages Comet’s versatile capabilities.

Comet’s support for Tensorflow’s Tensorboard can be a saving grace for anyone who wants to use Catboost with Comet. One integral feature of Catboost is that it uses Tensorboard to keep track of training runs and stores that information locally on the machine’s disk. Let’s look at the general workflow for this integration.

Requirements

There are a few things you need to install before you go through with this:
1. Catboost.

2. Comet’s official library.

3. A Comet account that you can get by signing up here.

4. Visual Studio Code (or any IDE that supports Tensorboard Integration).

With the above, it is now possible to proceed.

Catboost Workflows Integration with Tensorboard

Catboost integrates with Tensorboard so that we can visualize training runs. Tensorboard graphs the metric specified for the training run and allows for customization. The picture below is an excellent example of what I am talking about.

tensorboard screenshot — Tensorboard for Catboost, screenshot by author

An added advantage of this library is that it stores this information on disk, either in a default local directory or in a directory you specify before the training run.
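To make the storage location concrete, here is a minimal sketch that checks a training directory for the two metric files this article uploads later. It assumes the default folder name `catboost_info` and the file names `learn_error.tsv` and `test_error.tsv` that appear further down; `find_metric_logs` is a hypothetical helper, not part of Catboost. If you point Catboost at a different folder through its `train_dir` parameter, pass that path instead.

```python
from pathlib import Path

# Assumption: the default output folder name, as used later in this article.
DEFAULT_TRAIN_DIR = "catboost_info"
# File names taken from the logging step later in this article.
EXPECTED_LOGS = ("learn_error.tsv", "test_error.tsv")


def find_metric_logs(train_dir=DEFAULT_TRAIN_DIR):
    """Return {file name: Path} for each expected metric log that exists."""
    root = Path(train_dir)
    return {name: root / name for name in EXPECTED_LOGS if (root / name).is_file()}


if __name__ == "__main__":
    found = find_metric_logs()
    for name in EXPECTED_LOGS:
        status = "found" if name in found else "missing"
        print(f"{name}: {status}")
```

Running a check like this after training is a cheap way to confirm the files exist before trying to upload them.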

In this article, we use this feature to our advantage: Comet lets us upload this information and create a custom panel that gives a clear view of the data. Even without direct support for Catboost, we can take the training logs that Catboost stores alongside its Tensorboard output and upload the critical information through an existing Comet function.

Let’s code!

Simple Project

Our simple project here will only focus on the capabilities Catboost, Tensorboard, and Comet offer. All data transformation has already been done, so we have a reasonably clean dataset that will be fed into the model.

The dataset of our choice is the Bengaluru House Prediction Dataset from Machinehack, where it is used in a competition that tests your skills on regression problems. We can now check our preprocessed data with Pandas to get a general overview.

import pandas as pd

#reading data in local directory
df = pd.read_csv("preprocessed_train_data.csv")

#defining features(X) and targets(y)
X = df.drop(["price"], axis=1)
y = df["price"]

#printing the first five rows of X and y
print(X.head())
print(y.head())

X data by author

Now that we have seen our features (including a categorical one), we can bring Comet into the project.

Comet Incorporation

Note: Import the Comet library first and initialize the project before proceeding with any other code for a smoother way forward.

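First, we import the Comet library, initialize our project under the name “catboost_comet,” and create the experiment object that the later logging calls rely on.

import comet_ml

#logging in and creating the experiment used by the logging calls below
comet_ml.login(project_name="catboost_comet")
experiment = comet_ml.Experiment(project_name="catboost_comet")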

We then define our categorical features and perform a train-validation split for training our model. Catboost requires the user to specify which features in a dataset are categorical.

from sklearn.model_selection import train_test_split
import numpy as np

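#defining categorical features as every column that is not float-typed
#(np.float was removed in NumPy 1.24, so the built-in float is used instead)
categorical_features_indices = np.where(X.dtypes != float)[0]

#Train-validation split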
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size = 0.8, random_state = 12)
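The dtype check above treats every non-float column as categorical, which would also catch integer columns. As a hedged alternative, the sketch below selects only non-numeric columns by name; Catboost accepts column names for `cat_features` when the input is a DataFrame. The helper `categorical_column_names` and the toy column names are illustrative assumptions, not taken from the real dataset.

```python
import pandas as pd


def categorical_column_names(features: pd.DataFrame) -> list:
    """Return the names of all columns whose dtype is not numeric."""
    return features.select_dtypes(exclude="number").columns.tolist()


# Hypothetical toy frame; the column names are illustrative only.
toy = pd.DataFrame(
    {
        "total_sqft": [1056.0, 2600.0, 1440.0],
        "bath": [2, 5, 2],
        "location": ["Whitefield", "Sarjapur", "Uttarahalli"],
    }
)
print(categorical_column_names(toy))
```

Whether integer columns should count as categorical depends on the dataset, so treat this as one option rather than a replacement for the approach above.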

Now, we can feed the above information into Catboost’s regressor and perform training.

from catboost import CatBoostRegressor

#defining model params
model  = CatBoostRegressor(iterations=50, depth=3, learning_rate=0.1, loss_function='RMSE', early_stopping_rounds=5)

#training model on data
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_validation, y_validation), plot=True)

#performing inference
y_valid = model.predict(X_validation)
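Since the model is trained with the RMSE loss, it can be useful to compute the same metric on the validation predictions. The following is a minimal sketch using only the standard library; `rmse` is a hypothetical helper and the numbers are toy values, not results from this dataset.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired true and predicted values."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("rmse needs at least one pair of values")
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


# Toy numbers for illustration only. With the article's variables the call
# would be rmse(y_validation, y_valid).
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

The resulting value could then be sent to Comet with `experiment.log_metric("validation_rmse", value)`, where the metric name is an arbitrary choice.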

training runs and results — Training run and results by author

After this run, Catboost will have stored information on the training run in a folder that Tensorboard can use for visualization. This folder will be within the project’s directory. We can extract two files containing the critical info we need from it.

local directory — Local directory by author

Within the “catboost_info” directory, we will find the “learn_error.tsv” and “test_error.tsv” files. We shall log these files into Comet using “log_table().”

#logging both tables to Comet (adjust the path to wherever catboost_info sits in your project)
experiment.log_table("./src/catboost_info/learn_error.tsv")
experiment.log_table("./src/catboost_info/test_error.tsv")
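If you would also like the same numbers recorded as Comet metrics, one point per iteration, a sketch along the following lines could work. It assumes each file is tab-separated with a header row whose first column is the iteration number and whose remaining columns are metric names; check your own files before relying on that. `log_catboost_tsv` is a hypothetical helper, and the only Comet call it uses is `log_metric(name, value, step=...)`.

```python
import csv


def log_catboost_tsv(experiment, tsv_path, prefix):
    """Log every metric column of a Catboost error file, one point per iteration.

    `experiment` is any object exposing log_metric(name, value, step=...).
    Assumes a tab-separated file whose header's first column is the iteration
    number and whose other columns are metric names.
    Returns the number of values logged.
    """
    logged = 0
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return logged  # empty file, nothing to log
        metric_names = header[1:]
        for row in reader:
            if not row:
                continue  # skip blank lines
            step = int(row[0])
            for name, value in zip(metric_names, row[1:]):
                experiment.log_metric(f"{prefix}_{name}", float(value), step=step)
                logged += 1
    return logged


# Example calls, using the paths from the article:
# log_catboost_tsv(experiment, "./src/catboost_info/learn_error.tsv", "learn")
# log_catboost_tsv(experiment, "./src/catboost_info/test_error.tsv", "test")
```

The prefix keeps the training and validation curves apart, since both files may use the same metric name.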

Once those are complete, we can end the experiment and open Comet to see whether it was successful.

#Ending experiment
experiment.end()

Comet Visualization

Once we open the Comet projects page, we can find and open our project.

Catboost and Comet project cover — Project image by author

screenshot of project in Comet — Screenshot of project by author

After pressing the prominent blue button in the middle, you will get the menu below, where you pick “Data Panel.”

Comet menu with scatter chart highlighted — Comet Menu by author

After opening that menu, we will find a drop-down menu called “Data Selection,” where we will see the two tables we initially uploaded.

adding data chart to Comet project — Screenshot by author

We then pick the first table, which gives a preview of the data. Once we confirm the selection, the panel appears on the project page, which previously had no panels.

When we press “Add” in the top right corner, we see the option to add another panel, and then we can repeat the process for the other table that we logged.

We have now logged information from a Catboost workflow using Comet’s support for Tensorboard. In this tutorial, we have successfully integrated a library that Comet does not officially support.
