
Using XGBoost for Deep Learning


XGBoost is a powerful library that implements gradient boosting. It has an excellent reputation as a tool for a wide range of prediction problems in data science and machine learning, and it is a popular choice for regression and classification because it is highly intuitive and easily interpretable.

One good indicator of XGBoost's power is the number of winning solutions on Kaggle that utilize it. For many years, gradient-boosting models and deep-learning solutions have won the lion’s share of Kaggle competitions.

Using XGBoost can quickly help you get from start to finish of a project with minimal initial feature engineering.

One robust use case for XGBoost is integrating it with neural networks to perform a given task. XGBoost is not limited to standalone machine learning tasks; its considerable power can also be harnessed in harmony with deep learning algorithms.

We can now focus on how to approach XGBoost from a deep learning standpoint and work through the technical details of the architecture we aim to build.

First of all, to implement this, we need to be familiar with the libraries that contain our model:

  1. XGBoost
  2. TensorFlow and Keras

Both libraries can be installed by following the instructions on their official sites. For clarity, either TensorFlow or PyTorch can be used for building neural networks; this article uses TensorFlow.
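To confirm everything is in place before we start, here is a minimal setup sketch, assuming a pip-based Python environment:

```python
# Install the libraries first (assuming a pip-based environment):
#   pip install xgboost tensorflow

import tensorflow as tf
import xgboost as xgb

# Verify the installations by printing the versions.
print("XGBoost:", xgb.__version__)
print("TensorFlow:", tf.__version__)
```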

ConvXGB Introduction

ConvXGB is a model that combines Convolutional Neural Networks and XGBoost. It was envisioned by Thongsuwan et al., who saw inherent advantages in the combined architecture after highlighting the shortcomings of using each model individually. Their reasoning rested on two observations:

  1. On its own, XGBoost is “still unclear for feature learning [1].”
  2. Integrating Convolutional Neural Networks provides the added benefit of better feature learning, eliminating the shortcoming of XGBoost.

The process above is standard, as practitioners often combine models in different ways for better performance. ConvXGB explores this with a few exciting caveats that the researchers included: its Convolutional Neural Network differs slightly from the usual CNN design.

We can now explore ConvXGB’s architecture and look keenly at this model’s nitty-gritty details.

Architecture

This model’s architecture is categorized into two sections: The feature learner and the class predictor. Both sections are further subdivided into the critical steps to implement this model successfully.

The feature learning section has an input section and a data preprocessing section. The idea is that the data must be in an appropriate form before it can be ingested by the convolutional neural network. For data that is not already a tuple or NumPy array, the data preprocessing layer is responsible for performing this conversion. If you aren’t familiar with this step, check out my articles discussing NumPy arrays and the constraints that you may experience dealing with them.
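As an illustration, here is a minimal sketch of the kind of conversion this preprocessing layer performs; the sample count and image dimensions are made up for this example:

```python
import numpy as np

# Hypothetical raw input: 100 samples, each a flat list of 64 values.
raw_data = [list(range(64)) for _ in range(100)]

# Convert to a NumPy array and reshape into the 4-D form
# (samples, height, width, channels) that a Keras Conv2D layer expects.
X = np.asarray(raw_data, dtype=np.float32)  # shape: (100, 64)
X = X.reshape(-1, 8, 8, 1)                  # shape: (100, 8, 8, 1)
print(X.shape)
```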

The convolutional neural network has some exciting features that are not typical of the average CNN. Thongsuwan et al. hypothesized that reducing the number of parameters would simplify the model. They achieved this by not including fully connected or pooling layers, because “it is not necessary to bring weights from the FC layers back to re-adjust weights in the previous layers [1].”
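A feature learner along these lines might look like the following Keras sketch; the layer count, filter sizes, and input shape are illustrative assumptions, not the paper’s exact configuration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution layers only: no pooling layers and no fully connected
# layers, following the paper's simplification.
feature_learner = keras.Sequential([
    layers.Input(shape=(8, 8, 1)),
    layers.Conv2D(16, kernel_size=3, padding="same", activation="relu"),
    layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
])

# Extract learned feature maps from the preprocessed data above.
features = feature_learner.predict(X)
print(features.shape)  # (100, 8, 8, 32)
```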

Now, the data can be fed into the predicting section of the model. This section consists of three parts: the reshaping layer, the class prediction layer, and the output layer.
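Continuing the sketch above, the reshaping layer can flatten the feature maps into rows, and an XGBoost classifier can serve as the class prediction layer; the labels and hyperparameters below are placeholders for illustration:

```python
import numpy as np
import xgboost as xgb

# Reshaping layer: flatten each sample's feature maps into a single row.
flat_features = features.reshape(features.shape[0], -1)  # (100, 8 * 8 * 32)

# Hypothetical binary labels, purely for illustration.
y = np.random.randint(0, 2, size=flat_features.shape[0])

# Class prediction layer: XGBoost trained on the learned features.
clf = xgb.XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(flat_features, y)

# Output layer: the predicted classes.
predictions = clf.predict(flat_features)
```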

Benefits and Drawbacks

This architecture has several benefits, as the paper’s authors demonstrated, but it also has shortcomings.

One of the most impressive benefits of this architecture is how well it performs against similar algorithms. In many of the benchmarks in the paper, ConvXGB outperformed comparable models.

Despite the excellent performance, there are a few drawbacks. For instance, one has to get the number of convolutional layers right, as using too many may increase the architecture’s computational cost. The authors admit that this might “outweigh any advantage, and we must balance carefully any increase in accuracy with the cost incurred [1].”

Conclusion

ConvXGB is a compelling model that combines the strengths of XGBoost with deep learning. Its simplicity and ease of implementation allow for straightforward inference and strong performance when implemented correctly.

In our next article, we can try an implementation of the model.

Sources:

[1] Thongsuwan, Setthanun, Saichon Jaiyen, Anantachai Padcharoen, and Praveen Agarwal. “ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost.” Nuclear Engineering and Technology 53, no. 2 (2021): 522–531.


Mwanikii Njagi
