August 30, 2024
A guest post from Fabrício Ceolin, DevOps Engineer at Comet.
Deep learning and tabular data have an infamously difficult relationship. Over the years, research has produced dramatic performance gains for deep learning across many domains, and this has led many to assume that the same gains could be replicated on tabular data.
So far, a multitude of results have arisen through painstaking research; some are encouraging and some raise serious questions about the feasibility of using deep learning in this manner. I will explore both perspectives.
Deep learning relies on artificial neural networks to learn a specified function. That function might be producing predictions for a classification problem, or something more complex such as colorizing images or detecting fraud.
Borisov et al. state that “deep learning methods perform outstandingly well for classification tasks or data generation tasks on homogeneous data (Borisov et al., 2021).” Homogeneous data spans only a few modalities, such as images, audio, and text.
Tabular data, on the other hand, falls under the category of heterogeneous data: a single table typically mixes numerical and categorical columns, each with its own data type and distribution.
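To make “heterogeneous” concrete, here is a tiny, purely illustrative table mixing integer, continuous, categorical, and boolean columns (the column names and values are my own example, not taken from any of the papers discussed):

```python
# A tiny illustrative table mixing column types, which is what makes tabular
# data "heterogeneous" compared with the uniform pixels or tokens of images and text.
import pandas as pd

df = pd.DataFrame({
    "age": [34, 58, 41],                       # integer
    "income": [52_000.0, 87_500.0, 61_250.0],  # continuous
    "country": ["BR", "US", "DE"],             # categorical (string)
    "has_default": [False, True, False],       # boolean
})
print(df.dtypes)  # each column carries its own type
```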
Deep learning has largely been constrained by these characteristics, which has drawn the interest of researchers such as Borisov, who suggest that its performance on tabular data could still be improved. This raises the question: which characteristics, exactly, cause deep learning’s weak performance on tabular data?
The heterogeneity of the data is not something one immediately thinks about when tackling tabular problems, yet it unavoidably becomes a daunting obstacle when trying to improve the performance of deep learning algorithms.
For instance, different features in a table can have very different statistical properties. Some correlate strongly with others and have a strong influence on the final outcome, while others have minimal impact.
Additionally, according to Borisov et al., these correlations are weaker than the spatial or semantic correlations found in images and audio. One is then left having to rely on other methods to build intuition for these models.
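A quick way to see this heterogeneity on any tabular dataset is to inspect per-column statistics and feature–target correlations. The sketch below uses scikit-learn’s California housing data purely as an example; the dataset choice and the specific columns are assumptions for illustration, not something from the papers:

```python
# Minimal sketch: inspecting the heterogeneous statistics of a tabular dataset.
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True)
df = data.frame  # eight numeric features plus the "MedHouseVal" target column

# Every column has its own scale, spread, and skew.
print(df.describe().T[["mean", "std", "min", "max"]])

# The linear correlation of each feature with the target varies widely,
# from fairly predictive to nearly irrelevant.
print(df.corr()["MedHouseVal"].sort_values(ascending=False))
```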
Attempts to overcome these problems have led to interesting lines of research that have further magnified another problem with deep learning models: a shortage of interpretability. Neural networks with many hidden layers make it harder to know which alterations lead to changes in performance.
A few research papers have produced promising results showing that it is possible to improve the performance of deep learning on tabular data. The same research also shows that it is possible to surpass gradient-boosting models on tabular data.
The proposed methods lean heavily on one part of machine learning but move in different directions to achieve better performance. Regularization, according to both Kadra et al. and Shavitt and Segal, is where the new focus of improving deep learning for tabular datasets should be.
In the papers mentioned above, there is a general consensus that some form of regularization is necessary to obtain strong results on tabular data.
For instance, Kadra et al.’s paper demonstrates that even the simplest neural networks can outperform traditional state-of-the-art gradient-boosting models such as XGBoost.
They suggest that using regularization techniques tailored to raw data makes deep learning perform much better on tabular data. They go further and propose using “regularization cocktails.” According to the paper’s authors, “the optimal regularizer is a cocktail mixture of a large set of regularization methods, all being simultaneously applied with different strengths (Kadra et al., 2021).”
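To get a feel for the idea, here is a minimal sketch of several regularizers applied to a plain MLP at the same time. The specific ingredients and strengths (batch normalization, dropout, weight decay, label smoothing) are my own illustrative choices, not the exact cocktail from Kadra et al.:

```python
# A minimal sketch of the "regularization cocktail" idea: several regularizers
# applied at once to a plain MLP. Choices and strengths are illustrative only.
import torch
import torch.nn as nn

def make_mlp(n_features: int, n_classes: int, dropout: float = 0.2) -> nn.Module:
    return nn.Sequential(
        nn.Linear(n_features, 256),
        nn.BatchNorm1d(256),   # normalization as one ingredient
        nn.ReLU(),
        nn.Dropout(dropout),   # dropout as another ingredient
        nn.Linear(256, 256),
        nn.BatchNorm1d(256),
        nn.ReLU(),
        nn.Dropout(dropout),
        nn.Linear(256, n_classes),
    )

model = make_mlp(n_features=20, n_classes=2)
# Weight decay and label smoothing are two more ingredients, applied together
# with the architectural ones above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.1)

# One training step on random stand-in data, just to show the pieces fit together.
x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```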
The problem with this technique is that it does not allow the suggested hyperparameters to be reused broadly across different datasets. Kadra et al. admit that these hyperparameters would need to be “dataset-specific”: one has to tailor a different cocktail for each individual dataset to squeeze performance out of it.
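In practice, that means re-running a hyperparameter search over the regularization strengths for every new dataset. Here is a minimal sketch of that workflow using scikit-learn’s MLPClassifier and a small grid over its L2 penalty; the grid values and model choice are assumptions for illustration, not from the paper:

```python
# Minimal sketch: re-tuning regularization strength per dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; in practice this search is repeated for every new dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

pipe = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
grid = {"mlpclassifier__alpha": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]}  # L2 penalty strength

search = GridSearchCV(pipe, grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)  # typically differs per dataset
```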
Shavitt and Segal’s paper, on the other hand, argues that introducing a loss function known as the “Counterfactual Loss” could drastically improve hyperparameter tuning. They name the neural networks that use this particular method of regularization “Regularization Learning Networks” (RLNs).
They state that a Regularization Learning Network would “use the Counterfactual Loss to tune its regularization hyperparameters efficiently during learning together with the learning of the weights of the network (Shavitt & Segal, 2018).” Additionally, these RLNs performed best when ensembled with gradient-boosting algorithms.
The implementation of the above can be found on GitHub.
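To get a feel for the ensembling idea without diving into that codebase, the sketch below simply averages the predicted probabilities of a neural network and a gradient-boosting model. The plain MLP here is only a stand-in for an RLN, and the dataset and settings are illustrative assumptions:

```python
# Minimal sketch of ensembling a neural network with gradient boosting by
# averaging predicted probabilities. An ordinary MLP stands in for an RLN.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

nn_model = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
gbm = GradientBoostingClassifier(random_state=0)

nn_model.fit(X_tr, y_tr)
gbm.fit(X_tr, y_tr)

# Average the two models' probability estimates for the ensemble prediction.
proba = (nn_model.predict_proba(X_te)[:, 1] + gbm.predict_proba(X_te)[:, 1]) / 2
print("ensemble AUC:", roc_auc_score(y_te, proba))
```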
Both methods yield promising results that could be developed further to make deep learning better at prediction on tabular data.
We still have to assess whether it is a necessary struggle to come up with better techniques to ensure that deep learning works with tabular data. Is it more productive to develop the already promising gradient boosting models such as XGBoost or should more effort be devoted to deep learning?
More research into these questions will yield the answers we want.
Sources:
[1] Borisov, V., Leemann, T., Seßler, K., Haug, J., Pawelczyk, M., & Kasneci, G. (2021). Deep neural networks and tabular data: A survey. arXiv preprint arXiv:2110.01889.
[2] Kadra, A., Lindauer, M., Hutter, F., & Grabocka, J. (2021). Regularization is all you need: Simple neural nets can excel on tabular data. arXiv preprint arXiv:2106.11189.
[3] Shavitt, I., & Segal, E. (2018). Regularization learning networks: deep learning for tabular datasets. Advances in Neural Information Processing Systems, 31.