TY - JOUR
T1 - CNN-Based Approaches for Various Types of Tabular Data
AU - Anh, Vu Tuan
AU - Ha, Il Do
AU - Burke, Kevin
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
AB - Deep learning (DL) includes various architectures, such as deep neural networks (DNNs) and convolutional neural networks (CNNs). DL is powerful and flexible for non-tabular (unstructured) data such as images and text. However, on tabular data, standard DNNs often fail to outperform traditional machine learning (ML) methods such as tree-based models (e.g. random forest, XGBoost). CNNs perform dimensionality reduction on non-tabular (especially image) data, but they may also be useful for tabular data. In this paper, we present a unified, end-to-end learning framework of one-dimensional CNN (1D-CNN)-based approaches for various types of tabular data. We also propose two novel 1D-CNN-based models: a negative binomial CNN (NB-CNN) model for over-dispersed count data and a Cox-based CNN Self-Attention model for high-dimensional survival data. The predictive performance of the proposed methods is evaluated against existing ML/DL methods on four types of real tabular data: binary response data with high-dimensional features, over-dispersed count data, high-dimensional survival data, and time-series data with substantial variability. The experimental results show that the proposed methods overall outperform existing ML/DL models. In particular, the NB-CNN achieves a lower root mean squared error (RMSE) and a higher coefficient of determination (R²) than tree-based methods on over-dispersed count data. Similarly, the Cox-based CNN Self-Attention model yields higher C-index values than state-of-the-art approaches on high-dimensional survival tasks.
KW - CNN
KW - deep learning
KW - DNN
KW - high-dimensional survival data
KW - machine learning
UR - https://www.scopus.com/pages/publications/105022836732
U2 - 10.1109/ACCESS.2025.3635724
DO - 10.1109/ACCESS.2025.3635724
M3 - Article
AN - SCOPUS:105022836732
SN - 2169-3536
VL - 13
SP - 200537
EP - 200554
JO - IEEE Access
JF - IEEE Access
ER -