TY - GEN
T1 - A hybrid approach to the problem of class imbalance
AU - Fitzgerald, Jeannie
AU - Ryan, Conor
PY - 2013
Y1 - 2013
N2 - In Machine Learning classification tasks, the class imbalance problem is an important one which has received a lot of attention in the last few years. In binary classification, class imbalance occurs when there are significantly fewer examples of one class than the other. A variety of strategies have been applied to the problem with varying degrees of success. Typically, previous approaches have involved attacking the problem either algorithmically or by manipulating the data in order to mitigate the imbalance. We propose a hybrid approach which combines Individualised Random Sampling (IRS) with two different fitness functions designed to improve performance on imbalanced classification problems in Genetic Programming. We investigate the efficacy of the proposed methods together with that of five different algorithmic GP solutions, two of which are taken from the recent literature. We conclude that the IRS approach combined with either average accuracy or Matthews Correlation Coefficient delivers superior results in terms of AUC score when applied to either balanced or imbalanced datasets.
KW - Binary classification
KW - Class imbalance problem
KW - Genetic programming
KW - Over sampling
KW - Under sampling
UR - http://www.scopus.com/inward/record.url?scp=84905722083&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84905722083
SN - 9788021447554
T3 - Mendel
SP - 129
EP - 136
BT - MENDEL 2013 - 19th International Conference on Soft Computing
PB - Brno University of Technology
T2 - 19th International Conference on Soft Computing: Evolutionary Computation, Genetic Programming, Swarm Intelligence, Fuzzy Logic, Neural Networks, Fractals, Bayesian Methods, MENDEL 2013
Y2 - 26 June 2013 through 28 June 2013
ER -