TY - GEN
T1 - A bootstrapping approach to reduce over-fitting in genetic programming
AU - Fitzgerald, Jeannie
AU - Muhammad Atif Azad, R.
AU - Ryan, Conor
PY - 2013
Y1 - 2013
N2 - Historically, the quality of a solution in Genetic Programming (GP) was often assessed based on its performance on a given training sample. However, in Machine Learning, we are more interested in achieving reliable estimates of the quality of the evolving individuals on unseen data. In this paper, we propose to simulate the effect of unseen data during training without actually using any additional data. We do this by employing a technique called bootstrapping that repeatedly re-samples with replacement from the training data and helps estimate sensitivity of the individual in question to small variations across these re-sampled data sets. We minimise this sensitivity, as measured by the Bootstrap Standard Error, together with the training error, in an effort to evolve models that generalise better to the unseen data. We evaluate the proposed technique on four binary classification problems and compare with a standard GP approach. The results show that for the problems undertaken, the proposed method not only generalises significantly better than standard GP while the training performance improves, but also demonstrates a strong side effect of containing the tree sizes.
KW - Binary classification
KW - Generalisation
KW - Genetic programming
UR - http://www.scopus.com/inward/record.url?scp=84882379858&partnerID=8YFLogxK
U2 - 10.1145/2464576.2482690
DO - 10.1145/2464576.2482690
M3 - Conference contribution
AN - SCOPUS:84882379858
SN - 9781450319645
T3 - GECCO 2013 - Proceedings of the 2013 Genetic and Evolutionary Computation Conference Companion
SP - 1113
EP - 1120
BT - GECCO 2013 - Proceedings of the 2013 Genetic and Evolutionary Computation Conference Companion
T2 - 15th Annual Conference on Genetic and Evolutionary Computation, GECCO 2013
Y2 - 6 July 2013 through 10 July 2013
ER -