TY - GEN
T1 - Bootstrapping to reduce bloat and improve generalisation in genetic programming
AU - Fitzgerald, Jeannie
AU - Muhammad Atif Azad, R.
AU - Ryan, Conor
PY - 2013
Y1 - 2013
N2 - Typically, the quality of a solution in Genetic Programming (GP) is represented by a score on a given training sample. However, in Machine Learning, we are most interested in estimating the quality of the evolving individuals on unseen data. In this paper, we propose to simulate the effect of unseen data to direct training without actually using additional data, by employing a technique called bootstrapping, which repeatedly re-samples with replacement from the training data and helps estimate the sensitivity of the individual in question to small variations across these re-sampled data sets. We minimise this sensitivity, as measured by the Bootstrap Standard Error, alongside the training error, in a bid to evolve models that generalise better to unseen data. We evaluate the proposed technique on four binary classification problems and compare it with a standard GP approach. The results show that, for the problems undertaken, the proposed method not only generalises significantly better than standard GP while also improving training performance, but also has the strong side effect of containing tree sizes.
KW - Binary classification
KW - Generalisation
KW - Genetic programming
UR - http://www.scopus.com/inward/record.url?scp=84882411135&partnerID=8YFLogxK
U2 - 10.1145/2464576.2464647
DO - 10.1145/2464576.2464647
M3 - Conference contribution
AN - SCOPUS:84882411135
SN - 9781450319645
T3 - GECCO 2013 - Proceedings of the 2013 Genetic and Evolutionary Computation Conference Companion
SP - 141
EP - 142
BT - GECCO 2013 - Proceedings of the 2013 Genetic and Evolutionary Computation Conference Companion
T2 - 15th Annual Conference on Genetic and Evolutionary Computation, GECCO 2013
Y2 - 6 July 2013 through 10 July 2013
ER -
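
Note: the abstract describes minimising a candidate's Bootstrap Standard Error (BSE) alongside its training error. As a rough illustration only, and not the authors' implementation, the Python sketch below shows one way such a BSE could be estimated for a candidate classifier and folded into a combined objective. The predict callable, the accuracy-based score, the additive weighting, and all names are illustrative assumptions.

import numpy as np

def bootstrap_standard_error(predict, X, y, n_resamples=200, seed=None):
    """Estimate the Bootstrap Standard Error of a candidate's training score.

    Assumptions (not taken from the paper): `predict` maps a NumPy feature
    matrix to class predictions, `X` and `y` are NumPy arrays forming the
    training sample, and the per-resample score is plain accuracy.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)                    # re-sample with replacement
        scores.append(np.mean(predict(X[idx]) == y[idx]))   # score on the re-sampled set
    return np.std(scores, ddof=1)                           # spread of scores across re-samples

def combined_fitness(predict, X, y, weight=1.0):
    """Hypothetical combined objective: training error plus a weighted
    sensitivity penalty, in the spirit of minimising both together."""
    training_error = np.mean(predict(X) != y)
    return training_error + weight * bootstrap_standard_error(predict, X, y)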