TY - GEN
T1 - On size, complexity and generalisation error in GP
AU - Fitzgerald, Jeannie
AU - Ryan, Conor
PY - 2014
Y1 - 2014
N2 - For some time, Genetic Programming research has lagged behind the wider Machine Learning community in the study of generalisation, where the decomposition of generalisation error into bias and variance components is well understood. However, recent Genetic Programming contributions focusing on complexity, size and bloat as they relate to over-fitting have opened up some interesting avenues of research. In this paper, we carry out a simple empirical study on five binary classification problems. The study is designed to discover what effects may be observed when program size and complexity are varied in combination, with the objective of gaining a better understanding of relationships which may exist between solution size, operator complexity and variance error. The results of the study indicate that the simplest configuration, in terms of operator complexity, consistently results in the best average performance, and in many cases, the result is significantly better. We further demonstrate that the best results are achieved when this minimum complexity set-up is combined with a less than parsimonious permissible size.
AB - For some time, Genetic Programming research has lagged behind the wider Machine Learning community in the study of generalisation, where the decomposition of generalisation error into bias and variance components is well understood. However, recent Genetic Programming contributions focusing on complexity, size and bloat as they relate to over-fitting have opened up some interesting avenues of research. In this paper, we carry out a simple empirical study on five binary classification problems. The study is designed to discover what effects may be observed when program size and complexity are varied in combination, with the objective of gaining a better understanding of relationships which may exist between solution size, operator complexity and variance error. The results of the study indicate that the simplest configuration, in terms of operator complexity, consistently results in the best average performance, and in many cases, the result is significantly better. We further demonstrate that the best results are achieved when this minimum complexity set-up is combined with a less than parsimonious permissible size.
KW - Classification
KW - Generalisation
KW - Genetic programming
UR - http://www.scopus.com/inward/record.url?scp=84905717310&partnerID=8YFLogxK
U2 - 10.1145/2576768.2598346
DO - 10.1145/2576768.2598346
M3 - Conference contribution
AN - SCOPUS:84905717310
SN - 9781450326629
T3 - GECCO 2014 - Proceedings of the 2014 Genetic and Evolutionary Computation Conference
SP - 903
EP - 910
BT - GECCO 2014 - Proceedings of the 2014 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery
T2 - 16th Genetic and Evolutionary Computation Conference, GECCO 2014
Y2 - 12 July 2014 through 16 July 2014
ER -