TY - GEN
T1 - Efficient interleaved sampling of training data in genetic programming
AU - Azad, R. Muhammad Atif
AU - Medernach, David
AU - Ryan, Conor
PY - 2014
Y1 - 2014
N2 - The ability to generalize beyond the training set is important for Genetic Programming (GP). Interleaved Sampling is a recently proposed approach to improve generalization in GP. In this technique, GP alternates between using the entire data set and only a single data point. Initial results showed that the technique not only produces solutions that generalize well, but that it does so at a reduced computational expense, as half the generations evaluate only a single data point. This paper further investigates the merit of interleaving the use of the training set with two alternative approaches. These are: the use of random search instead of a single data point, and simply minimising the tree size. Both of these alternatives are computationally even cheaper than the original setup as they simply do not invoke the fitness function half the time. We test the utility of these new methods on four well-cited, high-dimensional problems from the symbolic regression domain. The results show that the new approaches continue to produce general solutions despite taking only half the fitness evaluations. Size minimisation also prevents bloat while producing competitive results on both training and test data sets. The tree sizes with size minimisation are substantially smaller than in the rest of the setups, which further brings down the training costs.
AB - The ability to generalize beyond the training set is important for Genetic Programming (GP). Interleaved Sampling is a recently proposed approach to improve generalization in GP. In this technique, GP alternates between using the entire data set and only a single data point. Initial results showed that the technique not only produces solutions that generalize well, but that it does so at a reduced computational expense, as half the generations evaluate only a single data point. This paper further investigates the merit of interleaving the use of the training set with two alternative approaches. These are: the use of random search instead of a single data point, and simply minimising the tree size. Both of these alternatives are computationally even cheaper than the original setup as they simply do not invoke the fitness function half the time. We test the utility of these new methods on four well-cited, high-dimensional problems from the symbolic regression domain. The results show that the new approaches continue to produce general solutions despite taking only half the fitness evaluations. Size minimisation also prevents bloat while producing competitive results on both training and test data sets. The tree sizes with size minimisation are substantially smaller than in the rest of the setups, which further brings down the training costs.
KW - Computational efficiency
KW - Genetic Programming
KW - Interleaved Sampling
KW - Over-fitting
KW - Robustness of solutions
KW - Speedup technique
UR - http://www.scopus.com/inward/record.url?scp=84905666237&partnerID=8YFLogxK
U2 - 10.1145/2598394.2598480
DO - 10.1145/2598394.2598480
M3 - Conference contribution
AN - SCOPUS:84905666237
SN - 9781450328814
T3 - GECCO 2014 - Companion Publication of the 2014 Genetic and Evolutionary Computation Conference
SP - 127
EP - 128
BT - GECCO 2014 - Companion Publication of the 2014 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery
T2 - 16th Genetic and Evolutionary Computation Conference Companion, GECCO 2014 Companion
Y2 - 12 July 2014 through 16 July 2014
ER -