TY - GEN
T1 - Efficient approaches to interleaved sampling of training data for symbolic regression
AU - Azad, R. Muhammad Atif
AU - Medernach, David
AU - Ryan, Conor
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/10/12
Y1 - 2014/10/12
AB - The ability to generalise beyond the training set is paramount for any machine learning algorithm, and Genetic Programming (GP) is no exception. This paper investigates a recently proposed technique for improving generalisation in GP, termed Interleaved Sampling, in which GP alternates between using the entire data set and using only a single data point in alternate generations. This paper proposes two alternatives to using a single data point: using random search, and simply minimising tree size. Both approaches are more efficient than the original Interleaved Sampling because they do not evaluate fitness on the data in half of the generations. The results show that random search and size minimisation generalise as effectively as the original Interleaved Sampling while being more efficient in terms of data processing. Size minimisation is particularly interesting because it completely prevents bloat while remaining competitive in both training performance and generalisation. Trees evolved with size minimisation are substantially smaller, which substantially reduces computational expense.
KW - Genetic Programming
KW - optimisation
KW - overfitting
UR - http://www.scopus.com/inward/record.url?scp=84911918221&partnerID=8YFLogxK
U2 - 10.1109/NaBIC.2014.6921874
DO - 10.1109/NaBIC.2014.6921874
M3 - Conference contribution
AN - SCOPUS:84911918221
T3 - 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014
SP - 176
EP - 183
BT - 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014
Y2 - 30 July 2014 through 1 August 2014
ER -