Efficient approaches to interleaved sampling of training data for symbolic regression

R. Muhammad Atif Azad, David Medernach, Conor Ryan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The ability to generalise beyond the training set is paramount for any machine learning algorithm, and Genetic Programming (GP) is no exception. This paper investigates a recently proposed technique to improve generalisation in GP, termed Interleaved Sampling, in which GP uses the entire data set in one generation and only a single data point in the next. This paper proposes two alternatives to using a single data point: random search, and simply minimising the tree size. Both approaches are more efficient than the original Interleaved Sampling because they do not evaluate fitness in half of the generations. The results show that, in terms of generalisation, random search and size minimisation are as effective as the original Interleaved Sampling; however, they are computationally more efficient in terms of data processing. Size minimisation is particularly interesting because it completely prevents bloat while remaining competitive in terms of both training results and generalisation. Trees evolved with size minimisation are substantially smaller, which in turn substantially reduces the computational expense.
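
The alternation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `make_fitness`, the mode labels, the choice of even generations for the full data set, the absolute-error measure, and the modelling of random search as a random score are all assumptions made for illustration.

```python
import random

def make_fitness(generation, data, mode, rng):
    """Return score(predict, size) -> float (lower is better) for one generation.

    Hypothetical helper: `data` is a list of (x, y) pairs, `predict` is the
    individual's evaluation function, and `size` is its tree size.
    """
    if generation % 2 == 0:
        # Assumption: even generations use the entire training set.
        def full(predict, size):
            return sum(abs(predict(x) - y) for x, y in data) / len(data)
        return full
    if mode == "single_point":
        # Original Interleaved Sampling: one point shared by the whole population.
        x0, y0 = data[rng.randrange(len(data))]
        def single(predict, size):
            return abs(predict(x0) - y0)
        return single
    if mode == "random_search":
        # Assumption: random search is modelled as a random score, so no
        # training data is processed in this generation.
        def rand(predict, size):
            return rng.random()
        return rand
    if mode == "size_min":
        # Size minimisation: the score is the tree size, again with no data processed.
        def smallest(predict, size):
            return float(size)
        return smallest
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch, the `random_search` and `size_min` branches never call `predict`, which mirrors the abstract's point that fitness is not evaluated in half of the generations.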

Original language: English
Title of host publication: 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 176-183
Number of pages: 8
ISBN (Electronic): 9781479959372
DOIs
Publication status: Published - 12 Oct 2014
Event: 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014 - Porto, Portugal
Duration: 30 Jul 2014 → 1 Aug 2014

Publication series

Name: 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014

Conference

Conference: 2014 6th World Congress on Nature and Biologically Inspired Computing, NaBIC 2014
Country/Territory: Portugal
City: Porto
Period: 30/07/14 → 1/08/14

Keywords

  • Genetic Programming
  • optimisation
  • over fitting
