Abstract

With the growing popularity of machine learning (ML), regression problems in many domains are becoming increasingly high-dimensional. Identifying the relevant features in a high-dimensional dataset remains a significant challenge for building highly accurate machine learning models. Evolutionary feature selection has been used for high-dimensional symbolic regression with Genetic Programming (GP). Although grammar-based GP, especially Grammatical Evolution (GE), has been used extensively for symbolic regression, no systematic grammar-based feature selection approach exists. This work presents a grammar-based feature selection method, Production Ranking based Feature Selection (PRFS), and reports the results of applying it to symbolic regression. The main contribution of our work is to demonstrate that the proposed method not only consistently selects the most relevant features but also significantly improves the generalization performance of GE when compared with several state-of-the-art ML-based feature selection methods. Experimental results on benchmark symbolic regression problems show that the generalization performance of GE using PRFS was significantly better than that of a state-of-the-art Random Forest-based feature selection method in three out of four problems, while in the fourth problem the performance was the same.

Original language: English
Title of host publication: GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
Publisher: Association for Computing Machinery, Inc
Pages: 902-910
Number of pages: 9
ISBN (Electronic): 9781450392372
Publication status: Published - 8 Jul 2022
Event: 2022 Genetic and Evolutionary Computation Conference, GECCO 2022 - Virtual, Online, United States
Duration: 9 Jul 2022 → 13 Jul 2022

Publication series

Name: GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference

Conference

Conference: 2022 Genetic and Evolutionary Computation Conference, GECCO 2022
Country/Territory: United States
City: Virtual, Online
Period: 9/07/22 → 13/07/22

Keywords

  • feature selection
  • grammar pruning
  • grammatical evolution
  • production ranking
  • symbolic regression

Title: Automated grammar-based feature selection in symbolic regression
