TY - GEN
T1 - Automated grammar-based feature selection in symbolic regression
AU - Ali, Muhammad Sarmad
AU - Kshirsagar, Meghana
AU - Naredo, Enrique
AU - Ryan, Conor
N1 - Publisher Copyright: © 2022 Owner/Author.
PY - 2022/7/8
Y1 - 2022/7/8
AB - With the growing popularity of machine learning (ML), regression problems in many domains are becoming increasingly high-dimensional. Identifying relevant features from a high-dimensional dataset remains a significant challenge for building highly accurate machine learning models. Evolutionary feature selection has been used for high-dimensional symbolic regression using Genetic Programming (GP). While grammar-based GP, especially Grammatical Evolution (GE), has been extensively used for symbolic regression, no systematic grammar-based feature selection approach exists. This work presents a grammar-based feature selection method, Production Ranking based Feature Selection (PRFS), and reports on the results of its application in symbolic regression. The main contribution of our work is to demonstrate that the proposed method not only consistently selects the most relevant features, but also significantly improves the generalization performance of GE when compared with several state-of-the-art ML-based feature selection methods. Experimental results on benchmark symbolic regression problems show that the generalization performance of GE using PRFS was significantly better than that of a state-of-the-art Random Forest-based feature selection method in three out of four problems, while in the fourth problem the performance was the same.
KW - feature selection
KW - grammar pruning
KW - grammatical evolution
KW - production ranking
KW - symbolic regression
UR - http://www.scopus.com/inward/record.url?scp=85135236045&partnerID=8YFLogxK
U2 - 10.1145/3512290.3528852
DO - 10.1145/3512290.3528852
M3 - Conference contribution
AN - SCOPUS:85135236045
T3 - GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
SP - 902
EP - 910
BT - GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery, Inc
T2 - 2022 Genetic and Evolutionary Computation Conference, GECCO 2022
Y2 - 9 July 2022 through 13 July 2022
ER -