TY - GEN
T1 - Econometric genetic programming outperforms traditional econometric algorithms for regression tasks
AU - Novaes, André Luiz Farias
AU - Tanscheit, Ricardo
AU - Dias, Douglas Mota
PY - 2017/7/15
Y1 - 2017/7/15
N2 - Econometric Genetic Programming (EGP) evolves multiple linear regressions through Genetic Programming (GP), which is responsible for model selection, aiming to generate high-accuracy regressions with potentially interpretable parameters. It uses statistical significance as a feature selection tool, directly and efficiently identifying introns and controlling bloat. In this paper, EGP is tested against traditional feature-selection econometric algorithms in regression tasks - namely Partial Least Squares Regression, Ridge Regression and Stepwise Forward Regression - outperforming them on all three datasets. The way EGP explores the search space of possible regressors and models is crucial to its results. EGP is carefully constructed in light of econometric theory on cross-sectional datasets, with rigorous treatment of topics such as homoscedasticity and heteroscedasticity, statistical inference for estimated parameters, and sampling criteria. It also benefits from a mathematical proof relating accuracy to statistical significance: accuracy increases only if the absolute value of the regressor's test statistic in a two-sided hypothesis test exceeds a predefined value.
AB - Econometric Genetic Programming (EGP) evolves multiple linear regressions through Genetic Programming (GP), which is responsible for model selection, aiming to generate high-accuracy regressions with potentially interpretable parameters. It uses statistical significance as a feature selection tool, directly and efficiently identifying introns and controlling bloat. In this paper, EGP is tested against traditional feature-selection econometric algorithms in regression tasks - namely Partial Least Squares Regression, Ridge Regression and Stepwise Forward Regression - outperforming them on all three datasets. The way EGP explores the search space of possible regressors and models is crucial to its results. EGP is carefully constructed in light of econometric theory on cross-sectional datasets, with rigorous treatment of topics such as homoscedasticity and heteroscedasticity, statistical inference for estimated parameters, and sampling criteria. It also benefits from a mathematical proof relating accuracy to statistical significance: accuracy increases only if the absolute value of the regressor's test statistic in a two-sided hypothesis test exceeds a predefined value.
KW - Feature Selection
KW - Genetic Programming
KW - Model Selection
KW - Multiple Regression
UR - https://www.scopus.com/pages/publications/85026856243
U2 - 10.1145/3067695.30825060
DO - 10.1145/3067695.30825060
M3 - Conference contribution
AN - SCOPUS:85026856243
T3 - GECCO 2017 - Proceedings of the Genetic and Evolutionary Computation Conference Companion
SP - 1427
EP - 1430
BT - GECCO 2017 - Proceedings of the Genetic and Evolutionary Computation Conference Companion
PB - Association for Computing Machinery, Inc
T2 - 2017 Genetic and Evolutionary Computation Conference Companion, GECCO 2017
Y2 - 15 July 2017 through 19 July 2017
ER -