TY - CHAP
T1 - A Symbolic Regression Screening Approach Within Peptide Optimisation
AU - Murphy, Aidan
AU - Kocherovsky, Mark
AU - Dayan, Nir
AU - Miralavy, Ilya
AU - Gilad, Assaf
AU - Banzhaf, Wolfgang
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - The Protein Optimization Evolving Tool is a genetic programming based peptide generation tool which has successfully created novel peptides with improved performance for MRI imaging. However, like all supervised machine learning techniques, it may overfit to its library of training peptides and create peptides which do not improve functionality. To overcome this problem, we create symbolic regression models to act as another predictor of peptide function. We create a set of 76 features of physicochemical, theoretical and composite properties for each peptide and evolve the models using Grammatical Evolution on two datasets, one containing 74 peptides and the other 100 peptides. Models trained using these 76 features can successfully predict peptide functionality with a median MSE of 0.427 on the first dataset and 0.179 on the larger dataset, achieving state-of-the-art results on both. We next investigate whether a reduced set of 8 real-world features, which could result in more interpretable models, can accurately predict protein functionality. The models created on this reduced set were outperformed by models which used the full set of features on the first dataset but were statistically equivalent on the second dataset. Finally, we downsample the data to 10%, 33% and 50% to evaluate the robustness of this approach. Our results show that models trained on as few as 7 peptides can be used as an additional measure of functionality within the Protein Optimization Evolving Tool.
UR - https://doi.org/10.1007/978-3-031-90065-5_30
U2 - 10.1007/978-3-031-90065-5_30
DO - 10.1007/978-3-031-90065-5_30
M3 - Chapter
SP - 492
EP - 506
BT - EvoApplications 2025
ER -