A Symbolic Regression Screening Approach Within Peptide Optimisation

  • Aidan Murphy
  • Mark Kocherovsky
  • Nir Dayan
  • Ilya Miralavy
  • Assaf Gilad
  • Wolfgang Banzhaf

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

The Protein Optimization Evolving Tool is a genetic-programming-based peptide generation tool which has successfully created novel peptides with improved performance for MRI. However, like all supervised machine learning techniques, it may overfit to its library of training peptides and create peptides which do not improve functionality. To overcome this problem, we create symbolic regression models to act as another predictor of peptide function. We create a set of 76 features of physicochemical, theoretical and composite properties for each peptide and evolve the models using Grammatical Evolution on two datasets, one containing 74 peptides and the other 100 peptides. Models trained using these 76 features can successfully predict peptide functionality with a median MSE of 0.427 on the first dataset and 0.179 on the larger dataset, achieving state-of-the-art results on both. We next investigate whether a reduced set of 8 real-world features, which could result in more interpretable models, can accurately predict protein functionality. The models created on this reduced set were outperformed by models which used the full set of features on the first dataset but were statistically equivalent on the second dataset. Finally, we downsample the data at 10%, 33% and 50% to evaluate the robustness of this approach. Our results show that models trained on as few as 7 peptides can be used as an additional measure of functionality within the Protein Optimization Evolving Tool.
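
The downsampling and error figures in the abstract can be made concrete with a small sketch. It is not the authors' code: the function names, the floor rounding of subset sizes, and the fixed random seed are all assumptions made for illustration. It shows how a 10% sample of the 74-peptide dataset yields the 7 training peptides mentioned above, and how a mean squared error is computed for one model.

```python
import random
import statistics


def mse(y_true, y_pred):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def downsample_size(n_peptides, percent):
    """Number of peptides kept at a given percentage.

    Assumption: sizes are rounded down; the abstract does not state the rule.
    """
    return n_peptides * percent // 100


def downsample(items, percent, seed=0):
    """Draw a random subset without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    items = list(items)
    return rng.sample(items, downsample_size(len(items), percent))


# Subset sizes for the two dataset sizes and three rates named in the abstract.
sizes_74 = [downsample_size(74, p) for p in (10, 33, 50)]
sizes_100 = [downsample_size(100, p) for p in (10, 33, 50)]

# A median over per-run errors; these three values are placeholders, not results.
example_median = statistics.median([0.5, 0.2, 0.9])
```

Under the floor-rounding assumption, the 74-peptide dataset gives subsets of 7, 24 and 37 peptides, and the 100-peptide dataset gives 10, 33 and 50. Reporting the median rather than the mean of per-run MSE values keeps a single poorly evolved model from dominating the summary.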

Original language: English
Title of host publication: EvoApplications 2025
Pages: 492-506
Number of pages: 15
Publication status: Published - 2025
