TY - GEN
T1 - LC/NC Pipeline for Training and Operationalising Segmentation Models in a Data Scarce Domain
T2 - 2nd International Conference on Bridging the Gap Between AI and Reality, AISoLA 2024
AU - Brandon, Colm
AU - Fennell, Éanna
AU - Singh, Amandeep
AU - Margaria, Tiziana
N1 - Publisher Copyright: © The Author(s) 2026.
PY - 2026
Y1 - 2026
N2 - Here we present a new approach to training and operationalising segmentation models for de-arraying Tissue Micro Arrays (TMAs). The scarcity of large, high-quality datasets in sensitive domains such as human tissue samples, coupled with strict privacy regulations to protect donor interests, poses significant obstacles to training robust and generalised segmentation models. To address these challenges, we introduce a new Low-Code/No-Code (LCNC) Domain-Specific Language (DSL) integrated into the Cinco de Bio (CdB) platform. The DSL consists of multiple Service-Independent Building Blocks (SIBs), each providing a distinct functionality essential to creating a pipeline. LCNC enables biologists to train and deploy de-arraying models without writing code. Our methodology incorporates a domain-specific data augmentation technique that generates pseudo-synthetic samples from a minimal set of real data. It also leverages AutoML techniques, including Neural Architecture Search (NAS) and hyperparameter optimisation, to automate the model development process. Furthermore, we present an architectural update to the Cinco de Bio platform, adopting a “Model as Data” paradigm that treats neural network models as dynamic, versioned data assets that can be used as inputs to SIBs. This work provides a practical solution to the challenges of distribution shift and data scarcity in sensitive health domains, where building datasets large enough to train robust, generalised models is infeasible. The proposed LCNC DSL and accompanying pipeline enable domain experts to effectively leverage Artificial Intelligence (AI) technologies and tailor them to their own data.
AB - Here we present a new approach to training and operationalising segmentation models for de-arraying Tissue Micro Arrays (TMAs). The scarcity of large, high-quality datasets in sensitive domains such as human tissue samples, coupled with strict privacy regulations to protect donor interests, poses significant obstacles to training robust and generalised segmentation models. To address these challenges, we introduce a new Low-Code/No-Code (LCNC) Domain-Specific Language (DSL) integrated into the Cinco de Bio (CdB) platform. The DSL consists of multiple Service-Independent Building Blocks (SIBs), each providing a distinct functionality essential to creating a pipeline. LCNC enables biologists to train and deploy de-arraying models without writing code. Our methodology incorporates a domain-specific data augmentation technique that generates pseudo-synthetic samples from a minimal set of real data. It also leverages AutoML techniques, including Neural Architecture Search (NAS) and hyperparameter optimisation, to automate the model development process. Furthermore, we present an architectural update to the Cinco de Bio platform, adopting a “Model as Data” paradigm that treats neural network models as dynamic, versioned data assets that can be used as inputs to SIBs. This work provides a practical solution to the challenges of distribution shift and data scarcity in sensitive health domains, where building datasets large enough to train robust, generalised models is infeasible. The proposed LCNC DSL and accompanying pipeline enable domain experts to effectively leverage Artificial Intelligence (AI) technologies and tailor them to their own data.
KW - Artificial Intelligence
KW - AutoML
KW - Health Informatics
KW - Low-Code/No-Code
KW - Model Driven Development
UR - https://www.scopus.com/pages/publications/105019534152
U2 - 10.1007/978-3-032-01377-4_5
DO - 10.1007/978-3-032-01377-4_5
M3 - Conference contribution
AN - SCOPUS:105019534152
SN - 9783032013767
T3 - Lecture Notes in Computer Science
SP - 104
EP - 121
BT - Bridging the Gap Between AI and Reality - 2nd International Conference, AISoLA 2024, Selected Papers
A2 - Steffen, Bernhard
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 30 October 2024 through 3 November 2024
ER -