LC/NC Pipeline for Training and Operationalising Segmentation Models in a Data Scarce Domain: De-arraying Tissue MicroArrays

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Here we present a new approach to training and operationalising segmentation models for de-arraying Tissue Micro Arrays (TMAs). The scarcity of large, high-quality datasets in sensitive domains such as human tissue samples, coupled with strict privacy regulations to protect donor interests, poses significant obstacles to training robust and generalised segmentation models. To address these challenges, we introduce a new Low-Code/No-Code (LCNC) Domain-Specific Language (DSL) integrated into the Cinco de Bio (CdB) platform. The DSL consists of multiple Service-Independent Building Blocks (SIBs), each providing a distinct functionality essential to creating a pipeline. The LCNC approach enables biologists to train and deploy de-arraying models without writing code. Our methodology incorporates a domain-specific data augmentation technique that generates pseudo-synthetic samples from a minimal set of real data. It also leverages AutoML techniques, including Neural Architecture Search (NAS) and hyperparameter optimisation, to automate the model development process. Furthermore, we present an architectural update to the Cinco de Bio platform, adopting a “Model as Data” paradigm that treats neural network models as dynamic, versioned data assets that can be used as inputs to SIBs. This work provides a practical solution to the challenges of distribution shift and data scarcity in sensitive health domains, where building datasets large enough to train robust, generalisable models is infeasible. The proposed LCNC DSL and accompanying pipeline enable domain experts to effectively leverage Artificial Intelligence (AI) technologies and tailor them to their own data.
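The abstract does not describe how the pseudo-synthetic samples are built, so the following is only an illustrative sketch of one plausible form such an augmentation could take, not the authors' method. It assumes that a few real core crops are available as small square greyscale grids (0 meaning background) and pastes them onto a blank canvas in a jittered grid with occasional missing cores, returning the image together with its segmentation mask. The function name `make_pseudo_synthetic_tma` and all of its parameters are hypothetical.

```python
import random


def make_pseudo_synthetic_tma(core_patches, rows, cols, spacing,
                              jitter=2, drop_prob=0.1, seed=None):
    """Compose a synthetic TMA image and binary mask from real core crops.

    Hypothetical illustration only: core_patches is a list of square 2D
    lists of greyscale values, where 0 marks background pixels.
    """
    rng = random.Random(seed)
    height, width = rows * spacing, cols * spacing
    image = [[0] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    placed = 0
    for r in range(rows):
        for c in range(cols):
            if rng.random() < drop_prob:
                continue  # simulate a core missing from the array
            patch = rng.choice(core_patches)
            size = len(patch)
            # Centre the patch in its grid cell, then perturb its position.
            top = r * spacing + (spacing - size) // 2 + rng.randint(-jitter, jitter)
            left = c * spacing + (spacing - size) // 2 + rng.randint(-jitter, jitter)
            for dy, patch_row in enumerate(patch):
                for dx, value in enumerate(patch_row):
                    y, x = top + dy, left + dx
                    if value > 0 and 0 <= y < height and 0 <= x < width:
                        image[y][x] = value
                        mask[y][x] = 1  # tissue pixel in the label mask
            placed += 1
    return image, mask, placed
```

Calling the function repeatedly with different seeds would yield many image and mask pairs from a handful of real cores, which is the general idea behind expanding a minimal dataset; a realistic version would also vary rotation, scale, staining and background texture.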

Original language: English
Title of host publication: Bridging the Gap Between AI and Reality - 2nd International Conference, AISoLA 2024, Selected Papers
Editors: Bernhard Steffen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 104-121
Number of pages: 18
ISBN (Print): 9783032013767
Publication status: Published - 2026
Event: 2nd International Conference on Bridging the Gap Between AI and Reality, AISoLA 2024 - Crete, Greece
Duration: 30 Oct 2024 to 3 Nov 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 16032 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Bridging the Gap Between AI and Reality, AISoLA 2024
Country/Territory: Greece
City: Crete
Period: 30/10/24 to 3/11/24

Keywords

  • Artificial Intelligence
  • AutoML
  • Health Informatics
  • Low-Code/No-Code
  • Model Driven Development
