From Data Science to Modular Workflows Changing Perspectives from Data to Platform: DBDIrl 1864-1922 Case Study

Enda O’Shea, Marco Krumrey, Daniel Sami Mitwalli, Sebastian Teumert, Tiziana Margaria

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Many historical data collections are based on handwritten documents and registers. Consulting them is often very difficult because of the conservation state of the physical artefacts, and understanding them is further hampered by handwriting that is hard to interpret and by language that differs from modern terminology. Historians, demographers, population health scientists and others have therefore invested significant research effort in making such data collections digitally available, first as images and then as readily available repositories of transcribed data in electronically queryable formats. To extract data from the Irish civil registers of deaths in the DBDIrl 1864-1922 project (https://www.dbdirl.com), an AI-ML Data Analytics Pipeline was proposed as a working approach and validated on a subset of the data. However, the pipeline requires manual steps and cannot be applied as is to similar datasets without significant modifications to its inner workings. We are currently transforming this prototyped, single-purpose product into a modular, fully automated workflow, intended to be used and reconfigured for new data in a low-code/no-code fashion by domain experts such as historians. We explain the analysis and refactoring process we adopted and illustrate it on part of the pipeline, including the obstacles we faced and the pitfalls we handled. We also evaluate its potential to become a methodical approach for transforming an interactive program into a fully automated process, in a low-code/no-code workflow style, that can easily be reused, reconfigured and extended to tailor it to other datasets as needed.

Original language: English
Title of host publication: Bridging the Gap Between AI and Reality - 1st International Conference, AISoLA 2023, Selected Papers
Editors: Bernhard Steffen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 84-103
Number of pages: 20
ISBN (Print): 9783031737404
Publication status: Published - 2025
Event: 1st International Symposium on Leveraging Applications of Formal Methods, AISoLA 2023 - Crete, Greece
Duration: 23 Oct 2023 to 28 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14129 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Symposium on Leveraging Applications of Formal Methods, AISoLA 2023
Country/Territory: Greece
City: Crete
Period: 23/10/23 to 28/10/23

Keywords

  • Data science
  • Digital Thread
  • DIME
  • Historical data
  • Low-code/No-code
  • Model driven development
