TY - GEN
T1 - From Data Science to Modular Workflows Changing Perspectives from Data to Platform
T2 - 1st International Symposium on Leveraging Applications of Formal Methods, AISoLA 2023
AU - O’Shea, Enda
AU - Krumrey, Marco
AU - Mitwalli, Daniel Sami
AU - Teumert, Sebastian
AU - Margaria, Tiziana
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025
Y1 - 2025
AB - Many historical data collections are based on handwritten documents and registers. These are often very difficult to consult because of the conservation state of the physical artefacts, and difficult to understand because the handwriting is hard to interpret and the language differs from modern terminology. Historians, demographers, population health scientists and others have therefore invested significant research effort in making such data collections digitally available, first as images and then as repositories of transcribed data in electronically queryable formats. To extract data from the Irish Civil registers of deaths in the DBDIrl 1864-1922 project (https://www.dbdirl.com), an AI-ML Data Analytics Pipeline was proposed as a working approach and validated on a subset of the data. However, the pipeline requires manual steps and cannot be applied as is to similar datasets without significant modifications to its inner workings. We are currently transforming this prototyped, single-purpose product into a modular, fully automated workflow, intended to be used and reconfigured for new data in a low-code/no-code fashion by domain experts such as historians. We explain the analysis and refactoring process we adopted and illustrate it on part of the pipeline, including the obstacles we faced and how we handled pitfalls. We also evaluate its potential to become a methodical approach to transforming an interactive program into a fully automated process, in a low-code/no-code workflow style, that can be easily reused, reconfigured and extended to tailor it to other datasets as needed.
KW - Data science
KW - Digital Thread
KW - DIME
KW - Historical data
KW - Low-code/No-code
KW - Model driven development
UR - http://www.scopus.com/inward/record.url?scp=85208648711&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-73741-1_6
DO - 10.1007/978-3-031-73741-1_6
M3 - Conference contribution
AN - SCOPUS:85208648711
SN - 9783031737404
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 84
EP - 103
BT - Bridging the Gap Between AI and Reality - 1st International Conference, AISoLA 2023, Selected Papers
A2 - Steffen, Bernhard
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 23 October 2023 through 28 October 2023
ER -