TY - JOUR
T1 - Automating mixture model fitting of task durations for process conformance checking
AU - Yang, Lingkai
AU - McClean, Sally
AU - Faddy, Malcolm
AU - Donnelly, Mark
AU - Khan, Kashaf
AU - Burke, Kevin
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/9
Y1 - 2025/9
AB - Process task duration data often exhibit multiple peaks, indicating differences in, for example, customer ages and preferences, resource capabilities or the day/hour of a week. This heterogeneous data, which captures diverse customer patterns, should be represented using different models, resulting in an overall mixture model. This paper introduces gamma mixture models to represent various customer patterns in task duration data, with a focus on automating the fitting process. The approach involves a two-stage procedure: first, divide-and-conquer using peak-, equidistance- and cluster-based techniques to partition data, and automatically fit gamma distributions to each subset. The second stage then improves the fitted mixture model by directly searching the log-likelihood surface. The method is compared with the expectation–maximization (EM) algorithm and an open tool (HyperStar), using both artificially generated datasets and a publicly available hospital billing dataset, demonstrating its effectiveness and time efficiency in modelling heterogeneous process duration data. Furthermore, a case study on process conformance checking is conducted using the hospital billing dataset, highlighting a potential application area for the method in process mining.
KW - Divide-and-conquer fitting
KW - Gamma mixture model
KW - Nelder-Mead optimisation
KW - Process conformance checking
KW - Process duration modelling
KW - Process mining
UR - https://www.scopus.com/pages/publications/105011251381
U2 - 10.1007/s10618-025-01131-5
DO - 10.1007/s10618-025-01131-5
M3 - Article
AN - SCOPUS:105011251381
SN - 1384-5810
VL - 39
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 5
M1 - 53
ER -