TY - JOUR
T1 - External Validation of a Hip-Worn Accelerometry-Based Machine Learning Model for Physical Behavior Classification in Free-Living Conditions
AU - the WEALTH Consortium
AU - Swenne, Annika
AU - Sigcha, Luis
AU - Hebestreit, Antje
AU - Bouchan, Jérôme
AU - Cimler, Richard
AU - Cardon, Greet
AU - Elavsky, Steriani
AU - Fezeu, Léopold K.
AU - Kühnová, Jitka
AU - Oppert, Jean Michel
AU - Vetrovsky, Tomas
AU - Donnelly, Alan
AU - Van de Ven, Pepijn
AU - Buck, Christoph
N1 - Publisher Copyright: © 2025 Human Kinetics, Inc.
PY - 2025/1
Y1 - 2025/1
AB - Background: Accurate classification of physical behavior from accelerometer data is crucial for health and behavioral research. While machine learning models often perform well within the populations they are trained on, they are rarely validated on independent populations, and their generalizability remains poorly understood. Therefore, we aimed to externally validate a widely used random forest model for physical behavior classification, and to assess whether its performance varied by participants’ age, sex, or body mass index. Methods: We validated the random forest classifier, trained by Ellis et al., which achieved a balanced accuracy of 79% for classifying sitting, standing, and walking/running from hip-worn accelerometer data in the original training population. For the external validation, we obtained ActiGraph recordings for 610 participants from four European countries from the WEALTH (WEarable sensor Assessment of physicaL and eaTing beHaviors) project, which were labeled with the corresponding free-living behavior using ecological momentary assessment. Classifier performance was assessed using confusion matrices, precision, recall, F-score, and balanced accuracy. Results: In the WEALTH population, the random forest classifier achieved a balanced accuracy of 40% and an average F-score of 0.33. Precision and recall were highest for sitting, followed by walking/running and standing. Performance was consistent across subpopulations defined by age, sex, and body mass index. Conclusion: The substantial reduction in accuracy demonstrates the limited generalizability of the existing random forest classifier. Our findings underscore the need for external validation and more diverse training data to ensure robust application of machine learning models in physical behavior research.
KW - ActiGraph
KW - activity recognition
KW - free-living assessment
KW - model generalizability
KW - random forest
UR - https://www.scopus.com/pages/publications/105027529261
U2 - 10.1123/jmpb.2025-0030
DO - 10.1123/jmpb.2025-0030
M3 - Article
AN - SCOPUS:105027529261
SN - 2575-6605
VL - 8
SP - 1
EP - 9
JO - Journal for the Measurement of Physical Behaviour
JF - Journal for the Measurement of Physical Behaviour
IS - 1
M1 - jmpb.2025-0030
ER -