TY - JOUR
T1 - Supporting Safety Analysis of Image-processing DNNs through Clustering-based Approaches
AU - Attaoui, Mohammed Oualid
AU - Fahmy, Hazem
AU - Pastore, Fabrizio
AU - Briand, Lionel
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/6/3
Y1 - 2024/6/3
N2 - The adoption of deep neural networks (DNNs) in safety-critical contexts is often prevented by the lack of effective means to explain their results, especially when they are erroneous. In our previous work, we proposed a white-box approach (HUDD) and a black-box approach (SAFE) to automatically characterize DNN failures. They both identify clusters of similar images from a potentially large set of images leading to DNN failures. However, the analysis pipelines for HUDD and SAFE were instantiated in specific ways according to common practices, deferring the analysis of other pipelines to future work. In this article, we report on an empirical evaluation of 99 different pipelines for root cause analysis of DNN failures. They combine transfer learning, autoencoders, heatmaps of neuron relevance, dimensionality reduction techniques, and different clustering algorithms. Our results show that the best pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters almost exclusively capturing images of the same failure scenario, thus facilitating root cause analysis. Further, it generates distinct clusters for each root cause of failure, thus enabling engineers to detect all the unsafe scenarios. Interestingly, these results hold even for failure scenarios that are only observed in a small percentage of the failing images.
AB - The adoption of deep neural networks (DNNs) in safety-critical contexts is often prevented by the lack of effective means to explain their results, especially when they are erroneous. In our previous work, we proposed a white-box approach (HUDD) and a black-box approach (SAFE) to automatically characterize DNN failures. They both identify clusters of similar images from a potentially large set of images leading to DNN failures. However, the analysis pipelines for HUDD and SAFE were instantiated in specific ways according to common practices, deferring the analysis of other pipelines to future work. In this article, we report on an empirical evaluation of 99 different pipelines for root cause analysis of DNN failures. They combine transfer learning, autoencoders, heatmaps of neuron relevance, dimensionality reduction techniques, and different clustering algorithms. Our results show that the best pipeline combines transfer learning, DBSCAN, and UMAP. It leads to clusters almost exclusively capturing images of the same failure scenario, thus facilitating root cause analysis. Further, it generates distinct clusters for each root cause of failure, thus enabling engineers to detect all the unsafe scenarios. Interestingly, these results hold even for failure scenarios that are only observed in a small percentage of the failing images.
KW - clustering
KW - DNN debugging
KW - DNN explanation
KW - DNN functional safety analysis
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85195840166&partnerID=8YFLogxK
U2 - 10.1145/3643671
DO - 10.1145/3643671
M3 - Article
AN - SCOPUS:85195840166
SN - 1049-331X
VL - 33
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
IS - 5
M1 - 130
ER -