TY - GEN
T1 - A Theoretical Framework for Understanding the Relationship Between Log Parsing and Anomaly Detection
AU - Shin, Donghwan
AU - Khan, Zanis Ali
AU - Bianculli, Domenico
AU - Briand, Lionel
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Log-based anomaly detection identifies systems’ anomalous behaviors by analyzing system runtime information recorded in logs. While many approaches have been proposed, all of them have in common an essential pre-processing step called log parsing. This step is needed because automated log analysis requires structured input logs, whereas original logs contain semi-structured text printed by logging statements. Log parsing bridges this gap by converting the original logs into structured input logs fit for anomaly detection. Despite the intrinsic dependency between log parsing and anomaly detection, no existing work has investigated the impact of the “quality” of log parsing results on anomaly detection. In particular, the concept of “ideal” log parsing results with respect to anomaly detection has not been formalized yet. This makes it difficult to determine, upon obtaining inaccurate results from anomaly detection, if (and why) the root cause for such results lies in the log parsing step. In this short paper, we lay the theoretical foundations for defining the concept of “ideal” log parsing results for anomaly detection. Based on these foundations, we discuss practical implications regarding the identification and localization of root causes, when dealing with inaccurate anomaly detection, and the identification of irrelevant log messages.
KW - Anomaly detection
KW - Log analysis
KW - Log parsing
UR - http://www.scopus.com/inward/record.url?scp=85117493377&partnerID=8YFLogxK
DO - 10.1007/978-3-030-88494-9_16
M3 - Conference contribution
AN - SCOPUS:85117493377
SN - 9783030884932
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 277
EP - 287
BT - Runtime Verification - 21st International Conference, RV 2021, Proceedings
A2 - Feng, Lu
A2 - Fisman, Dana
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st International Conference on Runtime Verification, RV 2021
Y2 - 11 October 2021 through 14 October 2021
ER -