TY - CONF
T1 - Mining SQL injection and cross site scripting vulnerabilities using hybrid program analysis
AU - Shar, Lwin Khin
AU - Tan, Hee Beng Kuan
AU - Briand, Lionel C.
PY - 2013
Y1 - 2013
AB - In previous work, we proposed a set of static attributes that characterize input validation and input sanitization code patterns. We showed that some of the proposed static attributes are significant predictors of SQL injection and cross site scripting vulnerabilities. Static attributes have the advantage of reflecting general properties of a program. Yet, dynamic attributes collected from execution traces may reflect more specific code characteristics that are complementary to static attributes. Hence, to improve our initial work, in this paper, we propose the use of dynamic attributes to complement static attributes in vulnerability prediction. Furthermore, since existing work relies on supervised learning, it is dependent on the availability of training data labeled with known vulnerabilities. This paper presents prediction models that are based on both classification and clustering in order to predict vulnerabilities, working in the presence or absence of labeled training data, respectively. In our experiments across six applications, our new supervised vulnerability predictors based on hybrid (static and dynamic) attributes achieved, on average, 90% recall and 85% precision, that is a sharp increase in recall when compared to static analysis-based predictions. Though not nearly as accurate, our unsupervised predictors based on clustering achieved, on average, 76% recall and 39% precision, thus suggesting they can be useful in the absence of labeled training data.
KW - Defect prediction
KW - empirical study
KW - input validation and sanitization
KW - static and dynamic analysis
KW - vulnerability
UR - http://www.scopus.com/inward/record.url?scp=84886430853&partnerID=8YFLogxK
U2 - 10.1109/ICSE.2013.6606610
DO - 10.1109/ICSE.2013.6606610
M3 - Conference contribution
AN - SCOPUS:84886430853
SN - 9781467330763
T3 - Proceedings - International Conference on Software Engineering
SP - 642
EP - 651
BT - 2013 35th International Conference on Software Engineering, ICSE 2013 - Proceedings
T2 - 2013 35th International Conference on Software Engineering, ICSE 2013
Y2 - 18 May 2013 through 26 May 2013
ER -