TY - JOUR
T1 - A systematic literature review on fault prediction performance in software engineering
AU - Hall, Tracy
AU - Beecham, Sarah
AU - Bowes, David
AU - Gray, David
AU - Counsell, Steve
PY - 2012
Y1 - 2012
AB - Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs, and improve the quality of software. Objective: We investigate how the context of models, the independent variables used, and the modeling techniques applied influence the performance of fault prediction models. Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesize the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply. Results: The models that perform well tend to be based on simple modeling techniques such as Naive Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well. Feature selection has been applied to these combinations when models are performing particularly well. Conclusion: The methodology used to build models seems to be influential to predictive performance. Although there are a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and which report their context, methodology, and performance comprehensively.
KW - Software fault prediction
KW - Systematic literature review
UR - http://www.scopus.com/inward/record.url?scp=84870561393&partnerID=8YFLogxK
U2 - 10.1109/TSE.2011.103
DO - 10.1109/TSE.2011.103
M3 - Review article
AN - SCOPUS:84870561393
SN - 0098-5589
VL - 38
SP - 1276
EP - 1304
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 6
M1 - 6035727
ER -