TY - JOUR
T1 - A historical, textual analysis approach to feature location
AU - Chochlov, Muslim
AU - English, Michael
AU - Buckley, Jim
N1 - Publisher Copyright:
© 2017 Elsevier B.V.
PY - 2017/8/1
Y1 - 2017/8/1
N2 - Context Feature location is the task of finding the source code that implements specific functionality in software systems. A common approach is to leverage textual information in source code against a query, using Information Retrieval (IR) techniques. To address the paucity of meaningful terms in source code, alternative, relevant source-code descriptions, like change-sets could be leveraged for these IR techniques. However, the extent to which these descriptions are useful has not been thoroughly studied. Objective This work rigorously characterizes the efficacy of source-code lexical annotation by change-sets (ACIR), in terms of its best-performing configuration. Method A tool, implementing ACIR, was used to study different configurations of the approach and to compare them to a baseline approach (thus allowing comparison against other techniques going forward). This large-scale evaluation employs eight subject systems and 600 features. Results It was found that, for ACIR: (1) method level granularity demands less search effort; (2) using more recent change-sets improves effectiveness; (3) aggregation of recent change-sets by change request, decreases effectiveness; (4) naive, text-classification-based filtering of “management” change-sets also decreases the effectiveness. In addition, a strongly pronounced dichotomy of subject systems emerged, where one set recorded better feature location using ACIR and the other recorded better feature location using the baseline approach. Finally, merging ACIR and the baseline approach significantly improved performance over both standalone approaches for all systems. Conclusion The most fundamental finding is the importance of rigorously characterizing proposed feature location techniques, to identify their optimal configurations. The results also suggest it is important to characterize the software systems under study when selecting the appropriate feature location technique. In the past, configuration of the techniques and characterization of subject systems have not been considered first-class entities in research papers, whereas the results presented here suggests these factors can have a big impact.
AB - Context Feature location is the task of finding the source code that implements specific functionality in software systems. A common approach is to leverage textual information in source code against a query, using Information Retrieval (IR) techniques. To address the paucity of meaningful terms in source code, alternative, relevant source-code descriptions, like change-sets could be leveraged for these IR techniques. However, the extent to which these descriptions are useful has not been thoroughly studied. Objective This work rigorously characterizes the efficacy of source-code lexical annotation by change-sets (ACIR), in terms of its best-performing configuration. Method A tool, implementing ACIR, was used to study different configurations of the approach and to compare them to a baseline approach (thus allowing comparison against other techniques going forward). This large-scale evaluation employs eight subject systems and 600 features. Results It was found that, for ACIR: (1) method level granularity demands less search effort; (2) using more recent change-sets improves effectiveness; (3) aggregation of recent change-sets by change request, decreases effectiveness; (4) naive, text-classification-based filtering of “management” change-sets also decreases the effectiveness. In addition, a strongly pronounced dichotomy of subject systems emerged, where one set recorded better feature location using ACIR and the other recorded better feature location using the baseline approach. Finally, merging ACIR and the baseline approach significantly improved performance over both standalone approaches for all systems. Conclusion The most fundamental finding is the importance of rigorously characterizing proposed feature location techniques, to identify their optimal configurations. The results also suggest it is important to characterize the software systems under study when selecting the appropriate feature location technique. In the past, configuration of the techniques and characterization of subject systems have not been considered first-class entities in research papers, whereas the results presented here suggests these factors can have a big impact.
KW - Dataset expansion
KW - Feature location
KW - Search effort
KW - Software systems’ characterization
KW - Version histories
UR - http://www.scopus.com/inward/record.url?scp=85018634750&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2017.04.003
DO - 10.1016/j.infsof.2017.04.003
M3 - Article
AN - SCOPUS:85018634750
SN - 0950-5849
VL - 88
SP - 110
EP - 126
JO - Information and Software Technology
JF - Information and Software Technology
ER -