A historical, textual analysis approach to feature location

Research output: Contribution to journalArticlepeer-review

Abstract

Context Feature location is the task of finding the source code that implements specific functionality in software systems. A common approach is to leverage textual information in source code against a query, using Information Retrieval (IR) techniques. To address the paucity of meaningful terms in source code, alternative, relevant source-code descriptions, like change-sets could be leveraged for these IR techniques. However, the extent to which these descriptions are useful has not been thoroughly studied. Objective This work rigorously characterizes the efficacy of source-code lexical annotation by change-sets (ACIR), in terms of its best-performing configuration. Method A tool, implementing ACIR, was used to study different configurations of the approach and to compare them to a baseline approach (thus allowing comparison against other techniques going forward). This large-scale evaluation employs eight subject systems and 600 features. Results It was found that, for ACIR: (1) method level granularity demands less search effort; (2) using more recent change-sets improves effectiveness; (3) aggregation of recent change-sets by change request, decreases effectiveness; (4) naive, text-classification-based filtering of “management” change-sets also decreases the effectiveness. In addition, a strongly pronounced dichotomy of subject systems emerged, where one set recorded better feature location using ACIR and the other recorded better feature location using the baseline approach. Finally, merging ACIR and the baseline approach significantly improved performance over both standalone approaches for all systems. Conclusion The most fundamental finding is the importance of rigorously characterizing proposed feature location techniques, to identify their optimal configurations. The results also suggest it is important to characterize the software systems under study when selecting the appropriate feature location technique. In the past, configuration of the techniques and characterization of subject systems have not been considered first-class entities in research papers, whereas the results presented here suggests these factors can have a big impact.

Original languageEnglish
Pages (from-to)110-126
Number of pages17
JournalInformation and Software Technology
Volume88
DOIs
Publication statusPublished - 1 Aug 2017

Keywords

  • Dataset expansion
  • Feature location
  • Search effort
  • Software systems’ characterization
  • Version histories

Fingerprint

Dive into the research topics of 'A historical, textual analysis approach to feature location'. Together they form a unique fingerprint.

Cite this