TY - JOUR
T1 - An active learning approach for improving the accuracy of automated domain model extraction
AU - Arora, Chetan
AU - Sabetzadeh, Mehrdad
AU - Nejati, Shiva
AU - Briand, Lionel
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/1
Y1 - 2019/1
N2 - Domain models are a useful vehicle for making the interpretation and elaboration of natural-language requirements more precise. Advances in natural-language processing (NLP) have made it possible to automatically extract from requirements most of the information that is relevant to domain model construction. However, alongside the relevant information, NLP extracts from requirements a significant amount of information that is superfluous (not relevant to the domain model). Our objective in this article is to develop automated assistance for filtering the superfluous information extracted by NLP during domain model extraction. To this end, we devise an active-learning-based approach that iteratively learns from analysts’ feedback over the relevance and superfluousness of the extracted domain model elements and uses this feedback to provide recommendations for filtering superfluous elements. We empirically evaluate our approach over three industrial case studies. Our results indicate that, once trained, our approach automatically detects an average of ≈45% of the superfluous elements with a precision of ≈96%. Since precision is very high, the automatic recommendations made by our approach are trustworthy. Consequently, analysts can dispose of a considerable fraction – nearly half – of the superfluous elements with minimal manual work. The results are particularly promising, as they should be considered in light of the non-negligible subjectivity that is inherently tied to the notion of relevance.
AB - Domain models are a useful vehicle for making the interpretation and elaboration of natural-language requirements more precise. Advances in natural-language processing (NLP) have made it possible to automatically extract from requirements most of the information that is relevant to domain model construction. However, alongside the relevant information, NLP extracts from requirements a significant amount of information that is superfluous (not relevant to the domain model). Our objective in this article is to develop automated assistance for filtering the superfluous information extracted by NLP during domain model extraction. To this end, we devise an active-learning-based approach that iteratively learns from analysts’ feedback over the relevance and superfluousness of the extracted domain model elements and uses this feedback to provide recommendations for filtering superfluous elements. We empirically evaluate our approach over three industrial case studies. Our results indicate that, once trained, our approach automatically detects an average of ≈45% of the superfluous elements with a precision of ≈96%. Since precision is very high, the automatic recommendations made by our approach are trustworthy. Consequently, analysts can dispose of a considerable fraction – nearly half – of the superfluous elements with minimal manual work. The results are particularly promising, as they should be considered in light of the non-negligible subjectivity that is inherently tied to the notion of relevance.
KW - Active learning
KW - Case study research
KW - Domain modeling
KW - Natural-language requirements
KW - Requirements engineering
UR - http://www.scopus.com/inward/record.url?scp=85060897284&partnerID=8YFLogxK
U2 - 10.1145/3293454
DO - 10.1145/3293454
M3 - Article
AN - SCOPUS:85060897284
SN - 1049-331X
VL - 28
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
IS - 1
M1 - 4
ER -