TY - GEN
T1 - A machine learning-based approach for demarcating requirements in textual specifications
AU - Abualhaija, Sallam
AU - Arora, Chetan
AU - Sabetzadeh, Mehrdad
AU - Briand, Lionel C.
AU - Vaz, Eduardo
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
AB - A simple but important task during the analysis of a textual requirements specification is to determine which statements in the specification represent requirements. In principle, by following suitable writing and markup conventions, one can provide an immediate and unequivocal demarcation of requirements at the time a specification is being developed. However, neither the presence nor a fully accurate enforcement of such conventions is guaranteed. The result is that, in many practical situations, analysts end up resorting to after-the-fact reviews for sifting requirements from other material in a requirements specification. This is both tedious and time-consuming. We propose an automated approach for demarcating requirements in free-form requirements specifications. The approach, which is based on machine learning, can be applied to a wide variety of specifications in different domains and with different writing styles. We train and evaluate our approach over an independently labeled dataset comprised of 30 industrial requirements specifications. Over this dataset, our approach yields an average precision of 81.2% and an average recall of 95.7%. Compared to simple baselines that demarcate requirements based on the presence of modal verbs and identifiers, our approach leads to an average gain of 16.4% in precision and 25.5% in recall.
KW - Machine Learning
KW - Natural Language Processing
KW - Requirements Identification and Classification
KW - Textual Requirements
UR - http://www.scopus.com/inward/record.url?scp=85076921088&partnerID=8YFLogxK
U2 - 10.1109/RE.2019.00017
DO - 10.1109/RE.2019.00017
M3 - Conference contribution
AN - SCOPUS:85076921088
T3 - Proceedings of the IEEE International Conference on Requirements Engineering
SP - 51
EP - 62
BT - Proceedings - 2019 IEEE 27th International Requirements Engineering Conference, RE 2019
A2 - Damian, Daniela
A2 - Perini, Anna
A2 - Lee, Seok-Won
PB - IEEE Computer Society
T2 - 27th IEEE International Requirements Engineering Conference, RE 2019
Y2 - 23 September 2019 through 27 September 2019
ER -