Improving requirements glossary construction via clustering: Approach and industrial case studies

Chetan Arora, Mehrdad Sabetzadeh, Lionel Briand, Frank Zimmer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Context. A glossary is an important part of any software requirements document. By making explicit the technical terms in a domain and providing definitions for them, a glossary serves as a helpful tool for mitigating ambiguities. Goal. A necessary step in building a glossary is to decide upon the glossary terms and to identify their related terms. Doing so manually is a laborious task. Our objective is to provide automated support for identifying candidate glossary terms and their related terms. Our work differs from existing work on term extraction mainly in that, instead of providing a flat list of candidate terms, our approach clusters the terms by relevance. Method. We use case study research as the basis for our empirical investigation. Results. We present an automated approach for identifying and clustering candidate glossary terms. We evaluate the approach through two industrial case studies: one concerns a satellite software component, and the other concerns an evidence management tool for safety certification. Conclusions. Our results indicate that, over requirements documents: (1) our approach is more accurate than other existing methods for identifying candidate glossary terms, which makes it less likely that our approach will miss important glossary terms; and (2) clustering provides an effective basis for grouping related terms, which makes clustering a useful support tool for selecting glossary terms and associating these terms with their related terms.
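
To make the contrast between a flat list of candidate terms and clustered terms concrete, here is a minimal sketch. It is not the authors' approach, which the abstract does not specify. It assumes a simple token-overlap (Jaccard) similarity and single-link grouping above a threshold. The function names, the threshold value, and the sample terms are all hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two terms (an assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def cluster_terms(terms, threshold=0.3):
    """Group terms whose similarity reaches the threshold (single-link)."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the cluster representative.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical candidate terms, not taken from the case studies.
candidate_terms = [
    "ground station",
    "ground station interface",
    "telemetry packet",
    "telemetry frame",
    "safety evidence",
]
clusters = cluster_terms(candidate_terms)
for group in clusters:
    print(group)
```

With these inputs, the two "ground station" terms fall into one group, the two "telemetry" terms into another, and "safety evidence" stays on its own. An analyst reviewing the grouped output could pick one glossary term per group and record the rest as related terms.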

Original language: English
Title of host publication: International Symposium on Empirical Software Engineering and Measurement
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450327749
Publication status: Published - 18 Sep 2014
Externally published: Yes
Event: 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2014 - Torino, Italy
Duration: 18 Sep 2014 to 19 Sep 2014

Publication series

Name: International Symposium on Empirical Software Engineering and Measurement
ISSN (Print): 1949-3770
ISSN (Electronic): 1949-3789

Conference

Conference: 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2014
Country/Territory: Italy
City: Torino
Period: 18/09/14 to 19/09/14

Keywords

  • case study research
  • clustering
  • glossary
  • natural language processing (NLP)
  • term extraction
