Synthetic data generation for statistical testing

Ghanem Soltana, Mehrdad Sabetzadeh, Lionel C. Briand

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Usage-based statistical testing employs knowledge about the actual or anticipated usage profile of the system under test for estimating system reliability. For many systems, usage-based statistical testing involves generating synthetic test data. Such data must possess the same statistical characteristics as the actual data that the system will process during operation. Synthetic test data must further satisfy any logical validity constraints that the actual data is subject to. Targeting data-intensive systems, we propose an approach for generating synthetic test data that is both statistically representative and logically valid. The approach works by first generating a data sample that meets the desired statistical characteristics, without taking into account the logical constraints. Subsequently, the approach tweaks the generated sample to fix any logical constraint violations. The tweaking process is iterative and continuously guided toward achieving the desired statistical characteristics. We report on a realistic evaluation of the approach, where we generate a synthetic population of citizens' records for testing a public administration IT system. Results suggest that our approach is scalable and capable of simultaneously fulfilling the statistical representativeness and logical validity requirements.
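The two-phase idea described in the abstract (sample first, repair afterwards) can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a simplified greedy variant under stated assumptions. The attribute names (`age`, `job`), the target distributions, the single validity rule (a child cannot be employed), and the L1 distance over marginals are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical target marginals (assumptions, not taken from the paper).
AGE_TARGET = {"child": 0.2, "adult": 0.6, "senior": 0.2}
JOB_TARGET = {"employed": 0.5, "not_employed": 0.5}
TARGETS = {"age": AGE_TARGET, "job": JOB_TARGET}


def is_valid(record):
    """Hypothetical logical constraint: a child cannot be employed."""
    return not (record["age"] == "child" and record["job"] == "employed")


def draw(dist, rng):
    """Draw one value from a {value: probability} distribution."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]


def generate_sample(n, rng):
    """Phase 1: sample each attribute independently, ignoring constraints."""
    return [{f: draw(d, rng) for f, d in TARGETS.items()} for _ in range(n)]


def distance(sample):
    """Sum of absolute differences between observed and target marginals."""
    n = len(sample)
    total = 0.0
    for field, target in TARGETS.items():
        counts = Counter(r[field] for r in sample)
        total += sum(abs(counts[v] / n - p) for v, p in target.items())
    return total


def candidate_fixes(record):
    """All valid records reachable by changing exactly one attribute."""
    fixes = []
    for field, target in TARGETS.items():
        for value in target:
            candidate = {**record, field: value}
            if candidate != record and is_valid(candidate):
                fixes.append(candidate)
    return fixes


def repair(sample):
    """Phase 2: fix each violation with the change that drifts least."""
    repaired = [dict(r) for r in sample]  # leave the input untouched
    for i, record in enumerate(repaired):
        if is_valid(record):
            continue

        def drift_after(fix, i=i):
            repaired[i] = fix  # try the fix in place, then measure
            return distance(repaired)

        repaired[i] = min(candidate_fixes(record), key=drift_after)
    return repaired


if __name__ == "__main__":
    rng = random.Random(0)
    sample = generate_sample(500, rng)
    repaired = repair(sample)
    print("violations before:", sum(not is_valid(r) for r in sample))
    print("violations after: ", sum(not is_valid(r) for r in repaired))
    print("distance before:", round(distance(sample), 3))
    print("distance after: ", round(distance(repaired), 3))
```

A single greedy pass of this kind removes every violation but may still leave the marginals some distance from their targets, because independent sampling places probability mass on invalid combinations. The abstract describes the tweaking process as iterative and continuously guided toward the desired statistics; a closer approximation would keep adjusting valid records as well, until the distance stops improving.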

Original language: English
Title of host publication: ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering
Editors: Tien N. Nguyen, Grigore Rosu, Massimiliano Di Penta
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 872-882
Number of pages: 11
ISBN (Electronic): 9781538626849
Publication status: Published - 20 Nov 2017
Externally published: Yes
Event: 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017 - Urbana-Champaign, United States
Duration: 30 Oct 2017 to 3 Nov 2017

Publication series

Name: ASE 2017 - Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering

Conference

Conference: 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017
Country/Territory: United States
City: Urbana-Champaign
Period: 30/10/17 to 3/11/17

Keywords

  • Model-Driven Engineering
  • OCL
  • Test Data Generation
  • UML
  • Usage-based Statistical Testing
