STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup

Yumnah Hasan, Fatemeh Amerehi, Patrick Healy, Conor Ryan

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Imbalanced datasets in medical imaging are characterized by skewed class proportions and a scarcity of abnormal cases. Models trained on such data tend to assign higher probabilities to normal cases, which biases their performance. Common oversampling techniques such as SMOTE rely on local neighbourhood information and can introduce marginalization issues. This paper investigates Mixup augmentation, which combines two training examples and their corresponding labels to generate new data points from a generic vicinal distribution. To this end, we propose STEM, which combines SMOTEENN (SMOTE oversampling followed by Edited Nearest Neighbour cleaning) with Mixup at the instance level. This integration lets us leverage the entire distribution of the minority classes, mitigating both between-class and within-class imbalance. We focus on breast cancer, a problem in which imbalanced datasets are prevalent. STEM achieves AUC values of 0.96 on the Digital Database for Screening Mammography and 0.99 on the Wisconsin Breast Cancer (Diagnostic) dataset. Moreover, the method shows promise when applied with an ensemble of machine learning (ML) classifiers.
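The abstract does not specify how STEM orders or parameterises its steps, so the following is only a minimal sketch, assuming SMOTE oversampling, then Wilson-style ENN cleaning (drop a sample when the majority of its 3 nearest neighbours carry a different label), then Mixup over the cleaned set with one-hot soft labels. SMOTE creates x_new = x + u * (x_nn - x) with u drawn from U(0, 1) and x_nn a minority-class neighbour; Mixup creates x_mix = lam * x_i + (1 - lam) * x_j and y_mix = lam * y_i + (1 - lam) * y_j with lam drawn from Beta(alpha, alpha). All function names (`smote`, `enn`, `mixup`, `stem_like`) and defaults (k, alpha, seed) are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest(idx, X, k):
    """Indices of the k nearest neighbours of X[idx] in X, excluding idx itself."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: euclidean(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, k=5, rng=random):
    """Binary SMOTE: interpolate between minority samples and their minority
    neighbours until the minority class matches the other class in size.
    Assumes at least two minority samples."""
    minority_X = [x for x, label in zip(X, y) if label == minority]
    n_needed = sum(1 for label in y if label != minority) - len(minority_X)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_X))
        j = rng.choice(nearest(i, minority_X, k))
        u = rng.random()  # u ~ U(0, 1): position along the segment
        synthetic.append([a + u * (b - a) for a, b in zip(minority_X[i], minority_X[j])])
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Edited Nearest Neighbour (majority-vote rule): keep a sample only if
    the most common label among its k nearest neighbours equals its own."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in nearest(i, X, k)]
        if max(set(votes), key=votes.count) == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def mixup(X, y, n_classes, alpha=0.2, rng=random):
    """Mixup: convex combinations of each sample with a shuffled partner,
    applied to both features and one-hot labels (labels are 0..n_classes-1)."""
    partners = list(range(len(X)))
    rng.shuffle(partners)
    mixed_X, mixed_y = [], []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # lam ~ Beta(alpha, alpha)
        mixed_X.append([lam * a + (1 - lam) * b for a, b in zip(X[i], X[j])])
        yi = [1.0 if c == y[i] else 0.0 for c in range(n_classes)]
        yj = [1.0 if c == y[j] else 0.0 for c in range(n_classes)]
        mixed_y.append([lam * p + (1 - lam) * q for p, q in zip(yi, yj)])
    return mixed_X, mixed_y


def stem_like(X, y, minority=1, n_classes=2, seed=0):
    """Hypothetical SMOTE -> ENN -> Mixup pipeline (not the published STEM code)."""
    rng = random.Random(seed)
    X_res, y_res = smote(X, y, minority, rng=rng)
    X_res, y_res = enn(X_res, y_res)
    return mixup(X_res, y_res, n_classes, rng=rng)
```

The output labels are soft probability vectors, so a downstream classifier must accept probabilistic targets. The pure-Python version avoids dependencies; in practice the first two steps are available as `SMOTEENN` in the imbalanced-learn package (`imblearn.combine`), whose ENN defaults differ from the majority-vote rule used here.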

Original language: English
Title of host publication: Proceedings - 2023 IEEE 19th International Conference on Intelligent Computer Communication and Processing Conference, ICCP 2023
Editors: Sergiu Nedevschi, Rodica Potolea, Radu Razvan Slavescu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3-9
Number of pages: 7
ISBN (Electronic): 9798350370355
Publication status: Published - 2023
Event: 19th IEEE International Conference on Intelligent Computer Communication and Processing Conference, ICCP 2023 - Cluj-Napoca, Romania
Duration: 26 Oct 2023 - 28 Oct 2023

Publication series

Name: Proceedings - 2023 IEEE 19th International Conference on Intelligent Computer Communication and Processing Conference, ICCP 2023

Conference

Conference: 19th IEEE International Conference on Intelligent Computer Communication and Processing Conference, ICCP 2023
Country/Territory: Romania
City: Cluj-Napoca
Period: 26/10/23 - 28/10/23

Keywords

  • Augmentation
  • Breast Cancer
  • Image processing
  • Machine Learning
  • SMOTE
