Abstract
Accurate skin lesion segmentation is crucial for early melanoma diagnosis, yet deep learning models are often hindered by two key challenges: the scarcity of expert-annotated data and the difficulty of precisely delineating complex lesion boundaries. Existing methods either require extensive annotated data or, in the case of graph-based refinement techniques, suffer from a feature representation bottleneck, where performance is limited by the quality of initial, often supervised, features. This paper introduces TokenCutSeg, a novel hybrid framework designed to overcome these limitations by strategically separating representation learning from supervised refinement. Our approach first leverages all available images, both labeled and unlabeled, in a self-supervised stage. A Vector Quantised Generative Adversarial Network (VQGAN) learns a rich, discrete visual vocabulary, tokenising images into a compact and meaningful representation. Subsequently, a Transformer, trained with a masked token prediction task, models the deep contextual relationships between these visual tokens. This self-supervised pipeline produces powerful, context-aware features. Finally, these high-quality features are fed into a lightweight, supervised graph-based module that performs the final segmentation refinement, effectively solving the traditional feature bottleneck. Evaluated on the ISIC 2016, 2017, and 2018 datasets, TokenCutSeg establishes a new state of the art, achieving mIoU scores of 86.18%, 81.37%, and 82.76%, respectively, demonstrating superior boundary accuracy and generalization. Our results validate that this label-efficient, hybrid approach leads to more robust and practical segmentation models. Our code is available on GitHub.
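The two self-supervised steps named in the abstract, tokenising an image into discrete codes and then hiding some of those codes for a masked token prediction task, can be illustrated with a small sketch. This is not the authors' implementation: the toy codebook, the `MASK_ID` sentinel, the mask ratio, and the function names `quantise` and `mask_tokens` are all assumptions made for illustration. In the paper the codebook is learned by a VQGAN and the masked tokens are predicted by a Transformer, neither of which is reproduced here.

```python
import random

# Hypothetical sentinel marking a hidden token (an assumption, not from the paper).
MASK_ID = -1


def quantise(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Nearest means smallest squared Euclidean distance, the usual
    vector-quantisation lookup rule.
    """
    tokens = []
    for feature in features:
        distances = [
            sum((a - b) ** 2 for a, b in zip(feature, code))
            for code in codebook
        ]
        tokens.append(distances.index(min(distances)))
    return tokens


def mask_tokens(tokens, mask_ratio, rng):
    """Hide a fraction of the tokens and return what a model would have to predict.

    Returns the masked sequence and a dict {position: original token}.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for position in positions:
        masked[position] = MASK_ID
    targets = {position: tokens[position] for position in positions}
    return masked, targets


if __name__ == "__main__":
    # Toy 2-D codebook with three entries; a learned codebook would be far larger.
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    # Four toy patch features standing in for encoder outputs.
    features = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9], [0.05, 0.0]]

    tokens = quantise(features, codebook)
    masked, targets = mask_tokens(tokens, mask_ratio=0.5, rng=random.Random(0))
    print("tokens: ", tokens)
    print("masked: ", masked)
    print("targets:", targets)
```

Because neither step uses segmentation labels, a setup of this kind can draw on unlabeled images as well as labeled ones, which is the label-efficiency argument the abstract makes.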
| Original language | English |
|---|---|
| Pages (from-to) | 150671-150683 |
| Number of pages | 13 |
| Journal | IEEE Access |
| Volume | 13 |
| DOIs | |
| Publication status | Published - 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3: Good Health and Well-being
Keywords
- computer-aided diagnosis
- dermoscopy
- graph attention networks
- graph neural networks
- image segmentation
- medical image analysis
- semantic segmentation
- skin lesion
- transformers
- vision transformer
Fingerprint
Dive into the research topics of 'TokenCutSeg: Label-Efficient Skin Lesion Segmentation by Powering Supervised Refinement With Self-Learned Visual Vocabularies'. Together they form a unique fingerprint.