TY - GEN
T1 - A Global-Local 3D Brain Tumor Segmentation Model Using Vision Transformers and Axial Statespace Modeling
AU - Ranjbarzadeh, Ramin
AU - Keles, Ayse
AU - Anari, Shokofeh
AU - Cunneen, Martin
AU - Bendechache, Malika
N1 - Publisher Copyright: © 2026 IEEE.
PY - 2026
Y1 - 2026
N2 - Precise segmentation of glioma subregions using multimodal 3D MRI is crucial for diagnosis, treatment planning, and disease monitoring, although it poses challenges due to diverse tumor morphology and significant class imbalance. This study presents a global-local hybrid architecture for volumetric brain tumor segmentation, which combines convolutional feature extraction with a Vision Transformer (ViT) bottleneck and Axial State-Space (Mamba) modeling. The encoder-decoder architecture captures intricate structural details, and the ViT bottleneck facilitates global contextual reasoning throughout the entire 3D volume. To improve spatial consistency, Axial-Mamba blocks are applied on skip connections to model long-range dependencies across the depth, height, and width axes in a state-space formulation. We assess the approach using five cross-validation folds, reporting performance as mean ± standard deviation across all folds. The model demonstrates consistent convergence, attaining a macro-Dice of approximately 0.45 to 0.50, high accuracy for edema (0.98), and modest efficacy for enhancing tumors (Dice = 0.42). The tumor core remains the most challenging region (Dice = 0.18), highlighting the recognized difficulties linked to its heterogeneous structure. The qualitative results indicate that the model generates coherent and anatomically relevant segmentations, with precise edema segmentation and adequate representation of enhancing tumor margins.
AB - Precise segmentation of glioma subregions using multimodal 3D MRI is crucial for diagnosis, treatment planning, and disease monitoring, although it poses challenges due to diverse tumor morphology and significant class imbalance. This study presents a global-local hybrid architecture for volumetric brain tumor segmentation, which combines convolutional feature extraction with a Vision Transformer (ViT) bottleneck and Axial State-Space (Mamba) modeling. The encoder-decoder architecture captures intricate structural details, and the ViT bottleneck facilitates global contextual reasoning throughout the entire 3D volume. To improve spatial consistency, Axial-Mamba blocks are applied on skip connections to model long-range dependencies across the depth, height, and width axes in a state-space formulation. We assess the approach using five cross-validation folds, reporting performance as mean ± standard deviation across all folds. The model demonstrates consistent convergence, attaining a macro-Dice of approximately 0.45 to 0.50, high accuracy for edema (0.98), and modest efficacy for enhancing tumors (Dice = 0.42). The tumor core remains the most challenging region (Dice = 0.18), highlighting the recognized difficulties linked to its heterogeneous structure. The qualitative results indicate that the model generates coherent and anatomically relevant segmentations, with precise edema segmentation and adequate representation of enhancing tumor margins.
KW - Brain Tumor Segmentation
KW - BraTS 2020 dataset
KW - Mamba Architecture
KW - Multimodal MRI
KW - Vision Transformer
UR - https://www.scopus.com/pages/publications/105037588321
U2 - 10.1109/ACDSA67686.2026.11468214
DO - 10.1109/ACDSA67686.2026.11468214
M3 - Conference contribution
AN - SCOPUS:105037588321
T3 - International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2026
BT - International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2026
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2026
Y2 - 5 February 2026 through 7 February 2026
ER -