SkinViT-EfficientX: a hybrid vision transformer model with token pruning and explainable AI for multiclass skin cancer diagnosis

Rahman Shakil, Mostafizur; Rahman, Mahfuzur; Jahan Meem, Erin; Imranul Hoque Bhuiyan, Md; Akter, Sanjida; Bin Mohiuddin, Arafath; Rahman, Shafiur; Kabir, Istiak

Rahman Shakil, Mostafizur, Rahman, Mahfuzur, Jahan Meem, Erin, Imranul Hoque Bhuiyan, Md, Akter, Sanjida, Bin Mohiuddin, Arafath, Rahman, Shafiur and Kabir, Istiak (2026) SkinViT-EfficientX: a hybrid vision transformer model with token pruning and explainable AI for multiclass skin cancer diagnosis. In: 2025 IEEE International Conference on Biomedical Engineering, Computer and Information Technology for Health (BECITHCON), 29-30 November 2025, Dhaka, Bangladesh.

Abstract

Skin cancer is a common and serious health issue, and early diagnosis is crucial for better outcomes. Manual dermoscopy can be slow and inconsistent, which motivates automated diagnostic tools. This study introduces SkinViT-EfficientX, a hybrid deep learning model designed for classifying skin lesions. It combines an EfficientNetV2-S encoder with a lightweight Vision Transformer, connected by a residual cross-attention mechanism, to extract both local and global features. A confidence-guided token pruning strategy is employed to enhance performance, and Grad-CAM provides class-specific visual explanations. Images from two benchmark datasets, HAM10000 and a combined ISIC 2019 + DermNet dataset, were preprocessed and augmented for the experiments. SkinViT-EfficientX achieved a 97.36% F1-score, 95.64% MCC, and 97.93% specificity on HAM10000, and a 98.42% F1-score, 96.51% MCC, and 98.86% specificity on the combined dataset. It outperformed MaxViT, Swin V2-T, DeiT III-S, and MobileViT V2-S on all reported metrics. Confusion matrix and learning curve analyses were used to validate the model's robustness and its stability on rare lesion classes. The model is also integrated into a web application that supports dermoscopic image upload, class prediction, and heatmap visualization. SkinViT-EfficientX provides an efficient, accurate, and interpretable AI-driven solution for skin cancer screening.
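For readers unfamiliar with the three metrics reported in the abstract, the following is a minimal sketch of how F1-score, MCC, and specificity can be computed from a multiclass confusion matrix. It is not the authors' evaluation code. The function names are hypothetical, macro averaging over classes is an assumption (the abstract does not state the averaging scheme), and the MCC uses the standard multiclass generalization of the Matthews correlation coefficient.

```python
from math import sqrt

def per_class_counts(cm):
    """Return (tp, fp, fn, tn) for each class.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts

def macro_f1(cm):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for tp, fp, fn, _ in per_class_counts(cm):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def macro_specificity(cm):
    """Unweighted mean of per-class TN / (TN + FP), one-vs-rest."""
    scores = []
    for _, fp, _, tn in per_class_counts(cm):
        denom = tn + fp
        scores.append(tn / denom if denom else 0.0)
    return sum(scores) / len(scores)

def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient."""
    n = len(cm)
    s = sum(sum(row) for row in cm)                           # total samples
    c = sum(cm[k][k] for k in range(n))                       # correct predictions
    t = [sum(cm[k]) for k in range(n)]                        # true count per class
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = sqrt((s * s - sum(pk * pk for pk in p)) *
               (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0

# Toy two-class matrix (illustrative numbers, not data from the paper).
toy = [[3, 1],
       [1, 3]]
print(macro_f1(toy), macro_specificity(toy), multiclass_mcc(toy))
```

MCC takes every cell of the confusion matrix into account, so it is often reported alongside F1 when class frequencies are imbalanced, as they are for rare lesion types.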

Details

Identification Number:

10.1109/becithcon69222.2025.11503998

Official URL:

https://doi.org/10.1109/becithcon69222.2025.115039...

Date:

8 May 2026

Subjects:

000 Computer science, information & general works
600 Technology > 610 Medicine & health

Department:

School of Computing and Digital Media

Publisher:

IEEE