Rahman Shakil, Mostafizur, Rahman, Mahfuzur, Jahan Meem, Erin, Imranul Hoque Bhuiyan, Md, Akter, Sanjida, Bin Mohiuddin, Arafath, Rahman, Shafiur and Kabir, Istiak (2026) SkinViT-EfficientX: a hybrid vision transformer model with token pruning and explainable AI for multiclass skin cancer diagnosis. In: 2025 IEEE International Conference on Biomedical Engineering, Computer and Information Technology for Health (BECITHCON), 29-30 November 2025, Dhaka, Bangladesh.
Skin cancer is a common and serious health issue, making early diagnosis crucial for better outcomes. Traditional manual dermoscopy can be slow and inconsistent, which motivates automated diagnostic tools. This study introduces SkinViT-EfficientX, a hybrid deep learning model designed for classifying skin lesions. It combines an EfficientNetV2-S encoder with a lightweight Vision Transformer, connected by a residual cross-attention mechanism, to extract both local and global features. A confidence-guided token pruning strategy is employed to enhance performance, and Grad-CAM provides class-specific visual explanations. The model was evaluated on two benchmark datasets, HAM10000 and the combined ISIC 2019 + DermNet dataset, after thorough preprocessing and augmentation. SkinViT-EfficientX achieved a 97.36% F1-Score, 95.64% MCC, and 97.93% Specificity on HAM10000, and a 98.42% F1-Score, 96.51% MCC, and 98.86% Specificity on the combined dataset. It outperformed top models such as MaxViT, Swin V2-T, DeiT III-S, and MobileViT V2-S on all metrics. Confusion matrix and learning curve analyses validated the model's robustness and its stability on rare lesion classes. The model is also integrated into a web application that supports dermoscopic image upload, class prediction, and heatmap visualization. SkinViT-EfficientX provides an efficient, accurate, and interpretable AI-driven solution for skin cancer screening.
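For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how F1-Score, Specificity, and MCC can be derived from a multiclass confusion matrix. It is not the authors' evaluation code. The function name `multiclass_metrics` is hypothetical, and macro averaging over one-vs-rest classes is an assumption, since the abstract does not state which averaging scheme was used. The MCC uses the standard multiclass generalization computed from the confusion matrix.

```python
import math


def multiclass_metrics(cm):
    """Compute macro F1, macro specificity, and multiclass MCC.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption here; the
    abstract does not specify the averaging scheme.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    true_counts = [sum(cm[i]) for i in range(n)]                        # row sums
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # column sums
    correct = sum(cm[k][k] for k in range(n))                           # trace

    f1_scores, specificities = [], []
    for k in range(n):
        # One-vs-rest counts for class k.
        tp = cm[k][k]
        fp = pred_counts[k] - tp
        fn = true_counts[k] - tp
        tn = total - tp - fp - fn
        f1_denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / f1_denom if f1_denom else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)

    # Multiclass Matthews correlation coefficient.
    numerator = correct * total - sum(p * t for p, t in zip(pred_counts, true_counts))
    pred_term = total ** 2 - sum(p * p for p in pred_counts)
    true_term = total ** 2 - sum(t * t for t in true_counts)
    denom = math.sqrt(pred_term * true_term)
    mcc = numerator / denom if denom else 0.0

    return {
        "macro_f1": sum(f1_scores) / n,
        "macro_specificity": sum(specificities) / n,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy 3-class confusion matrix (illustrative numbers, not from the paper).
    toy = [[50, 2, 1],
           [3, 40, 2],
           [0, 1, 20]]
    print(multiclass_metrics(toy))
```

MCC is typically reported alongside F1 on imbalanced lesion datasets because it uses every cell of the confusion matrix, so strong performance on a majority class cannot mask errors on rare classes.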