Purpose
Evaluate the performance of a Vision Transformer (ViT) model in classifying acute versus chronic vertebral compression fractures, and assess its interpretability using Shapley saliency maps.
Materials and Methods
This retrospective study used imaging data acquired between December 2018 and December 2023 at three hospitals in China. The study included 942 patients with radiography, CT, and MRI examinations. Patients were divided into training, validation, and test sets in a 7:2:1 ratio. SimpleViT, a variant of the ViT model, was fine-tuned on the training set. Statistical analyses were performed on the PixelMedAI platform and included ROC curves, sensitivity, specificity, and AUC; differences in AUC were tested for statistical significance with the DeLong test.
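Because several patients contribute more than one fractured vertebra, a 7:2:1 division "by patient" means every fracture from a given patient must land in the same subset. The following is a minimal sketch of such a patient-level split, assuming one patient ID per fracture record; the function name `split_patients`, the seed, and the rule that rounding leftovers go to the test set are illustrative assumptions, not the study's documented procedure.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_patients(
    patient_ids: Sequence[str],
    ratios: Tuple[int, int, int] = (7, 2, 1),
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle unique patient IDs and cut them into train/val/test by ratio.

    Hypothetical helper. Splitting by patient (not by vertebra) keeps all
    fractures from one patient in the same subset, so no patient appears
    in both the training and the test data.
    """
    unique = sorted(set(patient_ids))  # dedupe; sort so the seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    total = sum(ratios)
    n_train = len(unique) * ratios[0] // total
    n_val = len(unique) * ratios[1] // total
    return {
        "train": unique[:n_train],
        "val": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],  # assumption: rounding remainder goes to test
    }
```

With 942 unique patients, integer division gives 659 training, 188 validation, and 95 test patients; fracture records are then assigned to whichever subset holds their patient.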
Results
A total of 942 patients (mean age, 69.17 ± 10.61 years) were included, and 1076 vertebral fractures were analyzed (705 acute, 371 chronic). On the test set, the ViT model outperformed the ResNet18 model, with an accuracy of 0.880 and an AUC of 0.901, versus 0.843 and 0.833, respectively. With ViT Shapley saliency maps, diagnostic sensitivity and specificity improved significantly, to 0.883 (95% CI: 0.800, 0.963) and 0.950 (95% CI: 0.891, 1.00), respectively.
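An AUC comparison such as ViT versus ResNet18 on the same test cases is paired, which is what the DeLong test handles: it estimates the variance of the AUC difference from per-case placement values and their covariance between the two models. Below is a minimal pure-Python sketch of the paired DeLong test. It is not the PixelMedAI implementation; the function names are hypothetical, labels are assumed to be 1 for acute and 0 for chronic, and scores are assumed to be each model's predicted probability of the acute class.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on a tie."""
    if x > y:
        return 1.0
    return 0.5 if x == y else 0.0


def _placements(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, List[float], List[float]]:
    """Return (AUC, per-positive components V10, per-negative components V01)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two positive and two negative cases")
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a: Sequence[float], b: Sequence[float], mean_a: float, mean_b: float) -> float:
    """Sample covariance with the n - 1 denominator."""
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(
    scores_a: Sequence[float], scores_b: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Paired DeLong test of AUC(a) - AUC(b); returns (auc_a, auc_b, z, two-sided p)."""
    auc_a, v10_a, v01_a = _placements(scores_a, labels)
    auc_b, v10_b, v01_b = _placements(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    # Var(AUC_a - AUC_b) = positive-case term / m + negative-case term / n.
    var = (
        _cov(v10_a, v10_a, auc_a, auc_a) + _cov(v10_b, v10_b, auc_b, auc_b)
        - 2 * _cov(v10_a, v10_b, auc_a, auc_b)
    ) / m + (
        _cov(v01_a, v01_a, auc_a, auc_a) + _cov(v01_b, v01_b, auc_b, auc_b)
        - 2 * _cov(v01_a, v01_b, auc_a, auc_b)
    ) / n
    diff = auc_a - auc_b
    if var <= 0:
        z = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    else:
        z = diff / math.sqrt(var)
    # Two-sided p-value from the standard normal CDF, written with math.erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return auc_a, auc_b, z, p
```

For example, with toy labels `[1, 1, 1, 0, 0, 0]`, a perfectly ranking model `[0.9, 0.8, 0.7, 0.3, 0.2, 0.1]` has AUC 1.0 and a weaker one `[0.9, 0.2, 0.7, 0.3, 0.8, 0.1]` has AUC 6/9; `delong_test` returns both AUCs with a z statistic and p-value for their difference. These values are illustrative, not study data.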
Conclusion
In classifying vertebral compression fractures, the Vision Transformer outperformed a convolutional neural network and produced more effective Shapley-based saliency maps, which radiologists favored over Grad-CAM.
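A Shapley saliency map for a ViT assigns each image patch its average marginal contribution to the model's output over all orders in which patches could be revealed. The abstract does not describe which Shapley estimator was used, so this sketch swaps in plain Monte Carlo permutation sampling, a standard approximation. The callable `value_fn` is a hypothetical stand-in for "the model's acute-fracture score when only the patches in S are visible and the rest are masked"; reshaping the returned list onto the patch grid would give the heatmap.

```python
import random
from typing import Callable, FrozenSet, List, Set


def shapley_patch_attributions(
    value_fn: Callable[[FrozenSet[int]], float],
    n_patches: int,
    n_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Permutation-sampling estimate of per-patch Shapley values.

    value_fn(S) is a hypothetical masked-model score: the output when only
    the patches in S are visible (others replaced by a baseline).
    """
    rng = random.Random(seed)
    phi = [0.0] * n_patches
    for _ in range(n_permutations):
        order = list(range(n_patches))
        rng.shuffle(order)  # one random order in which patches are revealed
        present: Set[int] = set()
        prev = value_fn(frozenset(present))  # score with every patch masked
        for patch in order:
            present.add(patch)
            curr = value_fn(frozenset(present))
            phi[patch] += curr - prev  # marginal contribution of this patch
            prev = curr
    # Average marginal contributions over the sampled permutations.
    return [v / n_permutations for v in phi]
```

By construction, each permutation's contributions sum to `value_fn(all patches) - value_fn(no patches)`, so the estimate satisfies Shapley efficiency exactly; for a purely additive toy model the estimate equals each patch's weight. Real ViT use would need far more patches and a masking scheme, both of which are outside what the abstract specifies.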