Background
Radiation therapy (RT), often with concurrent chemotherapy, is frequently used in the management of head and neck cancer (HNC) as definitive or adjuvant treatment. RT for HNC improves local control but is associated with significant treatment-related toxicities such as xerostomia [
1,
2]. Approximately 50–80% of patients with HNC will experience xerostomia to some degree after RT [
3,
4]. While these swallow-related toxicities significantly influence long-term patient outcomes and quality of life, our ability to robustly characterize these complications as they relate to individual patients and the radiotherapy dosimetry delivered to salivary glands is limited.
In radiation oncology, rapid-learning health systems, which use routine clinical data to develop models that predict patient-specific treatment outcomes, are becoming increasingly popular [
5‐
7]. In addition to predicting outcomes, the goal of decision support systems is to improve overall patient care and determine when and how to personalize patients’ treatments. Machine learning algorithms have emerged as popular tools for decision support. These algorithms are already being applied to many aspects of radiation therapy including: target delineation [
8,
9], treatment planning [
10,
11], radiation physics quality assurance [
12], and outcome [
13] and tumor response modelling [
14]. With recent advancements in image processing, informatics, and machine learning, medical imaging is increasingly being used to improve clinical decision making. Studies have demonstrated that the variability in clinical image intensity, shape, and texture can be quantified, generating a radiomic signature for individual tumors and normal anatomic structures [
15‐
20]. For radiation therapy, radiomics offers the potential to significantly influence clinical decision-making, therapy planning, and follow-up workflow. In HNC, a radiomic signature has been shown to be prognostic and has been validated across several institutions [
19,
20]. Radiomics derived from computed tomography (CT) have also been used to predict xerostomia and survival in HNC patients [
18,
21,
22].
To our knowledge, the incorporation of MR-based biomarkers with CT and dosimetry features in acute RT-induced xerostomia prediction models has not been investigated in HNC. Thus, the objective of this study was to analyze baseline CT/MR image features of salivary glands to better understand their role in the prediction of radiation-induced xerostomia 3 months after HNC radiotherapy. We hypothesized that baseline CT/MR image features are related to xerostomia and incorporating these into a prediction model improves the accuracy of predicting radiation-induced xerostomia compared to dosimetric information alone.
Discussion
In this study, to better understand the influence of image features in the prediction of RT-induced xerostomia, we investigated the relationships between CT and MR image features with xerostomia scores in HNC patients using machine learning approaches. We made the following observations: 1) image features from both the parotid and submandibular glands significantly contributed to our prediction of xerostomia, 2) higher order texture features for both ipsi- and contralateral salivary glands were important predictors of xerostomia, and 3) combining multimodal image features with dosimetry features improved xerostomia prediction. Collectively, these observations further support prior work [
22,
35] demonstrating that baseline salivary gland CT image features, together with quantification of radiation injury, are important in predicting the risk of xerostomia 3 months following RT.
Image features from both salivary glands significantly contributed to the prediction of xerostomia post-RT, concordant with the readily apparent differences visualized in both salivary glands using CT and MR (Figs.
2 and
3). Patients with xerostomia after RT appeared to have more heterogeneous parotid and submandibular glands at baseline. We should note that the majority of HNC research using radiomics has focused on the parotid glands [
31,
36‐
39] with relatively little attention paid to the submandibular glands [
18,
22]. Interestingly, the features with the greatest OR corresponded to the submandibular glands. While the parotid glands produce the majority of saliva during eating and with oral stimulation, submandibular glands contribute up to more than 70% of unstimulated/resting salivary output [
40] which is rich in mucin. This allows for the oral mucosa to maintain its hydration [
41,
42]. These results suggest that baseline submandibular gland image features may provide insight into unstimulated salivary function, and this insight may improve prediction of susceptibility to post-RT xerostomia.
Important features in our cohort stemmed from the GLRLM and the GLSZM and both the ipsilateral and contralateral salivary glands. For the contralateral side, the CT SG wavelet LLL GLRLM gray level non-uniformity normalized significantly contributed to the GLM. The cPG CT wavelet LHL GLRLM long run high gray level emphasis, which had the second lowest standard error in the model, increases when the texture is dominated by long runs with high intensity levels. These results suggest that patients with xerostomia have cSG that have lower similarity in intensities (increased gray level non-uniformity) and more heterogeneous size zone volumes (increased size zone non-uniformity). Furthermore, patients with increased risk of xerostomia have finer structural textures of the cPG (decreased long run emphasis) [
22] with longer run of high intensity voxels (increased long run high gray level emphasis). Focusing on the ipsilateral side, the feature that contributed significantly to the GLM included the MR iSG wavelet LHL GLSZM small area high gray level emphasis. This feature indicates that patients with xerostomia have ipsilateral submandibular glands with more small regions of low intensity (i.e. more locally heterogeneous as indicated by an increase of small area low gray level intensity).
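To make the run-length features named above more concrete, the following is a minimal sketch of how gray level non-uniformity normalized, long run emphasis, and long run high gray level emphasis are commonly defined from a gray level run length matrix (GLRLM). It is not the feature-extraction pipeline used in this study: it assumes a 2D image that has already been discretized into integer gray levels, counts runs along the horizontal direction only, and omits the wavelet filtering step. The two toy images and all function names are hypothetical.

```python
from collections import defaultdict


def glrlm_horizontal(image):
    """Count runs of equal gray level along each row (0-degree direction only).

    Returns a dict mapping (gray_level, run_length) -> number of runs.
    """
    runs = defaultdict(int)
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            # A run ends at the row boundary or when the gray level changes.
            if k == len(row) or row[k] != row[start]:
                runs[(row[start], k - start)] += 1
                start = k
    return dict(runs)


def gray_level_non_uniformity_normalized(runs):
    """Sum over gray levels of (runs at that level)^2, divided by (total runs)^2."""
    n_runs = sum(runs.values())
    per_level = defaultdict(int)
    for (level, _length), count in runs.items():
        per_level[level] += count
    return sum(c * c for c in per_level.values()) / (n_runs ** 2)


def long_run_emphasis(runs):
    """Mean of run_length^2 over all runs; larger for coarser textures."""
    n_runs = sum(runs.values())
    return sum(c * length ** 2 for (_level, length), c in runs.items()) / n_runs


def long_run_high_gray_level_emphasis(runs):
    """Mean of gray_level^2 * run_length^2; larger for long, bright runs."""
    n_runs = sum(runs.values())
    return sum(
        c * level ** 2 * length ** 2 for (level, length), c in runs.items()
    ) / n_runs


# Hypothetical toy images with gray levels 1 (dark) and 2 (bright).
fine = [[1, 2, 1, 2], [2, 1, 2, 1]]      # alternating voxels, runs of length 1
coarse = [[2, 2, 2, 2], [1, 1, 2, 2]]    # long runs, mostly bright

for name, img in (("fine", fine), ("coarse", coarse)):
    r = glrlm_horizontal(img)
    print(name,
          round(gray_level_non_uniformity_normalized(r), 3),
          long_run_emphasis(r),
          long_run_high_gray_level_emphasis(r))
# fine 0.5 1.0 2.5
# coarse 0.556 8.0 28.0
```

In this toy case the finely textured image has the lower long run emphasis, while the image with long runs of bright voxels has the higher long run high gray level emphasis, which matches the interpretation given in the text.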
Similar to previously reported work, these image features suggest that patients who are likely to develop xerostomia have more locally heterogeneous salivary glands. The heterogeneity differences can be seen in the representative images (Figs.
2 and
3) where patients with xerostomia had more regions of low intensities in both parotid and submandibular glands compared to those patients without xerostomia. This is consistent with previously published work demonstrating that patients who develop xerostomia after RT have more heterogeneous parotid gland tissue [
22]. More recently, MR derived image features of the parotid glands were used in the prediction of xerostomia in HNC patients [
31]. This important work demonstrated that high signal intensity, specifically the 90th percentile of the MR-intensities in parotid glands improved the performance of the xerostomia prediction model. It is well known that high signal intensity in T1-weighted images is related to fat deposition because of the short T1 relaxation time of fatty tissue [
43]. In fact, fat deposition may represent the loss of normal glandular cells as this phenomenon is also seen in diseases such as Sjögren’s syndrome which is characterized by autoimmune destruction of salivary and lacrimal glands [
44]. Of note, the salivary glands of patients with Sjögren’s syndrome have also been shown to be more heterogeneous than those without this syndrome [
45]. Fatty replaced salivary glands have also been shown to be related to age [
46] and xerostomia [
47]. However, in our cohort, age was not correlated with image features or xerostomia. On CT, fatty tissue appears as low density [
48]. This is consistent with the representative CT images (Figs.
2 and
3) of the parotid and submandibular glands, where the patients with xerostomia had hypodense salivary glands (with obvious local heterogeneous regions).
Finally, in the training cohort of our xerostomia prediction models, there were no significant differences between our DVH, CT-only, and MR-only models. However, when CT and MR were combined, the performance improved compared to DVH alone. More importantly, we observed that the combination of dosimetry and image features improved overall prediction compared to dosimetry or image features alone. However, the specificity of our models with CT, MR, and DVH-only features was low. In fact, the combination of DVH + CT + MR features did not lead to a significant improvement in sensitivity and specificity. With the addition of all features in a single model, the sensitivity improved only modestly. We should note that the majority of our patients did not develop xerostomia, resulting in an imbalanced dataset, which could influence sensitivity and specificity. Compared to previously published work evaluating CT image features to predict xerostomia at 12 months [
22], the performance of our models was comparable. This work reported an AUC of 0.77 with the inclusion of CT features, specifically features derived from the GLRLM and GLSZM. Other work that has used imaging to predict xerostomia at 12 months using CT only [
37,
38] and MR only [
31] parotid gland image features has also demonstrated comparable performance to our models (AUC range: 0.60–0.80). Cone beam CT of the parotid glands has also been used to predict xerostomia in a single cohort with performance ranging from 0.71–0.76 [
49]. Other work that has used CT parotid image features with dosimetry in a single cohort with nested cross validation has also shown model performances in the range of 0.68–0.88 [
50]. In our validation cohort, we observed a similar trend where combining imaging improved performance. Adding dosimetry to our training cohort did improve performance which is consistent with previously published work that has shown the prediction of xerostomia improves when CT image features are combined with dosimetric information [
18,
22,
35,
38]. However, in our validation set, adding dosimetry to imaging features did not improve performance. It should be noted that our work used time to separate our training and validation sets. The decrease in performance of the DVH model may be indicative of evolving practices of the attending physicians, specifically changes in physician preferences for dose constraints to the salivary glands. The reduction in performance may also reflect limitations of the DVH in capturing 3D spatial information. This may also explain the decrease in performance of the validation models that contained DVH features. It should also be noted that combining clinical data with CT and MR significantly improved xerostomia prediction compared to CT alone. Although the receiver operating characteristic curves had overlapping confidence intervals, there was a trend towards prediction improvement when combining clinical data with dosimetry and image features compared to dosimetry and CT features alone, which to our knowledge has not been previously demonstrated. Future work in an independent dataset is required to further determine the benefits of combining imaging modalities in outcome prediction modelling.
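As an illustration of the performance measures discussed above, the sketch below computes sensitivity, specificity, and AUC for a small, deliberately imbalanced set of labels. The labels, scores, and the 0.5 threshold are hypothetical and are not data from this study. The AUC is computed as the probability that a randomly chosen patient with xerostomia receives a higher score than a randomly chosen patient without it, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced cohort: 2 patients with xerostomia, 6 without.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]

# Sensitivity and specificity depend on the chosen threshold (assumed 0.5 here).
y_pred = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(y_true, y_pred)

print(round(sens, 3), round(spec, 3), round(auc(y_true, scores), 3))
# 0.5 0.833 0.917
```

With only two positive cases, a single misclassified patient moves the sensitivity by 50 percentage points, whereas the AUC does not depend on the threshold. This is one way an imbalanced cohort can make sensitivity and specificity estimates unstable.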
Although this study provides promising preliminary results, future work is needed to ascertain the generalizability of these findings. It should be noted that random variation in small datasets can often be mistakenly interpreted as meaningful (i.e. overfitting), and as a consequence the model may not perform as well in independent datasets. In the present work, the risks of overfitting the model were addressed by pre-selecting variables based on their inter-correlation (with no correction for multiple comparisons since
p-values at this step were simply used to select a group of candidate features, which were further refined using LASSO), cross-validation of the internal dataset, and validating our models using a temporally split dataset [
30]. It should be noted that temporal splitting is an intermediate validation method compared to internal and external validation [
30]. Future work will need to validate these models on an independent external dataset. The presence of multiple correlated explanatory variables can lead to unstable models with highly variable coefficient estimates and incorrect selection of significant texture features. In this work, collinearity was addressed by determining the Pearson correlation coefficient between two features [
30‐
32]. If the correlation coefficient was larger than 0.80, only the variable with the highest correlation with xerostomia was selected. Modality-specific resampling was not performed for the CT images, and non-cubic voxels were used for radiomics analysis, similar to prior studies [
18,
22,
31,
38]. Whether to resample images or to use the original resolution before feature computation is an active area of radiomics research, and there is no widely accepted recommendation. Resampling images to an isotropic resolution may lead to better interpretation of certain features, but there will be information loss/degradation due to the interpolation process. For our MR images, we used the same scanning protocol for training and validation. This may limit the translatability and generalizability of our results because MR intensities are highly dependent on scanning protocol. Also, unlike CT, MR signal intensity is influenced by hardware factors such as the positioning of the RF coils, which introduce inter-scan variability. Although normalization of MR data has been proposed to address this, the benefits of normalization for radiomic prediction models to differentiate patients with or without xerostomia have not been well established. Future work is needed to establish the benefits of signal normalization for radiomic prediction models of xerostomia. In our work, salivary glands were contoured by the patient’s attending radiation oncologist or by multi-atlas-based auto-segmentation with manual assessment/correction (when clinical contours were not available). Although multiple observers did not contour the same patient, multiple observers’ contours of the glands were included in our feature selection and prediction model building process. Therefore, we anticipate that the selected features are robust to contour variability while being relevant to the outcome. Although previous studies have shown that inter-observer delineation variability has a relevant influence on radiomics analysis [
22,
51], we should note that it is important to determine a model that is robust to variability in raw clinically available data so that it can be used in a real clinical scenario. However, further study will be needed to better understand the influence of contour variability on the computed radiomics features and on subsequent feature selection and prediction performance. Finally, we acknowledge that our image feature analysis was limited to a single bin size for CT and a single bin count for MRI. Texture features have been shown to be affected by the bin width or number of bins used to discretize image intensities. Although the optimal bin width/count for image feature analysis has not been established, previous HNC work has used a 25 unit bin width (similar to the bin width we used) for the evaluation of image features [
27,
28]. However, since image features depend on the way they are computed (i.e. using different binning strategies) further work is needed to investigate the dependency of bin width and the selection of image features on xerostomia prediction.
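The collinearity handling described above (Pearson correlation between feature pairs, a 0.80 cut-off, and retention of the feature most correlated with xerostomia) can be sketched as a simple greedy filter. This is an illustrative reading of that description, not the study's code: the use of absolute correlation values, the greedy ordering, the feature names, and the toy data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / sqrt(var_x * var_y)


def drop_collinear(features, outcome, threshold=0.80):
    """Keep, from each group of inter-correlated features, the one most
    correlated with the outcome.

    features: dict mapping feature name -> list of values (one per patient).
    outcome:  list of outcome values (e.g. 0/1 xerostomia), same length.
    """
    # Visit features from strongest to weakest association with the outcome
    # (assumption: strength is judged by absolute correlation).
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], outcome)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Keep a feature only if it is not collinear with any kept feature.
        if all(
            abs(pearson(features[name], features[k])) <= threshold
            for k in kept
        ):
            kept.append(name)
    return kept


# Hypothetical toy data: feature_b is nearly a rescaled copy of feature_a.
features = {
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [2, 4, 6, 8, 10, 13],
    "feature_c": [3, 1, 4, 1, 5, 2],
}
xerostomia = [0, 0, 0, 1, 1, 1]

print(drop_collinear(features, xerostomia))
# ['feature_a', 'feature_c']
```

Here feature_a and feature_b are correlated above 0.80, so only feature_a, which has the stronger correlation with the outcome, is retained; feature_c is kept because it is not collinear with feature_a. In a pipeline like the one described, the surviving candidates would then be passed to a further selection step such as LASSO.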
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.