Introduction
Thyroid nodules are very common, and with advances in high-resolution ultrasound, the detection rate in the general population ranges from 19 to 68%. The majority of thyroid nodules are benign, and only a small fraction is clinically significant [1–3]. The incidence of thyroid cancer continues to rise, and it currently ranks as the fifth most common cancer among women in the United States [4]. The clinical management of thyroid nodules depends on whether they are benign or malignant [5], and early qualitative diagnosis plays a crucial role in optimizing treatment and improving patient outcomes.
Various ultrasonography (US) risk stratification systems have been developed to manage thyroid nodules effectively, including the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS) [6], Korean TIRADS (K-TIRADS) [7], European TIRADS (EU-TIRADS) [8], Kwak-TIRADS [9] and Chinese-TIRADS (C-TIRADS) [10]. While these TI-RADSs demonstrate high sensitivity (>90%) in identifying malignant nodules, their specificity remains relatively unsatisfactory [11]. Additionally, differentiated thyroid cancer accounts for over 90% of thyroid cancer cases and has an excellent prognosis, with a 5-year survival rate exceeding 98% [12]. The high sensitivity and relatively low specificity of TI-RADS systems have led to the diagnosis of numerous thyroid nodules that lack clinical significance, resulting in unnecessary fine-needle aspiration (FNA) and overtreatment. FNA is considered the gold standard for the preoperative diagnosis of thyroid cancer. However, it is an invasive procedure, and approximately 20–30% of the puncture results are either nondiagnostic or of uncertain significance [13, 14].
Magnetic resonance imaging (MRI) has gained significant popularity in the diagnosis of head and neck tumors due to its numerous advantages, including multi-parameter measurement, arbitrary planar imaging, low risk of contrast allergy, no ionizing radiation, and high soft tissue contrast [15]. In recent years, MRI has increasingly been used for the preoperative evaluation of thyroid nodules [16–20]. However, few studies have explored the use of morphological features on multiparametric MRI to distinguish benign from malignant thyroid nodules.
In this study, we investigated the value of MRI morphological features in distinguishing between benign and malignant thyroid nodules. We also developed and validated a prediction model and compared its performance with the ultrasound-based TI-RADS system.
Materials and methods
Patients and study design
This retrospective observational study was conducted in accordance with the Declaration of Helsinki. The Institutional Review Board of Fudan University Minhang Hospital (2020-008-01 K) waived the requirement for informed consent because of the retrospective nature of the study.
Consecutive patients who underwent thyroidectomy at our institution from January 2017 to December 2022 were retrospectively analyzed. Inclusion criteria were as follows: (1) preoperative thyroid MRI; (2) postoperative pathological confirmation of the nodule as either benign or malignant. Exclusion criteria were as follows: (1) diffuse bilateral lesions of different pathological types; (2) poor image quality with severe artefacts that precluded diagnostic analysis; (3) FNA or partial thyroidectomy prior to MRI; (4) unclear postoperative pathological findings; (5) absence of nodules on MRI; (6) incomplete imaging data; and (7) lesions < 5 mm. The surgical indications for thyroid nodules included TI-RADS category ≥ 4, indicating a high suspicion of thyroid cancer, as well as symptomatic benign thyroid tumors causing compression, hyper-functioning thyroid adenomas, or concomitant hyperthyroidism.
MRI acquisition
MRI examinations were performed on a GE Healthcare 1.5T MRI scanner (Excite HD; GE Healthcare, Milwaukee, WI, USA) with an 8-channel phased-array thyroid coil (Shanghai Chenguang Medical Technologies, Shanghai, China).
The MRI protocol included: (1) coronal fat-suppressed T2-weighted imaging (T2WI); (2) axial T1-weighted imaging (T1WI); (3) axial fat-suppressed T2WI; (4) diffusion-weighted imaging (DWI) with b values of 0 and 800 s/mm²; (5) multiphasic contrast-enhanced T1WI (MCE-T1WI). Contrast agent (Magnevist; Bayer Pharmaceuticals, Berlin, Germany) was injected at a dose of 0.2 ml/kg and a rate of 3 ml/s, followed by a 15 ml saline flush. Six sequential MCE-T1WI scans were performed at 30 s, 60 s, 120 s, 180 s, 240 s, and 300 s after the contrast agent injection. Patients were instructed to hold their breath during the scan. The total scan duration was approximately 16 min. Table S1 lists the detailed MRI acquisition parameters.
MRI morphological analysis
Two radiologists, with 5 and 9 years of experience in diagnostic thyroid MRI, respectively, independently evaluated the MRI images using the Advantage Workstation 4.5 (GE Healthcare, Waukesha, WI, USA) and the Picture Archiving and Communication System (PACS). Both radiologists were blinded to the pathological results of the lesions, and disagreements were resolved by consensus.
The following parameters were used to assess each lesion: (1) size of the lesion, measured as the largest diameter of the nodule and classified as 5–10 mm, 10–40 mm, or ≥40 mm; (2) number of nodules, classified as unifocal or multifocal. The qualitative MRI morphological features were as follows: (1) non-enhanced features, including light pearl sign, black-white flower sign, restricted diffusion, cystic degeneration, flow-void signal, high signal intensity on T2WI, high signal intensity on T1WI, and low signal intensity on T2WI; (2) contrast-enhanced features, including no enhancement, gap-filling enhancement, pseudocapsule, hyperintense on T2WI with enhancement, wash-out pattern, fissure-filling enhancement, reversed halo sign in delay phase, hyperenhancement in early phase, and change of lesion size in multiphasic enhancement. Detailed definitions and illustrations of the MRI morphological features are shown in Appendix S1.
TI-RADS
Two US specialists, each with over 10 years of experience and blinded to the histopathological findings, retrospectively analyzed the US features of the thyroid nodules and reached a consensus. The US features evaluated were composition, echogenicity, margins, shape, and calcification. All thyroid nodules were then classified according to the ACR-TIRADS, K-TIRADS, EU-TIRADS, Kwak-TIRADS, and C-TIRADS. For ACR-TIRADS, K-TIRADS, and EU-TIRADS, nodules categorized as ≥4 or 5 were considered to be malignant. For Kwak-TIRADS and C-TIRADS, nodules categorized as ≥4b and 4c, respectively, were regarded as malignant. The diagnostic performance of these five TIRADS systems was subsequently calculated.
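The dichotomization described above can be illustrated with a minimal Python sketch. The categories, pathology labels, and cut-off below are invented for illustration and are not study data; the function name `diagnostic_metrics` is likewise hypothetical, and the sketch does not guard against empty groups (zero denominators).

```python
def diagnostic_metrics(predicted, truth):
    """Return sensitivity, specificity, accuracy, PPV and NPV.

    predicted, truth: equal-length sequences of booleans (True = malignant).
    Assumes every denominator is non-zero.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)              # true positives
    tn = sum((not p) and (not t) for p, t in pairs)  # true negatives
    fp = sum(p and (not t) for p, t in pairs)        # false positives
    fn = sum((not p) and t for p, t in pairs)        # false negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical TI-RADS categories and pathology results (illustration only).
categories = [5, 4, 3, 5, 2, 4, 5, 3]
malignant = [True, False, False, True, False, True, True, False]

CUTOFF = 4  # assumed rule: category >= CUTOFF is called malignant
calls = [c >= CUTOFF for c in categories]
metrics = diagnostic_metrics(calls, malignant)
print(metrics)
```

Raising `CUTOFF` to 5 in such a sketch trades sensitivity for specificity, which is the trade-off between cut-offs that the comparison of TI-RADS systems examines.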
Nomogram construction and evaluation
Univariate and multivariate stepwise logistic regression analyses were employed to identify independent predictors in the training cohort, and a nomogram was then developed. The optimal model was selected based on the Akaike Information Criterion (AIC). The goodness-of-fit of the model was assessed using the Hosmer-Lemeshow test, with P≥0.05 indicating a good fit. To evaluate the performance of the nomogram, receiver operating characteristic (ROC) analysis, calibration curve analysis, and decision curve analysis (DCA) were conducted.
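As a rough illustration of model selection by the Akaike Information Criterion, the sketch below ranks candidate models by AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L the maximized log-likelihood. The candidate names, log-likelihoods, and parameter counts are invented for illustration; the actual stepwise procedure and fitted values are not given in the text.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, parameter count).
candidates = {
    "model_a": (-120.0, 3),
    "model_b": (-112.0, 5),
    "model_c": (-111.5, 8),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the smallest AIC
print(scores, best)
```

Here `model_c` fits slightly better than `model_b` but is penalized for its three extra parameters, so the smaller model is preferred.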
A risk scoring system (RSS) was developed from the regression coefficients (β coefficients) of the independent predictors: each coefficient was divided by the minimum β coefficient, and the result was rounded to the nearest whole number to give the points for that predictor. The optimal cut-off value for the risk score was determined by maximizing the Youden index. The MRI-based prediction model was constructed using the aforementioned RSS and specific morphological features. The diagnostic performance of the model was evaluated by assessing sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV) and area under the ROC curve (AUC), and was compared with that of the five US TI-RADSs. The AUCs were compared using the DeLong test. The net reclassification index (NRI) was used to quantify the improvement in predictive accuracy of the model.
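The scoring and cut-off steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the predictor names, β values, scores, and labels are invented, the helper names `risk_points` and `youden_cutoff` are hypothetical, and a case is assumed to be called positive when its score is at or above the cut-off.

```python
def risk_points(betas):
    """Convert beta coefficients to integer points: round(beta / smallest |beta|).

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    smallest = min(abs(b) for b in betas.values())
    return {name: round(b / smallest) for name, b in betas.items()}


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when score >= cutoff; the first maximum wins ties.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and not y)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical coefficients (not the fitted values from this study).
points = risk_points({"feature_a": 0.5, "feature_b": 1.2, "feature_c": 2.1})

# Hypothetical total risk scores and pathology labels (True = malignant).
total_scores = [0, 2, 3, 5, 6, 7, 9, 11]
is_malignant = [False, False, True, False, True, True, True, True]
cutoff, j = youden_cutoff(total_scores, is_malignant)
print(points, cutoff, j)
```

Scanning only the observed score values is sufficient here because sensitivity and specificity change only at those values.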
Statistical analysis
All statistical tests were carried out using SPSS statistical software 26.0, R software 4.2.0 (http://www.r-project.org), and MedCalc software (version 20.100). Continuous variables were represented as mean ± standard deviation (SD), while categorical variables were expressed as percentages. The t-test was employed to compare continuous variables, whereas the chi-square test or Fisher’s exact test was used to compare categorical variables. Concordance between the two radiologists was assessed using the Kappa concordance test. The nomogram was constructed using the R software package “rms”. Statistical analyses were conducted with two-tailed P values and 95% confidence intervals (CI). P < 0.05 was considered statistically significant.
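For two readers, the kappa statistic is commonly computed as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below assumes that this is the variant used, which the text does not specify; the ratings are invented for illustration (1 = feature present, 0 = absent).

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical readings of one morphological feature by two radiologists.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
reader_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 3))
```

In this example the readers agree on 8 of 10 cases (p_o = 0.8) against a chance agreement of p_e = 0.48.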
Discussion
This study found that age, low signal intensity on T2WI, restricted diffusion, reversed halo sign in delay phase, cystic degeneration, and wash-out pattern were independent predictors of malignant thyroid nodules. The MRI-based nomogram developed using these predictors showed satisfactory predictive ability and calibration in both the training and validation cohorts. In addition, the MRI-based prediction model had superior specificity and NPV along with higher sensitivity compared to the five TI-RADSs, which improved the overall integrated prediction performance.
Several TI-RADSs have been developed from previous studies that used US to differentiate benign from malignant thyroid nodules. In a meta-analysis, Kim et al. [11] found that ACR-TIRADS, K-TIRADS, and EU-TIRADS categories 4 and 5 had a good sensitivity, greater than 90%, for thyroid nodules. Their specificity, however, remained relatively low, with K-TIRADS having the highest specificity at 61%, followed by ACR-TIRADS (49%) and EU-TIRADS (48%). The suboptimal accuracy of thyroid cancer diagnosis has resulted in an increase in unnecessary FNAs, which are invasive and yield nondiagnostic outcomes in approximately 25% of cases [21]. This study evaluated the specificity of C-TIRADS, using a cut-off of category 4c, and ACR-TIRADS, using a cut-off of category 5. C-TIRADS exhibited a specificity of 94.7%, while ACR-TIRADS demonstrated a specificity of 97.3%. However, both approaches sacrificed a significant amount of sensitivity, with values of only 51.3% and 64.6%, respectively. In contrast, the MRI-based prediction model achieved a comparably high specificity of 93.4% while maintaining high sensitivity.
US-based risk stratification systems for thyroid nodules are frequently complicated and subject to low specificity and inadequate interobserver agreement. These systems need continuous improvement in order to minimize unnecessary FNA. Wildman-Tobriner et al. [22] showed that an artificial intelligence-optimized TI-RADS can moderately improve specificity and sensitivity compared to TI-RADS. In recent years, there has been a gradual increase in research on MRI for differentiating benign from malignant thyroid nodules [23–27] and for preoperatively predicting the aggressiveness of papillary thyroid carcinoma [28–34], demonstrating promising application prospects. The ongoing advancements in MRI technology, as a functional medical imaging modality, warrant continuous exploration of its potential in diagnosing and evaluating thyroid nodules, ultimately leading to the development of its clinical application. Consequently, it is necessary to establish an MRI-based prediction model for distinguishing between benign and malignant thyroid nodules.
The predictive model developed in this study for distinguishing between benign and malignant thyroid nodules using MRI exhibited robust diagnostic efficacy. Integrating this model with the ultrasound TI-RADS grading system has the potential to improve diagnostic and treatment strategies for thyroid nodules. In clinical practice, patients with TI-RADS category 4 or 5 nodules may be followed up if the MRI-based predictive model suggests a benign nature, thereby avoiding unnecessary FNA. Conversely, if the model indicates malignancy, FNA and surgical intervention are recommended. However, further research is necessary to validate the effectiveness of the MRI-based prediction model in enhancing the ultrasound TI-RADS grading system. Subgroup analysis revealed a decreased sensitivity of the MRI-based prediction model for thyroid nodules larger than 4 cm, with missed diagnoses of 7 cases of follicular thyroid carcinoma and 1 case of papillary thyroid carcinoma. These findings indicate a limited diagnostic efficacy of the model for follicular thyroid neoplasms (FTNs) larger than 4 cm. Lin et al. [35] highlighted the ineffectiveness of various TIRADSs in managing FTNs, as evidenced by the high percentage (65.3 to 93.1%) of patients subjected to unnecessary FNA. This underlines the importance of establishing a tailored stratification system for FTNs. Consequently, MRI-based predictive models specifically designed for FTNs are needed.
In this study, all nodules that exhibited non-enhancement and the light pearl sign were confirmed to be benign, and these two features were therefore identified as specific indicators of benignity. The black-white flower sign and gap-filling enhancement were found to be specific features of malignant nodules. The black-white flower sign was defined as a petaloid, cerebrospinal fluid-like high signal on T2WI with an irregular, apparent low signal in the centre of the lesion. From a pathological perspective, this sign corresponds to the stroma, which segregates follicular cells into irregular petal-like structures and aggregates in the central area of the lesion. The stroma contains little fluid, resulting in low signal intensity on T2WI, whereas the numerous follicular cells with higher fluid content contribute to the high signal on T2WI. Gap-filling enhancement was defined as a lesion located in the perithyroid region with a progressive enhancement pattern, with disruption of the contour line in the early phases and an intact contour line in the delayed phases. In the early phases of enhancement, an interrupted thyroid contour was a sign of involvement of the thyroid envelope by the stromal component of the lesion [36], whereas in the delayed phase this stromal enhancement led to filling of the interrupted envelope.
In the RSS, restricted diffusion and reversed halo sign in delay phase were the two independent predictors with the highest scores, 11 and 9 points, respectively. DWI has been widely employed in oncology for diagnosing, monitoring, and prognosticating malignancies [37]. The apparent diffusion coefficient (ADC) has been established as a valuable tool in the differentiation of benign and malignant thyroid nodules [38–40]. Restricted diffusion was defined by the presence of a solid component within the lesion that manifested as high signal intensity on DWI and low signal intensity on the ADC map. The assessment of restricted diffusion provides a more direct and pragmatic approach than the use of quantitative ADC values. The reversed halo sign in delay phase was defined by a wash-out pattern in the central portion of the lesion, continuous enhancement in the peripheral area relative to the central part during the delay phase, and a blurred border. This finding aligns with a previous study conducted by Wang et al. [41]. The interpretation is that the central region of the tumor exhibits active proliferation of neoplastic cells, whereas the peripheral area consists primarily of connective tissue with a profusion of tumor stroma, leading to sustained enhancement in the delay phase.
This study had several limitations. Firstly, it adopted a retrospective design, which introduced an inherent selection bias, as it exclusively included cases that underwent surgical intervention with pathological examination. Consequently, the exclusion of nodules selected for follow-up after FNA or those deemed too small for FNA may have influenced the outcomes. Secondly, the study did not include nodules smaller than 5 mm because of the limitations of MRI techniques. Thirdly, the qualitative parameters selected in this study involved a certain degree of subjectivity. Although quantitative parameters are considered objective, their accuracy can be affected by various factors such as equipment, acquisition parameters, and measurement methods in clinical practice. In contrast, qualitative indicators are convenient in clinical settings. Lastly, this was a single-center study, and incorporating multi-center cases would strengthen the validation of the MRI-based prediction model.
In conclusion, the prediction model based on MRI morphological features demonstrated notable diagnostic efficacy in distinguishing benign from malignant thyroid nodules, providing a reliable basis for clinical diagnosis and treatment decision-making. Moreover, the incorporation of MRI morphological features holds promise for enhancing the diagnostic accuracy of TI-RADS.