In this observational, secondary analysis of blood samples collected in a representative patient population from a multicenter study, we tested five TRAb assays for their power to diagnose GD and to predict relapse in GD patients. We considered three competition assays, including the recently released automated EliA anti-TSH-R, an automated assay based on bridge technology [9], and one cell-based bioassay.
Diagnosis of GD
ROC curve analysis demonstrated highly comparable AUCs for the different assays, except for the bioassay, which showed a somewhat lower AUC. Sensitivities varied from 79.5% (EliA anti-TSH-R) to 94.0% (IMMULITE TSI and RSR TRAb Fast). Previous studies described slightly higher sensitivities for IMMULITE TSI, between 95 and 100% [8, 10‐12], while the manufacturer suggested a sensitivity for EliA anti-TSH-R varying between 83% at a cut-off of 2.9 U/l and 79% at 3.3 U/l (grey zone 2.9–3.3 U/l) [13]. In our study, the performance of the bioassay was inferior to that reported in former studies examining different bioassay systems [12, 14, 15]. BRAHMS TRAK showed a higher sensitivity than previously reported by Diana et al. [12]. The RSR TRAb Fast, a modified version of the RSR 3rd generation TRAb ELISA [16], exhibited a sensitivity of 94%, which is higher than the 85–93% observed with the unmodified assay [17, 18]. Overall, we report lower sensitivities than those described in a meta-analysis by Tozzoli et al. [19] examining different 3rd generation assays (pooled sensitivity of 97.4%). There are several possible explanations for these differences. First, we evaluated a rather small cohort of patients, and due to the retrospective design, selection bias towards patients with less severe disease is likely. This would also explain the lower risk of relapse in our cohort compared with previous studies [20]. Still, of the 268 GD patients with blood samples (see Additional file 1: Figure S1), 25 and 26 received surgery or RAI in the long term, respectively, and the median time to definitive therapy after diagnosis was 35 months (IQR 8–71; mean 47 months), which argues against selection bias. Second, previous studies compared assay performance between GD patients and healthy volunteers, while we included patients with different types of thyroid pathologies. Thus, our results may better reflect real-life indications for TRAb testing.
It is well known that TRAb levels decline gradually under ATD treatment until they disappear in about three quarters of patients after 18 months [21]. In our opinion, this has a limited influence on our results, as we only included patients up to 2.5 months after ATD initiation. By definition, every untreated GD patient should have TRAb. However, in the past, up to 6–7% of GD patients were described as lacking detectable TRAb, although these numbers are based largely on earlier TRAb assay generations [22, 23]. Nevertheless, in our study, four sera of GD patients (4.8%) were negative with all assays.
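The percentages quoted above follow from simple proportions over the 83 GD patients analysed. The following is a minimal sketch of that arithmetic; the true-positive counts (78 and 66) are hypothetical values back-calculated to match the reported sensitivities and are not taken from the study tables, and the helper name `percent` is introduced here only for illustration.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, 1)


# Denominator: the 83 GD patients retained for analysis.
N_GD = 83

# Hypothetical true-positive counts (assumption, back-calculated so that
# they reproduce the reported sensitivities; not read from the study data).
sens_high = percent(78, N_GD)    # consistent with the reported 94.0%
sens_low = percent(66, N_GD)     # consistent with the reported 79.5%

# Four GD sera were negative with all assays.
seronegative = percent(4, N_GD)  # consistent with the reported 4.8%
```

Sensitivity here is the share of GD patients with a positive assay result; specificity would be computed the same way over the control group, using negative results as the numerator.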
Specificities ranged from 87.5% for the bioassay to 97.9% for the EliA anti-TSH-R at the upper cut-off (3.3 U/l). This is in agreement with the specificity of 97.7% published by Luther et al. [13] for the EliA anti-TSH-R. With EliA anti-TSH-R, only one patient of the control group (autoimmune thyroiditis) had a borderline result (MOC 1.03 at cut-off 3.3). This serum was positive with all other assays (MOCs: RSR TRAb Fast 3.1, IMMULITE TSI 4.36, BRAHMS TRAK 1.39, TSAb bioassay 1.29). Previously published specificities are generally higher (98.7–100%) than ours [8, 10, 11, 17, 19]. However, many studies included healthy subjects, whereas our control group consisted solely of patients with thyroid-related disease. The frequency of TRAb positivity in multinodular toxic goiter or primary autoimmune hypothyroidism has been shown to be about 10% with the RSR 3rd generation TRAb ELISA [17], and 10% in Hashimoto’s thyroiditis (HT) with BRAHMS TRAK [12]. According to the literature, stimulating TRAb can be found in 5.5–22% of HT patients [24, 25]. In our HT control group, TRAb were detected in 1 out of 15 patients (6.7%). This particular serum was positive with all binding assays (MOCs: RSR TRAb Fast 2.28, IMMULITE TSI 1.39, BRAHMS TRAK 1.08) except with EliA anti-TSH-R (MOC 0.79 at cut-off 2.9) and the TSAb bioassay (MOC 0.57). In this case, both the TSAb and TBAb bioassays were negative. According to Diana et al., TBAb can be observed in 4.2% of GD and in 9.3% of HT patients [26]. In our study, TBAb were detected at low levels in only one patient with silent thyroiditis (data not shown). This could be due to the different bioassay setup used in the study by Diana et al. [26] or to the limited sample size of our retrospective analysis.
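To make the MOC values and the EliA grey zone quoted above easier to follow, here is a small sketch. It assumes that MOC stands for "multiple of cut-off" (measured value divided by the assay cut-off), which this section does not spell out, and it assumes a particular handling of the boundary values 2.9 and 3.3 U/l. The function names are hypothetical.

```python
def multiple_of_cutoff(value: float, cutoff: float) -> float:
    """Assumed definition of MOC: measured value divided by the cut-off."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    return value / cutoff


def classify_elia(value_u_per_l: float, lower: float = 2.9, upper: float = 3.3) -> str:
    """Classify an EliA anti-TSH-R result against the 2.9-3.3 U/l grey zone.

    Boundary handling is an assumption: values below `lower` are negative,
    values from `lower` up to (but not including) `upper` fall in the grey
    zone, and values at or above `upper` are positive.
    """
    if value_u_per_l < lower:
        return "negative"
    if value_u_per_l < upper:
        return "grey zone"
    return "positive"


# Concentrations recovered from the MOCs quoted in the text (approximate,
# because the MOCs are rounded to two decimals).
borderline_control = 1.03 * 3.3  # about 3.4 U/l, just above the upper cut-off
ht_control = 0.79 * 2.9          # about 2.3 U/l, below the lower cut-off
```

Under these assumptions, the autoimmune thyroiditis control lands just above the upper cut-off, which matches its description as a borderline result, while the HT control stays below the lower cut-off.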
Prediction of relapse
When added to the GREAT score, two assays (BRAHMS TRAK and IMMULITE TSI) yielded a statistically significant improvement in its predictive capability. Thus, these assays might provide a clinical benefit in predicting the relapse risk of newly diagnosed GD patients offered ATD therapy.
Somewhat surprisingly, concentrations of EliA anti-TSH-R did not seem to differ much between the two groups (see Tables 1 and 2 for medians and Fig. 3 for box-plots), whereas the average HR for relapse prediction for the assay itself was the highest of all (see column “HR for assay alone (Q4 vs. Q1-3) (95% CI)” in Table 4). We think that this finding occurred by chance due to our small sample size, as suggested by the wide confidence intervals. In this subsample of our previously published dataset [6], we observed a rather low overall recurrence rate of only 21.7% (originally 50.1%). This is lower than usually reported from other cohorts in the past (30–60%) [9‐11]. Although we had such a low incidence of events, we still observed statistically significant findings. Thus, we are confident that our data are robust and valid, especially as we ensured a high follow-up rate in our original study by performing follow-up interviews with patients and/or their primary care physicians if there had been no contact with a study center within the last 6 months. In Switzerland, patients typically stay with their general practitioner for many years.
The overall predictive accuracy of the TRAb assays alone ranged from 0.67 to 0.71, similar to that of the GREAT score with the routine TRAb (AUC of 0.69). Although some new TRAb assays showed statistically significant improvements, it is less clear whether these improvements are clinically relevant.
In a survival model, we compared the fourth quartile of TRAb assay results against the lower three quartiles. HRs for all TRAb assays were in the same range as that for GREAT class II (i.e. HR 1.79; 95% CI 1.42–2.27). When the assays were added to the GREAT score, predictive ability improved even further. Hence, we believe that the TRAb assays used in our study provide some benefit for patient assessment, with only slight differences between manufacturers. There is a slight reduction in hazard ratios in GREAT class III, which we attribute mainly to the variance caused by the few data points in this group.
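The Q4 vs. Q1-3 comparison described above amounts to turning each continuous assay result into a binary indicator before it enters the survival model. The following is a minimal sketch of that dichotomization using only the Python standard library. The input values are made up for illustration, and the quantile method and the strict "greater than" rule are assumptions, since the section does not state how the quartile boundary was computed.

```python
import statistics
from typing import List, Sequence


def top_quartile_flags(values: Sequence[float]) -> List[bool]:
    """Flag results above the third quartile (Q4) versus the rest (Q1-3).

    The quartile boundary uses the "inclusive" method of
    statistics.quantiles, which is an assumed choice.
    """
    # With n=4 the function returns the three quartile cut points.
    _q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [v > q3 for v in values]


# Hypothetical TRAb results, not study data.
example_results = [4.4, 0.5, 15.0, 2.0, 9.5, 1.2, 6.0, 3.1]
example_flags = top_quartile_flags(example_results)
```

The resulting indicator would then serve as the single covariate (assay alone) or as an additional covariate next to the GREAT class in the survival model; fitting that model is outside the scope of this sketch.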
None of these findings apply to the cAMP bioassay. Although bioassays have been reported to improve prediction of the disease course, we could not replicate these results [27, 28]. Even the IMMULITE TSI assay by Siemens did not have unrivalled predictive capabilities, although it is supposed to specifically detect only stimulatory antibodies. One reason might be that our sample size was not large enough for a confirmatory finding.
Overall, a single factor is insufficient for predicting the outcome of GD patients under ATD therapy and needs to be combined with other factors. Accordingly, adding the new assays to the GREAT score yields better predictive power than the assays alone. This also explains why previous attempts to predict relapse risk have failed [4, 7, 11‐17]. Additionally, it leaves ample space for further research, either on even more specific TRAb assays or on entirely new biomarkers (e.g. cytokines, genetic markers).
We acknowledge several limitations of our study. First, this study is retrospective in design. However, we could gather most data from medical records, and follow-up was sufficiently long. Second, although we analyzed the blood samples of 332 patients, we had to exclude all but 83 from analysis because many samples were drawn long after initiation of ATD treatment. As an exclusion criterion, we chose ongoing ATD therapy for more than 2.5 months. We chose this cut-off arbitrarily, as it allowed us to use approximately 1/3 of our dataset. Although there is a steady fall in TRAb levels during ATD treatment, we do not think that this has affected our results. Whereas TRAb levels seem to fall more strongly within 1–3 months after thyroidectomy [29], this decline is less pronounced in patients receiving ATD therapy [30‐32]. Thus, we think that including blood samples from patients who had been on ATD therapy for up to 2.5 months did not introduce substantial bias.
Third, treatment times in our cohort were longer than recommended by current evidence [4, 33]. Median treatment time was similar in both groups (19 vs. 18 months). This is explained by our retrospective design. Physicians and patients usually opt for extended medical therapy before referral to a thyroid ablative procedure. We consider it unlikely that this influenced the results, as treatment durations over 18 months have been found to be of no benefit regarding relapse rate [4].
Fourth, our study centers used different routine TRAb assays over the course of our study. One might argue that this introduced bias. In that case, our results would be expected to shift towards non-significant findings, as it disperses our baseline values. Nevertheless, we still found good prognostic accuracy despite inconsistencies in our data set compared to the one from the original GREAT score publication [5], underscoring the consistency of the GREAT score.
Fifth, we used a convenience sample based on a biological repository and had only a limited number of samples available for measurement of TRAb. Also, we did not use the novel Thyretain bioassay, which may perform much better than older bioassays [34]. This should be evaluated in future studies.
Finally, due to our inclusion criteria, seronegative patients with Graves’ hyperthyroidism are not represented in our study, and it remains unclear how well our findings apply to this patient population. However, every new TRAb assay generation introduced into clinical practice has reduced this population further [19]. It is believed that even seronegative patients have TRAb production, confined to the thyroid itself or adjacent lymph nodes [35].