Introduction

Clostridium difficile is an anaerobic, spore-forming Gram-positive bacillus that can cause diarrhea mediated by the production of C. difficile toxins A and B1. C. difficile infection (CDI) accounts for 15% to 25% of antibiotic-associated diarrhea2. The two major risk factors for CDI are exposure to antibiotics and exposure to the organism, usually during a hospital stay. Other risk factors include older age, gastrointestinal tract surgery and acid-suppressing medications, including proton-pump inhibitors3,4. The severity of CDI ranges from very mild disease to toxic megacolon with septic shock. Metronidazole and vancomycin are the most frequently used first-line antibiotics for CDI, and fecal microbiota transplantation has recently been proposed as an alternative treatment5,6. However, patients who do not respond to these treatments may require intensive care or colectomy. According to surveillance data, mortality from CDI is approximately 5.7%7.

The first step in treating CDI properly is quick and accurate diagnosis. However, none of the existing C. difficile tests is perfect in terms of accuracy, cost and turnaround time8,9,10,11. Nucleic acid amplification tests (NAATs), such as polymerase chain reaction (PCR) and loop-mediated isothermal amplification, provide quick and accurate diagnosis12,13,14, albeit at a high cost. Though expensive, a single-step strategy using only a NAAT is the simplest diagnostic strategy8. Multi-step diagnosis is another strategy, in which a low-cost test, namely the glutamate dehydrogenase (GDH) assay, is used first, and only specimens positive in this first test proceed to NAATs or toxin tests8. Detecting GDH seems a reasonable screening approach because this inexpensive and rapid test is sensitive15.

Over the last decade, an increasing number of observational studies of GDH assay accuracy for C. difficile detection have been reported15. The current understanding is that a single-step GDH assay cannot confirm CDI. Nonetheless, the single-step GDH assay still needs to be evaluated, for two reasons. First, a negative single-step GDH assay usually rules out CDI. Second, the diagnostic accuracy of the single-step GDH assay must be known to design two-step and three-step algorithms that begin with GDH. Shetty et al. reported a systematic review on this topic in 201115. However, because of considerable heterogeneity among studies, their review mainly described the summary receiver operating characteristic (SROC) curve and did not present pooled sensitivity and specificity. They avoided doing so because univariate meta-analysis leads to gross underestimates of sensitivity and specificity when diagnostic test performance differs owing to local conditions15. Even though GDH is commonly accepted as a screening tool for CDI, no published meta-analysis has provided straightforward summary estimates of the sensitivity and specificity of GDH for diagnosing CDI. Current meta-analysis methodology for diagnostic test accuracy strongly recommends a hierarchical model, which deals appropriately with the tradeoff between sensitivity and specificity caused by the threshold effect16,17,18,19. In addition, many original studies on GDH have been published since the review by Shetty et al. Thus, we believe an updated systematic review and meta-analysis using a hierarchical model is required to reveal how accurately the GDH assay diagnoses CDI.

Methods

Study registration

The protocol has been registered with the international prospective register of systematic reviews (PROSPERO)20 under number CRD42016032760. This study protocol follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Cochrane Handbook for Diagnostic Test Accuracy Reviews16,21. Institutional review board approval and patient consent were waived because of the review nature of this study.

Eligibility criteria

Type of studies

We had planned to include both one-gate cohort studies and two-gate case-control studies; however, we found no case-control study. We included studies with sufficient data to estimate the sensitivity and specificity of the GDH assay for CDI against an eligible reference test (see Reference test). In addition to studies of single-step GDH assays, we included studies that evaluated multi-step algorithms when the GDH data could be extracted separately. Conference abstracts, short articles and other non-full-length articles were eligible.

Participants

Meta-analysis was conducted based on the number of specimens rather than the number of persons. Specimens from patients with a possible diagnosis of CDI, diarrheal stool and liquid stool were preferred. When a study included formed specimens, we rated a high applicability concern for patient selection22. Human non-stool samples, animal stool samples and food samples were excluded.

Index test

As the index test, we included any stool GDH assay, including commercial kits and in-house assays.

Reference test

The stool cell cytotoxicity neutralization assay (CCNA) and stool toxigenic culture (TC) were used as reference tests8. Other tests such as NAATs and simple culture were not regarded as references in this study.

Outcome

First, we constructed a two-by-two contingency table from the numbers of true positives, false negatives, false positives and true negatives reported in each original study. Then, we assessed the diagnostic odds ratio (DOR) and the area under the hierarchical SROC curve (AUC) to evaluate overall accuracy. Summary estimates of sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV) and negative predictive value (NPV) were also calculated16.
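The per-study quantities above follow directly from the four cells of each two-by-two table. The Python sketch below illustrates that arithmetic under stated assumptions; it is not the authors' code (the analyses were run with the R package mada), the class name `TwoByTwo` is hypothetical, and the 0.5 continuity correction for zero cells is a common convention we assume here rather than one described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for one cohort: GDH (index) versus CCNA or TC (reference)."""
    tp: float  # GDH positive, reference positive
    fp: float  # GDH positive, reference negative
    fn: float  # GDH negative, reference positive
    tn: float  # GDH negative, reference negative

    def corrected(self) -> "TwoByTwo":
        # Assumed convention: add 0.5 to every cell if any cell is zero,
        # so that the ratios below stay finite.
        if 0 in (self.tp, self.fp, self.fn, self.tn):
            return TwoByTwo(self.tp + 0.5, self.fp + 0.5, self.fn + 0.5, self.tn + 0.5)
        return self

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def plr(self) -> float:
        # Positive likelihood ratio = sensitivity / (1 - specificity)
        return self.sensitivity / (1 - self.specificity)

    @property
    def nlr(self) -> float:
        # Negative likelihood ratio = (1 - sensitivity) / specificity
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Diagnostic odds ratio = (TP * TN) / (FP * FN), which also equals PLR / NLR
        return (self.tp * self.tn) / (self.fp * self.fn)


# Illustrative counts only, not data from any included study.
cohort = TwoByTwo(tp=90, fp=40, fn=10, tn=360).corrected()
print(cohort.sensitivity, cohort.specificity, cohort.dor)
```

The identity DOR = PLR / NLR is why the DOR is a single-number summary of overall accuracy, whereas sensitivity and specificity must be reported as a pair.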

Literature search strategy

We conducted a database search of PubMed, Embase, the Cochrane Library and Web of Science on January 5th, 2016. Search formulas are presented in Supplementary Text 1.

The reference lists of previously published reviews and of the included original studies were hand-searched.

Study selection

The two investigators independently screened titles and abstracts after importing the citation list into EndNote X7 (Thomson Reuters, Philadelphia, USA). Articles retained by at least one investigator proceeded to full-text review, which the two investigators also performed independently. Final inclusion was determined after discussion to resolve any discrepancies. We took care to exclude duplicate use of the same data.

Data extraction

The two investigators independently extracted data and entered them into Microsoft Excel 2013. The data extracted by the two investigators were then cross-checked, and discrepancies were resolved by discussion between them.

Quality assessment for bias and applicability

The two investigators independently evaluated each study, scoring the seven domains of the revised tool for the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2)22: four risk-of-bias domains and three applicability domains. Discrepancies in scores were resolved through discussion.

For the current systematic review, we applied the following principles. Excluding patients whose CDI status the authors found difficult to judge was considered a high risk of bias in patient selection. Absence of any description of consecutive or random enrollment was considered an unclear risk of bias in patient selection. Inclusion of formed stool was considered a high applicability concern for patient selection. Risk of bias for the index and reference tests was generally not suspected because the results of GDH, CCNA and TC can be judged objectively. Bias from the interval between the index and reference tests was also not suspected because both tests were conducted on the same stool specimen.

A study with neither a high risk of bias nor a high applicability concern in any domain was regarded as a non-high-risk study.
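The rule for labelling a cohort non-high-risk can be written as a small predicate. This is a hypothetical Python sketch: the domain keys and the "low"/"unclear"/"high" rating strings are our assumptions, and, following the wording above, an "unclear" rating does not by itself disqualify a study.

```python
from typing import Mapping

# QUADAS-2 has four risk-of-bias domains and three applicability domains.
BIAS_DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = ("patient_selection", "index_test", "reference_standard")


def is_non_high_risk(bias: Mapping[str, str], applicability: Mapping[str, str]) -> bool:
    """Return True when no domain is rated 'high' (ratings assumed: 'low', 'unclear', 'high')."""
    no_high_bias = all(bias[d] != "high" for d in BIAS_DOMAINS)
    no_high_concern = all(applicability[d] != "high" for d in APPLICABILITY_DOMAINS)
    return no_high_bias and no_high_concern


# Example: a cohort with an unclear patient-selection rating but nothing rated high.
example_bias = {"patient_selection": "unclear", "index_test": "low",
                "reference_standard": "low", "flow_and_timing": "low"}
example_applicability = {"patient_selection": "low", "index_test": "low",
                         "reference_standard": "low"}
print(is_non_high_risk(example_bias, example_applicability))
```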

Statistical analysis and quantitative synthesis

Data synthesis

When two GDH assays were compared with a reference test in one report, one assay was selected in the following order of priority: Chek-60, Quik Chek, Culturette and then Triage. This order was based on the number of studies that assessed each assay and the number of patients assessed with each assay. Data from both index assays in such a study were used independently in the index-test-based subgroup analysis. Similarly, when both CCNA and TC were used as references in a report, we chose CCNA as the reference test because a recent study suggested that CCNA is more reliable than TC23. Data from both reference tests in such a study were used independently in the reference-test-based subgroup analysis.
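The selection rules above amount to taking the first available item from a fixed priority list. A minimal Python sketch follows; the function name `pick_by_priority` is hypothetical, and the kit labels are simply the names used in this paper.

```python
from typing import Iterable, Sequence

# Priority orders stated in the text.
INDEX_PRIORITY = ("Chek-60", "Quik Chek", "Culturette", "Triage")
REFERENCE_PRIORITY = ("CCNA", "TC")


def pick_by_priority(available: Iterable[str], priority: Sequence[str]) -> str:
    """Return the highest-priority item among those reported in a study."""
    reported = set(available)
    for name in priority:
        if name in reported:
            return name
    raise ValueError(f"none of {sorted(reported)} is in the priority list")


# A report comparing Quik Chek and Triage against both references.
print(pick_by_priority(["Triage", "Quik Chek"], INDEX_PRIORITY))
print(pick_by_priority(["TC", "CCNA"], REFERENCE_PRIORITY))
```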

We used both hierarchical SROC curves and bivariate models16,17,18,19. To assess overall accuracy, we calculated the DOR using a DerSimonian-Laird random-effects model and the AUC using Holling’s proportional hazards model24,25. According to the criteria of Jones et al., AUC > 0.97, 0.93–0.96, 0.75–0.92 and 0.5–0.75 were interpreted as “excellent,” “very good,” “good,” and “reasonable,” respectively26. A paired forest plot, the hierarchical SROC curve and summary estimates of sensitivity and specificity were obtained using the bivariate model16. PLR and NLR were obtained from the summary estimates of sensitivity and specificity. According to Grimes et al., PLRs in the ranges 2–5, 5–10 and >10 represent small, moderate and large increases in probability when the test is positive. Similarly, NLRs in the ranges 0.2–0.5, 0.1–0.2 and <0.1 represent small, moderate and large decreases in probability when the test is negative27. We also calculated PPV and NPV from the summary estimates of sensitivity and specificity as functions of pretest probability ranging from 0 to 100%.
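The DerSimonian-Laird step pools per-study log DORs with weights that include a between-study variance τ². The Python sketch below illustrates the standard method-of-moments estimator; it is not the code of mada's `madauni`, whose defaults (for example, how zero cells are corrected) may differ, and the function name is hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class PooledDOR(NamedTuple):
    dor: float
    ci_low: float
    ci_high: float
    tau2: float  # between-study variance of the log DOR
    q: float     # Cochran's Q from the fixed-effect weights
    df: int


def dersimonian_laird_dor(tables: Sequence[Tuple[float, float, float, float]]) -> PooledDOR:
    """Random-effects pooling of log DORs; tables are (tp, fp, fn, tn) per study."""
    log_dor: List[float] = []
    var: List[float] = []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):  # assumed 0.5 continuity correction
            tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        log_dor.append(math.log((tp * tn) / (fp * fn)))
        var.append(1 / tp + 1 / fp + 1 / fn + 1 / tn)  # Woolf variance of the log DOR

    # Fixed-effect (inverse-variance) step, used to compute Q.
    w = [1 / v for v in var]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_dor)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_dor))
    df = len(log_dor) - 1

    # Method-of-moments estimate of tau^2, truncated at zero.
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in var]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_dor)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    return PooledDOR(math.exp(pooled), math.exp(pooled - z * se),
                     math.exp(pooled + z * se), tau2, q, df)


# Illustrative (tp, fp, fn, tn) counts, not data from the review.
result = dersimonian_laird_dor([(90, 40, 10, 360), (45, 30, 3, 200), (120, 20, 15, 500)])
print(result)
```

When Q does not exceed its degrees of freedom, τ² is truncated to zero and the estimate reduces to the fixed-effect pooled DOR.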

As sensitivity analyses, we conducted a subgroup analysis restricted to non-high-risk studies and subgroup analyses based on the reference test. In addition, index-test-based subgroup analyses were carried out to compare diagnostic accuracy among assays.

A GRADE evidence profile table was also presented28.

Heterogeneity

We used the I² statistic to evaluate the heterogeneity of overall test accuracy among studies, interpreting it with the following overlapping ranges: 0%, no heterogeneity; 0% to 40%, heterogeneity that might not be important; 30% to 60%, moderate heterogeneity; 50% to 90%, substantial heterogeneity; and 75% to 100%, considerable heterogeneity29.
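I² is derived from Cochran's Q and its degrees of freedom (the number of studies minus one). The Python sketch below computes I² and lists every band it falls into, since the ranges quoted above overlap; the function names are hypothetical.

```python
from typing import List


def i_squared(q: float, df: int) -> float:
    """Higgins' I² as a percentage: max(0, (Q - df) / Q) * 100."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def cochrane_bands(i2: float) -> List[str]:
    """All (overlapping) interpretation bands that contain the given I² value."""
    bands = []
    if i2 == 0:
        bands.append("no heterogeneity")
    if i2 <= 40:
        bands.append("might not be important")
    if 30 <= i2 <= 60:
        bands.append("moderate")
    if 50 <= i2 <= 90:
        bands.append("substantial")
    if i2 >= 75:
        bands.append("considerable")
    return bands


# Illustrative values: Q = 10 on 5 degrees of freedom.
i2 = i_squared(10.0, 5)
print(i2, cochrane_bands(i2))
```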

Software

The paired forest plot was made using Review Manager version 5.3 (The Cochrane Collaboration, Oxford, UK). The following functions of the “mada” package in the free software R were used: “madauni” for the DOR, “phm” for the AUC and “reitsma” for the hierarchical SROC curve and the summary estimates of sensitivity and specificity24,25. The GRADE evidence profile table was generated using the GRADE website30.
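For orientation, the bivariate random-effects model of Reitsma et al., after which the `reitsma` function is named, can be written as follows for study i. This is a textbook statement of the model rather than a description of mada's internals; implementations may parameterize the second outcome as the false positive rate, using logit(1 − Sp) = −logit(Sp), which is equivalent.

```latex
\begin{gather*}
\begin{pmatrix} \operatorname{logit}\widehat{\mathrm{Se}}_i \\ \operatorname{logit}\widehat{\mathrm{Sp}}_i \end{pmatrix}
\sim \mathcal{N}\!\left(
  \begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},\;
  \Sigma + S_i \right),
\qquad
\Sigma = \begin{pmatrix} \sigma^2_{\mathrm{Se}} & \sigma_{\mathrm{Se,Sp}} \\ \sigma_{\mathrm{Se,Sp}} & \sigma^2_{\mathrm{Sp}} \end{pmatrix},
\qquad
S_i = \begin{pmatrix} \dfrac{1}{TP_i} + \dfrac{1}{FN_i} & 0 \\ 0 & \dfrac{1}{TN_i} + \dfrac{1}{FP_i} \end{pmatrix} \\[4pt]
\mathrm{Se}_{\text{summary}} = \operatorname{logit}^{-1}(\mu_{\mathrm{Se}}),
\qquad
\mathrm{Sp}_{\text{summary}} = \operatorname{logit}^{-1}(\mu_{\mathrm{Sp}})
\end{gather*}
```

Here Σ holds the between-study variances and covariance, and a negative covariance σ_Se,Sp captures the threshold effect, the tradeoff between sensitivity and specificity mentioned in the Introduction. The within-study variances in S_i are the usual delta-method approximations for a logit-transformed proportion.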

Results

Study search

Of 684 articles that met the preliminary search criteria, 304, 213 and 125 were excluded through removal of duplicates, title/abstract screening and full-text review, respectively (Supplementary Figure 1). We finally identified 42 eligible reports (Table 1, Supplementary Text 2). All 42 reports used a cohort design; we found no case-control study. The 42 reports comprised 33 full-length articles, seven conference abstracts, one conference poster and one letter, all written in English. Of the 42, 17 were from the USA, six from Canada and six from the UK, and most of the others were from developed countries. Seven reports compared two index tests and five reports compared the CCNA and TC references; thus, we evaluated 54 cohorts in total.

Table 1 Characteristics of included cohorts.

As the reference test, 31 cohorts used CCNA and 23 used TC. As the index test, 18 used Chek-60, 18 used Quik Chek, six used the Culturette Brand Latex Test and five used Triage. The number of index-reference comparisons per cohort ranged from 60 to 12365, with a median of 373. The total number of comparisons was 47904, consisting of 4946 reference-positive and 42971 reference-negative comparisons. Across the 54 cohorts, sensitivity ranged from 0.23 to 1 with a median of 0.94, and specificity ranged from 0.64 to 1 with a median of 0.92 (Fig. 1).

Figure 1

Paired forest plot.

TP: true positive. FP: false positive. FN: false negative. TN: true negative. Ind: index test. Pre: Premiere. QC: Quik Chek. C60: Chek-60. Cul: Culturette. Tri: Triage. Ref: reference test. CCNA: cell cytotoxicity neutralization assay. TC: toxigenic culture.

Among the 54 cohorts, 47 had a high risk of bias in flow and timing, mostly because multiple specimens from the same patient were included. In addition, four had a high risk of bias in patient selection, three had high applicability concerns for patient selection and one had high applicability concerns for the reference test (Supplementary Figure 2). Finally, six cohorts were classified as non-high-risk cohorts.

Diagnostic accuracy across all index tests

Using data from all 42 cohorts, consisting of 3055 reference-positive and 26188 reference-negative comparisons, the DOR was 115 (95% confidence interval (CI) 77–172, I² = 12.0%) and the AUC was 0.970 (95% CI 0.958–0.982) (Table 2, Fig. 2A). According to Jones’ criteria, an AUC of 0.970 indicates excellent overall diagnostic accuracy26. In the first sensitivity analysis, using data from six non-high-risk cohorts with 2745 comparisons, the DOR was 189 (95% CI 54–660, I² = 0%) and the AUC was 0.986 (95% CI 0.976–0.998) (Table 2, Fig. 2B). In the second sensitivity analysis, based on CCNA, the DOR was 80 (95% CI 50–131, I² = 0%) and the AUC was 0.956 (95% CI 0.927–0.987) (Table 2, Fig. 2C). In the third sensitivity analysis, based on TC, the DOR was 189 (95% CI 106–337, I² = 27.2%) and the AUC was 0.979 (95% CI 0.970–0.988) (Table 2, Fig. 2D).
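Jones’ categories can be applied mechanically to an AUC. The published ranges leave small gaps (for example, between 0.96 and 0.97), so this hypothetical Python sketch assumes each lower bound is an inclusive threshold, which also classifies the AUC of 0.970 above as excellent.

```python
def jones_category(auc: float) -> str:
    """Classify an AUC with Jones et al.'s bands, treating lower bounds as inclusive (assumption)."""
    if auc >= 0.97:
        return "excellent"
    if auc >= 0.93:
        return "very good"
    if auc >= 0.75:
        return "good"
    if auc >= 0.5:
        return "reasonable"
    return "below reasonable"  # assumed label; not part of the quoted criteria


for value in (0.970, 0.956, 0.852):
    print(value, jones_category(value))
```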

Table 2 Summary of results
Figure 2

Hierarchical summary receiver-operator characteristic curves.

Based on the 42 cohorts, the summary estimate of sensitivity was 0.911 (95% CI 0.871–0.940) and that of specificity was 0.912 (95% CI 0.892–0.928). These estimates yielded a PLR of 10.4 (95% CI 8.4–12.7) and an NLR of 0.098 (95% CI 0.066–0.142). Based on Grimes’ criteria, these likelihood ratios indicate a large increase and a large decrease in probability, respectively27. The PLR and NLR in the subgroup analyses restricted to non-high-risk cohorts and to the TC reference also indicated large increases and decreases in probability, respectively. However, the PLR and NLR in the sensitivity analysis restricted to the CCNA reference indicated a moderate increase and a moderate decrease in probability, respectively.
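The likelihood ratios follow from the summary sensitivity and specificity, and Grimes’ bands can then be looked up. In this minimal Python sketch, the handling of values exactly on a band boundary and the label for values outside the quoted ranges are our assumptions. Plugging in the summary estimates reproduces the point estimates of PLR and NLR reported above; their confidence intervals require the bivariate model's covariance and are not computed here.

```python
from typing import Tuple


def likelihood_ratios(sens: float, spec: float) -> Tuple[float, float]:
    """PLR = sens / (1 - spec); NLR = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec


def grimes_plr(plr: float) -> str:
    if plr > 10:
        return "large increase"
    if plr >= 5:
        return "moderate increase"
    if plr >= 2:
        return "small increase"
    return "minimal change"  # assumed label outside the quoted ranges


def grimes_nlr(nlr: float) -> str:
    if nlr < 0.1:
        return "large decrease"
    if nlr <= 0.2:
        return "moderate decrease"
    if nlr <= 0.5:
        return "small decrease"
    return "minimal change"  # assumed label outside the quoted ranges


plr, nlr = likelihood_ratios(0.911, 0.912)  # summary estimates from the 42 cohorts
print(round(plr, 1), round(nlr, 3), grimes_plr(plr), grimes_nlr(nlr))
```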

The GRADE evidence profile is presented in Table 3. Assuming a pretest probability in the range 15–25%2, among 1000 tested subjects there would be 137–228 true positives, 12–22 false negatives, 684–775 true negatives and 66–75 false positives. The PPV was 65–78% and the NPV was 97–98%.
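The per-1000 counts and predictive values follow from Bayes' rule applied to the summary sensitivity and specificity. The Python sketch below (function name hypothetical) computes them for a given pretest probability; evaluated at 15% and 25%, it gives the PPV and NPV ranges quoted above.

```python
from typing import Dict


def outcomes_per_1000(sens: float, spec: float, pretest: float) -> Dict[str, float]:
    """Expected TP/FN/TN/FP per 1000 tested subjects and the resulting PPV/NPV."""
    diseased = 1000 * pretest
    healthy = 1000 - diseased
    tp = sens * diseased   # diseased subjects who test positive
    fn = diseased - tp     # diseased subjects missed by the test
    tn = spec * healthy    # non-diseased subjects who test negative
    fp = healthy - tn      # non-diseased subjects who test positive
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp,
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn)}


for p in (0.15, 0.25):
    r = outcomes_per_1000(0.911, 0.912, p)
    print(f"pretest {p:.0%}: PPV {r['PPV']:.0%}, NPV {r['NPV']:.0%}")
```

Because PPV depends strongly on pretest probability, the same test yields a noticeably different PPV in a low-prevalence setting, which is why the text recommends judging pretest probability locally.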

Table 3 GRADE evidence profile for the diagnostic accuracy of the glutamate dehydrogenase assay for Clostridium difficile infection (CDI).

Subgroup analysis based on index test

Chek-60 was evaluated in 16 cohorts with 18737 comparisons. The DOR of 159 and AUC of 0.979 indicated excellent overall diagnostic accuracy. The sensitivity was 0.942 and the specificity was 0.901. The PLR of 9.5 and NLR of 0.064 indicated a moderate increase and a large decrease in probability, respectively (Table 2, Figure 2E).

Quik Chek was evaluated in 15 cohorts with 6205 comparisons. The DOR of 152 and AUC of 0.980 also indicated excellent overall diagnostic accuracy. The sensitivity was 0.925 and the specificity was 0.918. The PLR of 11.3 and NLR of 0.082 indicated a large increase and a large decrease in probability, respectively (Table 2, Figure 2F).

Six cohorts with 2151 comparisons evaluated the Culturette latex agglutination test. The AUC was 0.852 (95% CI 0.794–0.918), indicating good overall diagnostic accuracy. The summary estimate of sensitivity, 0.610, was lower than those for Chek-60 and Quik Chek. The PLR of 8.6 indicated a moderate increase in probability when the test is positive, and the NLR of 0.420 indicated a small decrease in probability when the test is negative (Table 2, Figure 2G).

Five cohorts with 2353 comparisons assessed the diagnostic accuracy of Triage. Although the AUC of 0.975 indicated excellent overall diagnostic accuracy, the specificity and PLR were lower than those of the other three assay kits (Table 2, Figure 2H).

Discussion

To the best of our knowledge, this is the first meta-analysis to provide summary estimates of the sensitivity and specificity of GDH detection for CDI. Our analysis showed that detecting GDH had an excellent AUC and that GDH test results substantially changed the probability of CDI. We believe our results are robust because of the careful literature search, the use of a hierarchical model and the low heterogeneity (I² < 30%). The quality-based subgroup analysis, which replicated the results obtained from all studies regardless of quality, also supports this robustness.

The reference-test-based sensitivity analysis revealed slightly discrepant results. When the GDH assay was compared with TC as the reference, overall test accuracy was excellent. However, the GDH assay seemed to have lower specificity when compared with CCNA as the reference. Although both CCNA and TC are regarded as established reference standards for CDI, the two tests sometimes give conflicting results. A large-scale prospective study by Planche et al. suggested that CCNA is a better reference test than TC because CCNA more accurately reflects mortality and CDI23. If we trust only the CCNA reference, the diagnostic accuracy of the GDH assay seems slightly lower (Table 2, Figure 2C).

Index-test-based subgroup analyses revealed that Chek-60 and Quik Chek, the most frequently evaluated kits, had the best performance. Although not supported by a sufficient number of studies, Triage seemed to lack specificity. The Culturette Brand Rapid Latex Test for CDI had clearly low diagnostic performance; although it detects GDH, this test was not originally designed to target GDH. We currently see no reason to use the Culturette Brand Latex Test to detect GDH.

Assuming a pretest probability in the range 15–25%, the PPV was 65–78% and the NPV was 97–98%. While a negative GDH result is generally reliable, roughly a quarter to a third of positive GDH results would be false positives. Therefore, the currently used multi-step algorithm is a reasonable solution. Where medical resources are abundant, NAATs can provide quick and accurate results as the second step. If the use of NAATs is restricted, toxin detection is an alternative. However, because toxin detection is not sensitive enough, NAATs should be applied as a third step for GDH-positive, toxin-negative specimens31. Even though some epidemiologic studies have suggested that CDI accounts for 15–25% of antibiotic-associated diarrhea, pretest probability should be judged by clinicians considering the patient’s clinical background and the local epidemiology. Thus, the result of a GDH assay should be interpreted carefully.
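The multi-step strategy discussed above can be summarized as a decision function. The Python sketch below encodes the three-step variant (GDH, then a toxin test, then a NAAT only for GDH-positive, toxin-negative specimens). It is an illustration, not a clinical protocol; the function name and return labels are hypothetical, and how to act on a GDH-positive, toxin-negative, NAAT-positive result remains a clinical judgment.

```python
from typing import Optional


def three_step_gdh(gdh_positive: bool,
                   toxin_positive: Optional[bool] = None,
                   naat_positive: Optional[bool] = None) -> str:
    """Return the next action, or the final call, for one stool specimen.

    None means the corresponding test has not been run yet.
    """
    if not gdh_positive:
        return "negative"            # high NPV: stop after the GDH screen
    if toxin_positive is None:
        return "run toxin test"      # second step for GDH-positive specimens
    if toxin_positive:
        return "positive"
    if naat_positive is None:
        return "run NAAT"            # third step for GDH-positive, toxin-negative specimens
    return "positive" if naat_positive else "negative"


print(three_step_gdh(True))
print(three_step_gdh(True, toxin_positive=False))
print(three_step_gdh(True, toxin_positive=False, naat_positive=True))
```

Where NAATs are readily available, the two-step variant simply replaces the toxin step with a NAAT and stops there.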

To diagnose CDI in clinical practice, biochemical tests that detect GDH, toxin or nucleic acid of C. difficile in the stool of patients with suspected CDI are widely used. GDH is a metabolic enzyme that converts glutamate to α-ketoglutarate8,9,10,11. This enzyme is present in many eukaryotes and microbes, including C. difficile and other Clostridium species. To detect GDH in stool, latex agglutination tests were formerly used, whereas quantitative immunoassays are common today. The key advantage of enzyme immunoassays over the latex agglutination test is enhanced sensitivity owing to quantitative evaluation using a standard curve. Moreover, the recently available lateral flow assay does not require a trained technician. Today, simple, accurate and inexpensive commercial enzyme immunoassay kits are available, although CCNA and TC remain the reference standards.

Our study has several limitations. First, some of the included studies had a high risk of bias or high applicability concerns; therefore, we conducted a sensitivity analysis excluding these studies. Second, the subgroup analyses of the Culturette latex test and Triage included small numbers of studies; thus, these results are not sufficiently reliable. Third, the results were not consistent across reference tests; thus, we provided GDH assay accuracies using the two references separately. We believe these data are useful for future research. Fourth, recent advances in PCR techniques enable detection of very low microbial loads. PCR may detect C. difficile with higher sensitivity than culture, although culture is usually regarded as the gold standard. If we had used PCR as the reference standard, the specificity would have been higher32.

In conclusion, we performed a systematic review and meta-analysis of the diagnostic test accuracy of GDH detection for the diagnosis of CDI, using a hierarchical model and a sufficient number of studies and comparisons. In our analysis of 42 cohorts comprising 29243 comparisons, overall test accuracy was excellent, sensitivity was 0.911, specificity was 0.912, and positive and negative results substantially increased and decreased the probability of CDI, respectively. Assuming a pretest probability of 15–25%, the PPV was 65–78% and the NPV was 97–98%.

Additional Information

How to cite this article: Arimoto, J. et al. Diagnostic test accuracy of glutamate dehydrogenase for Clostridium difficile: Systematic review and meta-analysis. Sci. Rep. 6, 29754; doi: 10.1038/srep29754 (2016).