Background
In the context of the human immunodeficiency virus (HIV) epidemic, clinicians frequently encounter extrapulmonary and disseminated forms of tuberculosis (TB) [1–3]. In the USA, nearly 20% of TB cases are extrapulmonary [2]. In England and Wales, 38% of TB cases are extrapulmonary [3]. Tuberculous pleuritis is a common manifestation of extrapulmonary TB [4]. TB is the most common cause of pleural effusion in many countries [4]. For example, studies from Spain [5], Malaysia [6], and Saudi Arabia [7] showed that TB accounted for 25%, 44%, and 37% of all effusions, respectively. In the USA, the annual incidence of tuberculous pleuritis has been estimated to be about 1000 cases, and approximately one in 30 patients with TB will have tuberculous pleuritis [4, 8]. The incidence of tuberculous effusions may be higher in patients with HIV infection [9].
Conventional diagnostic tests include microscopy of the pleural fluid; culture of pleural fluid, sputum, or pleural tissue; and pleural biopsy [4]. These tests have limitations. Microscopy of the pleural fluid is rarely positive (<5%) [10–12]. Culture of pleural fluid has low sensitivity (24%–58%), and results are not available for weeks [11–13]. Biopsy of pleural tissue and culture of biopsy material are widely held to be the best methods of confirming the diagnosis [4, 10, 12]. This combination may establish the diagnosis in 86% of cases [10]. Although not perfect, culture and/or biopsy are therefore widely considered the standard of diagnosis [4, 10]. However, pleural biopsy is invasive, operator-dependent, and technically difficult (particularly in children) [14].
Because of the limitations of conventional tests, newer, rapid tests such as nucleic acid amplification (NAA) tests, including polymerase chain reaction (PCR), have been evaluated. Because of their high sensitivity and specificity in smear-positive respiratory specimens, these tests are now used, mainly in developed countries, for the direct detection of M. tuberculosis complex in respiratory specimens [15, 16]. NAA tests are categorized as commercial or in-house ("home-brew") tests. Commercial tests include the Amplified Mycobacterium tuberculosis Direct Test® (MTD) (Gen-Probe Inc, San Diego, CA), the Amplicor® MTB tests (Roche Molecular Systems, Branchburg, NJ), and the recently discontinued LCx® test (Abbott Laboratories, Abbott Park, IL). In the USA, the Amplicor test is licensed for use in smear-positive respiratory specimens; the MTD test is approved for smear-positive as well as smear-negative respiratory specimens [16]. No commercial test is licensed for use in non-respiratory specimens. We conducted this systematic review and meta-analysis to determine the overall accuracy of NAA tests in the diagnosis of tuberculous pleuritis, and to identify factors associated with heterogeneity of results between studies.
Methods
Search strategy
We searched the following databases: PubMed (1985 – January 2003), EMBASE (1988–2002), Web of Science (1990–2002), BIOSIS (1993–2002), Cochrane Library (2002; Issue 2), and LILACS (1990–2002). The Journal of Clinical Microbiology, a high-yield journal for TB diagnostic studies, was also hand-searched separately. All searches were up to date as of August 2002. The PubMed search was updated in January 2003. The search terms used were: tuberculosis, Mycobacterium tuberculosis, nucleic acid amplification techniques, polymerase chain reaction, sensitivity and specificity, and accuracy. Experts in the field were contacted. Bibliographies from the included studies and relevant review articles were screened. We obtained lists of citations from companies that manufacture commercial tests. Although no language restrictions were imposed initially, for the full-text review and final analysis our resources only permitted review of English and Spanish articles. Conference abstracts were excluded because of the limited data presented in them.
Study selection
Our search strategy was designed to include all published studies on NAA tests for the direct detection of M. tuberculosis in pleural fluid specimens. For inclusion, a study had to:
1. report a comparison of an NAA test against a reference standard, and provide data necessary for the computation of both sensitivity and specificity;
2. include at least 10 pleural fluid specimens (since studies with very few specimens are vulnerable to selection bias [17]).
Studies on use of NAA tests on pleural biopsy and/or cytology specimens were excluded.
Two reviewers (MP and LLF) independently judged study eligibility while screening the citations. Disagreements were resolved by consensus. A list of excluded studies and a log of reasons for exclusion are available from the authors upon request.
Data extraction and quality assessment
Data extraction was performed by two reviewers. One reviewer (MP) extracted the data from all English studies. Another reviewer (LLF) extracted data from all Spanish articles. The second reviewer (LLF) also independently extracted data from a subset (36%) of the English articles, in order to determine the inter-rater agreement. The abstracted data included methodological quality, participant characteristics, test methods, and outcome data.
Quality assessment was performed using methods adapted from two guidelines on systematic reviews of diagnostic studies [17, 18]. For each study, the following quality criteria were scored as fulfilled or not: 1) independent comparison of NAA test against reference standard; 2) cross-sectional design (versus case-control design); 3) blinded (single or double) interpretation of test and reference standard results; 4) consecutive or random sampling of patients; 5) prospective data collection; 6) inclusion of at least 10 specimens/patients with confirmed tuberculous pleuritis. If no data on the above criteria were reported in the primary studies, we requested the information from the authors. For the purposes of analysis, responses coded as "not reported" were grouped together with "not met." A high quality study was arbitrarily defined as one that met at least 5 of the 6 criteria; a medium quality study met 3 or 4 of the 6 criteria; and a low quality study met fewer than 3 of the 6 criteria. Since discrepant analysis (where discordant results between the NAA test and the reference test are resolved, post hoc, using clinical data) may be a potential source of bias, we preferentially included unresolved data.
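The grading rule described above can be sketched in a few lines of Python. This is only an illustration: the criterion names and the function `grade_quality` are hypothetical labels introduced here, not identifiers from the review. It assumes each criterion is recorded as met, not met, or not reported, with "not reported" counted as "not met".

```python
# Hypothetical names for the six quality criteria listed in the text.
CRITERIA = (
    "independent_comparison",
    "cross_sectional_design",
    "blinded_interpretation",
    "consecutive_or_random_sampling",
    "prospective_data_collection",
    "at_least_10_confirmed_cases",
)


def grade_quality(responses):
    """Return 'high', 'medium', or 'low' for one study.

    `responses` maps a criterion name to True (met), False (not met),
    or None (not reported). Missing keys and None count as not met.
    """
    met = sum(1 for name in CRITERIA if responses.get(name) is True)
    if met >= 5:
        return "high"      # at least 5 of the 6 criteria
    if met >= 3:
        return "medium"    # 3 or 4 of the 6 criteria
    return "low"           # fewer than 3 of the 6 criteria


if __name__ == "__main__":
    example = {name: True for name in CRITERIA}
    example["blinded_interpretation"] = None  # not reported
    print(grade_quality(example))
```

Treating "not reported" as "not met" is the conservative choice stated in the text; a study can only be graded upward if its authors supply the missing information.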
Statistical analysis and data synthesis
We used standard methods recommended for meta-analyses of diagnostic test evaluations [17–19]. Analyses were performed using Meta-Test [20] and Stata version 8 (Stata Corporation, Texas). We computed the following measures of test accuracy for each study: sensitivity [true positive rate (TPR)], specificity [1 − false positive rate (FPR)], positive likelihood ratio (LR+), negative likelihood ratio (LR-), and diagnostic odds ratio (DOR). These measures were pooled using the random effects model [17–19].
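As a minimal sketch of how the per-study measures follow from a 2×2 table of test results against the reference standard, the Python function below computes all five. The function name `accuracy_measures` and the 0.5 continuity correction for tables with an empty cell are assumptions made for illustration; the review does not state how zero cells were handled.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Compute accuracy measures from one study's 2x2 table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives against the reference standard.
    """
    # Assumption: add 0.5 to every cell when any cell is zero, a common
    # convention that keeps the ratios finite.
    if min(tp, fp, fn, tn) == 0:
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))

    tpr = tp / (tp + fn)              # sensitivity
    fpr = fp / (fp + tn)              # 1 - specificity
    lr_pos = tpr / fpr                # positive likelihood ratio
    lr_neg = (1 - tpr) / (1 - fpr)    # negative likelihood ratio
    return {
        "sensitivity": tpr,
        "specificity": 1 - fpr,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,       # diagnostic odds ratio
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    for name, value in accuracy_measures(tp=40, fp=5, fn=10, tn=95).items():
        print(f"{name}: {value:.3f}")
```

The DOR equals (TP × TN) / (FP × FN), so it combines sensitivity and specificity into a single number that is convenient for pooling.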
Each study in the meta-analysis contributed a pair of numbers: TPR and FPR. Since TPR and FPR are not independent, we summarized their joint distribution by constructing a summary receiver operating characteristic (SROC) curve [21]. Unlike in a traditional ROC plot, which is used to explore the effect of varying thresholds (cut-points) on TPR and FPR in a single study, each data point in the SROC plot represents a separate study in the meta-analysis. The SROC curve (and the area under the curve) represents the overall performance of the test, and depicts the trade-off between sensitivity and specificity. A symmetric curve suggests that the variability in accuracy between studies is explained, in part, by differences in thresholds employed by the studies.
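One widely used way to construct an SROC curve is the Moses-Littenberg linear regression. The text does not say which construction reference 21 describes, so the sketch below is an assumption, not the authors' procedure. For each study it forms D = logit(TPR) − logit(FPR), which is the log diagnostic odds ratio, and S = logit(TPR) + logit(FPR), a proxy for the threshold. It then fits D = a + bS by unweighted least squares. The function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def fit_sroc(pairs):
    """Fit D = a + b*S by ordinary least squares.

    `pairs` is a list of (tpr, fpr) tuples, one per study, with every
    value strictly between 0 and 1.
    """
    d = [logit(t) - logit(f) for t, f in pairs]   # log DOR per study
    s = [logit(t) + logit(f) for t, f in pairs]   # threshold proxy
    n = len(pairs)
    s_mean = sum(s) / n
    d_mean = sum(d) / n
    sxx = sum((x - s_mean) ** 2 for x in s)
    if sxx == 0:
        raise ValueError("all studies share the same S; slope is undefined")
    b = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d)) / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Expected TPR at a given FPR on the fitted curve (requires b != 1)."""
    # Rearranging D = a + b*S gives
    # logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b).
    x = (a + (1 + b) * logit(fpr)) / (1 - b)
    return 1 / (1 + math.exp(-x))


if __name__ == "__main__":
    # Invented (TPR, FPR) pairs, for illustration only.
    studies = [(0.60, 0.02), (0.75, 0.05), (0.85, 0.10), (0.90, 0.20)]
    a, b = fit_sroc(studies)
    print(f"a = {a:.3f}, b = {b:.3f}")
    print(f"TPR at FPR 0.05: {sroc_tpr(0.05, a, b):.3f}")
```

In this parameterization a slope b close to zero corresponds to a symmetric curve, meaning the diagnostic odds ratio is roughly constant across studies and the spread of points mainly reflects different thresholds.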
Heterogeneity in meta-analyses refers to the degree of variability in results across studies. We used the chi-square and Fisher's exact tests to detect statistically significant heterogeneity. Stratified (subgroup) analyses were used to identify study design and test-related factors responsible for heterogeneity in test accuracy. Studies using commercial tests were analyzed separately from those using in-house tests. Studies with commercial tests were further stratified by type of test (brand). Finally, since publication bias is of concern for meta-analyses of diagnostic studies [22], we tested for the potential presence of this bias using funnel plots and the Egger test [23].
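To make the notions of heterogeneity and random effects pooling concrete, the sketch below computes Cochran's Q, a chi-square-type heterogeneity statistic, and a DerSimonian-Laird random effects estimate on the log DOR scale. The text does not specify these exact computations, so both the choice of statistic and the function names are assumptions made for illustration.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its approximate variance."""
    # Assumption: 0.5 is added to every cell to avoid division by zero.
    tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    ln_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return ln_dor, variance


def cochran_q(effects, variances):
    """Return (Q, degrees of freedom) using inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau squared)."""
    weights = [1 / v for v in variances]
    q, df = cochran_q(effects, variances)
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    star = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    return pooled, math.sqrt(1 / sum(star)), tau2


if __name__ == "__main__":
    # Invented 2x2 tables (tp, fp, fn, tn), for illustration only.
    tables = [(40, 5, 10, 95), (12, 1, 18, 60), (30, 8, 6, 70)]
    effects, variances = zip(*(log_dor_and_variance(*t) for t in tables))
    q, df = cochran_q(effects, variances)
    pooled, se, tau2 = dersimonian_laird(effects, variances)
    print(f"Q = {q:.2f} on {df} df, tau^2 = {tau2:.3f}")
    print(f"pooled DOR = {math.exp(pooled):.1f}")
```

Under the hypothesis of no heterogeneity, Q is compared with a chi-square distribution on k − 1 degrees of freedom, where k is the number of studies. When Q is large, the random effects weights become more equal across studies and the pooled confidence interval widens.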
Discussion
Since conventional tests are not always helpful in establishing a diagnosis of tuberculous pleuritis, several rapid tests and biomarkers have been evaluated: adenosine deaminase (ADA) [12, 14, 45, 51, 59, 62], interferon-γ (IFN-γ) [59, 60, 62, 63], lysozyme [62], soluble interleukin 2 receptors [63], and NAA tests [24–61]. The number of studies evaluating these rapid tests has increased sharply, and systematic reviews and meta-analyses are necessary to synthesize this growing body of literature. A recent meta-analysis summarized the evidence on ADA and IFN-γ for the diagnosis of tuberculous pleuritis [64]. Both ADA and IFN-γ tests were found to be reasonably accurate at detecting tuberculous pleuritis. Our meta-analysis summarizes the evidence on accuracy of NAA tests in the diagnosis of tuberculous pleuritis.
Principal findings
The role of NAA tests has been reasonably well defined in pulmonary tuberculosis [15, 16, 65], and guidelines exist for testing of respiratory specimens [16]. In contrast, their role in the evaluation of specimens such as pleural fluid is not clear. Our results indicate that commercial NAA tests have high specificity and positive likelihood ratios. These test properties suggest a potential role for commercial tests in confirming (ruling in) the diagnosis of tuberculous pleuritis. These tests, however, have low and widely varying sensitivities, which make them unhelpful in ruling out TB. Potential explanations for the low sensitivity include a low bacillary load in pleural fluid, or the presence of substances in the pleural fluid that inhibit amplification [65]. Some authors have suggested that pleural fluids should be tested with NAA methods after the specimens are adequately pre-treated to remove inhibitors [65]. All commercial kits appear to be designed to maximize only specificity. The MTD and LCx kits appear to have higher sensitivity than the Amplicor test. This comparison should be interpreted cautiously because it is based on few studies. Studies that directly compare these commercial tests (head-to-head) within the same study population are required to confirm these observations. The most important finding regarding in-house PCR is the significant heterogeneity across studies.
Clinical implications
To interpret the summary measures in a clinical context, consider a patient from a high incidence setting (e.g. countries such as Spain or Malaysia) who is estimated to have a 50% probability of pleural TB after clinical evaluation, and is evaluated with either the MTD test (LR+ of 17.4 and LR- of 0.31) or the Amplicor test (LR+ of 52.8 and LR- of 0.59). An LR+ of 17.4 for the MTD test suggests that patients with tuberculous pleuritis have a 17-fold higher chance of being MTD test positive as compared to patients without TB. If the MTD test were positive, the likelihood that this patient has TB increases from 50% to 95%, a probability that is sufficiently high to justify initiation of anti-tuberculosis treatment. A positive Amplicor test will raise the probability of TB from 50% to 97%. In contrast, if the MTD test result were negative, there is still a 24% chance that this patient has TB, which is probably not low enough to rule out TB with confidence. In the case of the Amplicor test, a negative test will reduce the probability from 50% to 40%, again not low enough to rule out TB.
Consider another patient from a low incidence setting (e.g. countries such as the USA), where the baseline probability of TB is low (e.g. 5%). If the MTD test were positive, the likelihood that this patient has TB increases from 5% to 48%, a probability that justifies further investigation. A positive Amplicor test will raise the probability of TB from 5% to 75%. If the MTD test result were negative, the probability changes from 5% to 2%, a negligible shift that is unlikely to be helpful in clinical decision-making. In the case of the Amplicor test, the probability changes from 5% to 4%. These examples illustrate the impact of the baseline prevalence (pre-test probability) on the predictive values of the tests.
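The arithmetic behind these examples is the odds form of Bayes' theorem: post-test odds = pre-test odds × likelihood ratio. The short Python sketch below applies it to the likelihood ratios quoted in the text. The function name `post_test_probability` is hypothetical, and because the quoted likelihood ratios are rounded, recomputed percentages may not match the figures in the text exactly.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    # Likelihood ratios as quoted in the text: (LR+, LR-).
    tests = {"MTD": (17.4, 0.31), "Amplicor": (52.8, 0.59)}
    for pre in (0.50, 0.05):  # high and low incidence settings
        for name, (lr_pos, lr_neg) in tests.items():
            after_positive = post_test_probability(pre, lr_pos)
            after_negative = post_test_probability(pre, lr_neg)
            print(
                f"{name}, pre-test {pre:.0%}: "
                f"positive -> {after_positive:.0%}, "
                f"negative -> {after_negative:.0%}"
            )
```

The same likelihood ratio moves the probability by very different amounts depending on where it starts, which is why a positive result can justify treatment in a high incidence setting but only further investigation in a low incidence one.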
The accuracy of in-house PCR was heterogeneous across studies, and thus meaningful summary measures of accuracy could not be determined. The clinical implications, therefore, will depend on the setting. Institutions that use in-house PCR will have to rely on local data to decide on its accuracy and clinical applicability. In general, PCR for tuberculosis is known to have poor inter-laboratory reproducibility [66].
In addition to the effect of diagnostic thresholds seen in the SROC plot, we identified two factors that were associated with heterogeneity among in-house tests: use of a case-control design, and use of the IS6110 target sequence. Case-control studies sample patients from the extreme ends of the clinical spectrum (an ideal, "extreme contrast" setting). If the sensitivity of a test is evaluated in seriously diseased subjects, and specificity in healthy individuals, both measures will overestimate the true diagnostic accuracy [67]. Empirical research suggests that case-control studies overestimate the diagnostic odds ratio by a factor of 3, when compared to cross-sectional studies [68]. Future studies of NAA tests could reduce this bias by avoiding the case-control design and recruiting consecutive series of patients in whom the test is clinically indicated (a realistic, "clinical practice" setting). The IS6110 target sequence is widely used in M. tuberculosis fingerprinting [69]. Because this target is specific to the M. tuberculosis complex, and because it is usually present as multiple copies in the genome, PCR tests using this target might be more sensitive. Further research is underway to confirm this finding, in a larger meta-analysis of in-house PCR in the diagnosis of pulmonary tuberculosis.
Our data are consistent with the results of two previous meta-analyses on the accuracy of NAA tests. Sarmiento and colleagues summarized the accuracy of PCR in the diagnosis of smear-negative pulmonary TB [70]. Their meta-analysis of 50 studies showed that both sensitivity and specificity estimates were heterogeneous. They concluded that PCR is not consistently accurate enough to be routinely recommended for the diagnosis of smear-negative TB. Our previous meta-analysis of 49 studies summarized the accuracy of NAA tests in the diagnosis of tuberculous meningitis [71]. Commercial tests were found to have high overall specificity (0.98) and low sensitivity (0.56). The accuracy of in-house PCR was not determined because of heterogeneity in study results.
Limitations of the review
Our review has limitations. Our analysis lacks data on the incremental gain of NAA tests over and above the diagnostic performance achieved by using only conventional methods or other rapid tests like ADA and IFN-γ. The primary studies in our review did not report such data. Also, few studies in our review directly compared the NAA test against tests such as ADA and IFN-γ [45, 51, 59]. Only one study [59] directly compared the three tests in the same population, and showed that ADA, IFN-γ, and PCR were 88%, 86%, and 74% sensitive, respectively, and 86%, 97%, and 90% specific, respectively, for culture or biopsy-confirmed pleural TB. Since we did not include tests such as ADA and IFN-γ in our literature searches, our review cannot identify the most accurate test. Also, publication bias was a concern with the in-house tests. Exclusion of studies published in languages other than English and Spanish could have contributed to this potential bias.
Conclusions
In summary, our data suggest a potentially useful role for commercial NAA tests in confirming a diagnosis of tuberculous pleuritis. However, commercial kits have low and varying sensitivities, and therefore should not be used for excluding a diagnosis of tuberculous pleuritis. NAA test results, therefore, cannot replace conventional tests; they need to be interpreted in parallel with clinical findings and results of conventional tests. The accuracy of in-house PCR tests is poorly defined because of heterogeneity in study results. Clinically useful summary measures cannot be estimated for in-house PCR tests; their clinical applicability remains unclear.
Acknowledgments
Presented as a poster at the California Tuberculosis Controllers Association Spring 2003 Conference, Oakland, USA, April 24–25, 2003. MP acknowledges the support of the National Institutes of Health, Fogarty AIDS International Training Program (1-D43-TW00003-15). LLF is supported by the Fogarty Tuberculosis supplementary grant (TW00905-S1) and the Consejo Nacional de Ciencia y Tecnologia (CONACyT), National Council of Science and Technology, Mexico (scholarship number 129617). We are grateful to the authors of the primary studies who sent additional data, and representatives of the commercial test manufacturers for sending us lists of studies.
Authors' contributions
Study concept and design: MP, LLF, LWR, JMC.
Acquisition of data: MP, LLF.
Analysis and interpretation of data: MP, AH, LWR, JMC.
Drafting of manuscript: MP, JMC.
Critical revision of the manuscript for important intellectual content: all authors.
All authors read and approved the final manuscript.