Background
Tuberculosis (TB) is the leading cause of death due to an infectious disease worldwide, killing 1.6 million people in 2017 [1]. The EndTB strategy aims to reduce TB deaths by 95% and to cut new cases by 90% between 2015 and 2035 [2]. A critical component of this strategy is early identification of individuals with TB and prevention of transmission. Although the roll-out of GeneXpert has facilitated rapid TB diagnosis, the test has limitations (e.g., lower sensitivity in patients with low bacillary burden, in children, and in extra-pulmonary disease) [3-5]. Furthermore, not all individuals with possible pulmonary TB are able to produce sputum [6]. Newer blood-based diagnostics using gene expression profiles have the potential to address the limitations of GeneXpert and other sputum-based tests [7].
Over the past several years, researchers have identified nearly four dozen gene expression signatures that distinguish TB disease from latent TB infection (LTBI) [8, 9] or TB from other infections [10-12], that detect incipient pre-symptomatic TB disease and/or predict the future development of TB disease in those with LTBI [13-15], or that track response to therapy [16, 17]. Signatures can be used to understand the heterogeneous response to TB and help identify the pathways and underlying biology of TB disease progression. These signatures have been developed using multiple profiling technologies (microarray, RNA-sequencing, rt-PCR) and a diverse set of computational and machine learning prediction algorithms. Some of these signatures were developed using direct training or cross-validation approaches on a single study, while others were developed using a meta-analytical approach [17, 18]. Furthermore, several of these gene signatures have been validated by independent research teams on diverse cohorts in different settings and using multiple computational algorithms [19-21]. Importantly, recent studies have systematically compared the performance of TB signatures, along with their associated gene sets and original predictive models, in predicting TB outcomes across multiple TB datasets [20, 21]. However, despite this work, there is no single resource that compiles signature gene lists, methods, or biomarkers for application to new datasets, and most gene sets have not been independently validated using alternative computational methodologies.
Existing studies of blood-based TB diagnostics have another important limitation: most have not evaluated the impact of comorbidities on the modulation of the TB signature. In high-TB burden settings, much of the population has comorbidities that affect host immune response and likely alter gene signatures of TB disease. Some of these have been directly studied (e.g., diabetes, HIV) [22-24] and others have not (e.g., malnutrition, pregnancy, parasites). In particular, the role of malnutrition, which is known to modulate the innate and adaptive immune responses, has not been explored [25, 26]. Malnutrition affects much of the population in TB endemic countries, including one-third of the adult population in India, the country with 27% of the world’s TB cases [1]. It is the most common secondary immunodeficiency and has been termed nutritional acquired immunodeficiency syndrome [27, 28]. Undernutrition appears to impact both the innate and adaptive immune systems [29], and so can conceivably alter gene expression in these patients in significant ways. For example, undernourished individuals have been noted to have decreased expression of Th1 cytokines and increased concentrations of Th2 cytokines, which hobbles the Th1 response against Mtb [30, 31]. Prior research has also suggested that undernutrition may diminish the effectiveness of TB vaccines. Furthermore, a study over two decades in the United States found that a BMI < 18.5 kg/m² was associated with an adjusted hazard ratio of 12.43 (95% CI: 5.75, 26.95) for developing TB disease as compared to those with BMI greater than 18.5. In India, more than 50% of TB cases are attributable to undernutrition in most states [32]. Because of the significant TB risk malnutrition poses and the gap in current knowledge, we sought to determine whether the published gene lists indicating TB disease accurately discriminate TB from LTBI in the setting of malnutrition in India.
In this work, we curated almost four dozen existing TB-related signature gene sets and developed our TBSignatureProfiler software toolkit. We also included two single-gene biomarkers that were evaluated in a previous meta-analysis [21]. This platform was used to evaluate the performance of these signatures for distinguishing between TB and LTBI in severely malnourished individuals, in order to determine whether existing TB gene sets work in this population. While it is unlikely that these signatures will be implemented in clinical practice for detecting TB disease, we do note that many existing signatures were developed for this purpose. Thus, comparison between prevalent and latent TB is the logical first step in evaluating and validating these signature gene sets in the setting of malnutrition. Once these signatures are established and validated, they can be used for more innovative and useful applications, such as predicting risk of progression or worsening disease, monitoring treatment efficacy, or diagnosing extrapulmonary disease.
Discussion
In this study, we present our set of 45 curated TB signature gene sets along with our TBSignatureProfiler software and use it to assess the impact of malnutrition on the discriminative ability of a large number of signature gene sets. The TBSignatureProfiler is an important contribution that provides the first comprehensive, open-source evaluation tool to compare TB signature gene sets in a direct and reproducible way. This automated platform enables investigators to apply nearly four dozen TB gene sets directly to their datasets using multiple different scoring methods, with tools to visualize signature gene set strength. Future analyses performed using these same gene sets on additional datasets can be directly compared with past results using the same scoring methods and analytic approach. In addition, new signature gene sets can be added and evaluated in a simple and straightforward way, by merely adding them to the TB signature gene sets collection in the software. This functionality has not previously been available in the TB research field, despite the publication of many dozens of gene expression studies, signatures, evaluations, and meta-analyses [17, 19, 20]. Ultimately, the TBSignatureProfiler will enable investigations into whether signature gene sets work in different geographic settings and in the context of different social conditions or comorbidities (e.g., high alcohol use), and will allow new signature gene sets to be efficiently evaluated and compared in these populations as they are developed.
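To make the idea of applying a collection of gene sets with a common scoring method concrete, the following is a minimal Python sketch. It is not the TBSignatureProfiler interface; it assumes a simple mean z-score as the scoring method, and the signature names, gene lists, and expression values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev

# Hypothetical collection: names and gene lists are illustrative only and are
# NOT the curated signatures distributed with the TBSignatureProfiler.
signatures = {
    "toy_sig_A": ["GBP5", "DUSP3", "BATF2"],
    "toy_sig_B": ["ANKRD22", "GBP6"],
}

# Made-up expression values: gene -> one value per sample (four samples).
expr = {
    "GBP5": [1.0, 1.2, 3.0, 3.4],
    "GBP6": [0.5, 0.7, 2.1, 2.5],
    "DUSP3": [2.0, 2.2, 3.1, 3.3],
    "ANKRD22": [0.2, 0.1, 1.9, 2.2],
    "BATF2": [0.4, 0.6, 2.8, 3.0],
}


def standardize(expr):
    """Z-score each gene across samples (population SD; constant genes -> 0)."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def score_signature(scaled, genes):
    """Per-sample mean z-score over the signature genes found in the data."""
    present = [g for g in genes if g in scaled]
    if not present:
        raise ValueError("none of the signature genes are in the data")
    n_samples = len(scaled[present[0]])
    return [mean(scaled[g][i] for g in present) for i in range(n_samples)]


def score_all(expr, collection):
    """Score every signature in the collection with the same method."""
    scaled = standardize(expr)
    return {name: score_signature(scaled, genes)
            for name, genes in collection.items()}


# Adding a new signature only requires adding an entry to the collection.
signatures["toy_sig_new"] = ["GBP5", "GBP6"]
scores = score_all(expr, signatures)
```

Because every signature passes through the same scoring function, results for a newly added gene set are directly comparable with those of the existing sets, which is the property the paragraph above describes.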
Overall, there were very few genes that overlapped between the signature gene sets. There were, however, many common functional families represented across the gene sets. For example, guanylate-binding proteins (GBPs) are IFN-induced GTPases that contribute to an inflammatory response by activating NLRP3 and AIM2 inflammasome assembly [51-53]. Interferons are produced during Mtb infection, which could lead to activation of GBP5 and GBP6. These GBPs then further enhance the inflammatory response via inflammasome activation. FcGR1 (CD64) is the high affinity receptor for IgG and is expressed on most myeloid cells. In humans, FcGR1 is encoded by three highly homologous genes, FcGR1A, FcGR1B and FcGR1C. Interaction of IgG and FcGR1 results in cellular activation, including phagocytosis, generation of reactive oxygen species, antigen-presentation, release of inflammatory cytokines, and antibody-mediated cellular cytotoxicity [54]. FcGR1 expression on neutrophils has been proposed as a biomarker of infection and sepsis [55]. Neutrophils in Juvenile Idiopathic Arthritis, an inflammatory disease, express higher levels of FcGR1B compared to controls [56]. It is therefore not surprising that many signature gene sets encompassed either FcGR1A or FcGR1B. Kinase activation and phosphorylation cascades induced following immune cell activation are regulated by dual-specificity phosphatases (DUSPs) [57]. Since active TB is associated with increased inflammatory response, the presence of DUSP3 in several signature gene sets is expected. Another gene found in many signature gene sets is ANKRD22, an ankyrin repeat protein with four copies of the ankyrin motif. The motif interacts with an array of unrelated proteins to affect many cellular processes [58, 59], and it is likely that ANKRD22 expression is upregulated because of the enhanced inflammatory response in TB. Basic leucine zipper transcription factor ATF-like 2 (BATF2) is a transcription factor that belongs to the activator protein 1 family of transcription factors and contains the basic leucine zipper domain. BATF2 dominance in the TB signature gene sets is consistent with its upregulation by type I IFNs [60], and by IFNγ and Mtb in macrophages [61].
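The contrast between low gene-level overlap and shared functional families can be illustrated with a short sketch. The Python example below computes pairwise Jaccard overlap between gene sets, first on gene symbols and then after mapping genes to families. The three toy gene sets and the family mapping are assumptions made for illustration (using genes named above); they do not reproduce any published signature.

```python
from collections import Counter
from itertools import combinations

# Toy gene sets (hypothetical; not the content of any published signature).
toy_sets = {
    "toy_sig_A": {"GBP5", "FCGR1A", "DUSP3"},
    "toy_sig_B": {"GBP6", "FCGR1B", "ANKRD22"},
    "toy_sig_C": {"GBP5", "BATF2", "ANKRD22"},
}

# Assumed gene -> family mapping; unmapped genes act as their own family.
family_of = {
    "GBP5": "GBP", "GBP6": "GBP",
    "FCGR1A": "FCGR1", "FCGR1B": "FCGR1",
    "DUSP3": "DUSP",
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(sets):
    """Jaccard index for every unordered pair of named sets."""
    return {(m, n): jaccard(sets[m], sets[n])
            for m, n in combinations(sorted(sets), 2)}


# Gene-level overlap between signatures.
pairwise = pairwise_jaccard(toy_sets)

# Genes that appear in more than one signature.
gene_counts = Counter(g for genes in toy_sets.values() for g in genes)
shared = sorted(g for g, c in gene_counts.items() if c > 1)

# Family-level overlap: replace each gene by its family before comparing.
family_sets = {name: {family_of.get(g, g) for g in genes}
               for name, genes in toy_sets.items()}
family_pairwise = pairwise_jaccard(family_sets)
```

In this toy example, `toy_sig_A` and `toy_sig_B` share no genes, yet half of their combined families are in common, mirroring the pattern described in the text.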
The single gene biomarkers NPC2 and BATF2 were very effective in distinguishing between TB and LTBI in malnutrition. Although these single gene biomarkers are highly effective, activation of these genes is not specific to TB infection but is associated with common inflammatory pathways (this may also be the case for some of the multi-gene “Disease” signatures). We note that NPC2 plays a key role in lysosomal cholesterol egress [62, 63] and the expression of NPC2 is directly regulated by the nuclear factor kappa B subunit 2 (NF-κB2) protein [64]. In addition, NPC2 plays a significant role in other infectious diseases; for example, upregulation of NPC2 is crucial for viral replication in Chikungunya, Zika, West Nile and Dengue infections [65]. BATF has been shown to directly control TH17 differentiation [66], and transcriptomic analysis has established that upregulation of BATF2 in HIV-specific CD8+ T cells leads to the inhibition of T cell function [67]. Thus, although these genes are sensitive biomarkers for separating TB from LTBI, they lack specificity to TB, as their expression is associated with common processes involved in host immune responses to multiple infectious agents. We would therefore recommend using more specific, multi-gene signatures if specificity is needed for the context.
The TBSignatureProfiler was applied to samples from severely undernourished individuals with TB and LTBI in India. This analysis found that existing blood RNA signature gene sets of TB generally work in the setting of severe undernutrition, although some differences in performance do exist. These differences may reflect the size of the gene sets (i.e., smaller gene sets may not perform as well) and/or the settings in which the signatures were trained. A few signature gene sets do not perform optimally in the setting of severe undernutrition. These findings suggest that most TB signature gene sets are robust and could work in many different settings and with different comorbidities, but some gene sets perform slightly better in different contexts. This finding has important implications for India and many high TB-burden countries.
We had hypothesized that malnutrition might modulate the transcriptional profiles in different ways and using different mechanisms than in well-nourished individuals, but this was generally not the case. Malnutrition clearly affects the immune response with effects on macrophage activity and phagocytosis, antigen presentation, and induction of the Th1 immune response among other sequelae [29]. It is plausible that these effects were not detected because the dominant immunomodulatory effects of TB that are common to well-nourished and malnourished individuals outweigh the more specific transcriptional impacts induced by changes in nutritional status. It is also likely that some of the signature gene sets themselves were developed in settings with high rates of malnutrition, so that the effect of malnutrition was already incorporated into them. For example, the Sambarey_HIV_10 signature was trained on data obtained from participants in Chennai and Bengaluru, India, where malnutrition is highly prevalent. Further investigation is needed to understand the role of inflammation and immune response in the setting of malnutrition, although we show here that most existing TB signature gene sets work well in the setting of malnutrition.
Malnutrition is not the only comorbidity that is associated with TB incidence. Endemic countries have high rates of alcohol use, diabetes, HIV and other immunomodulatory conditions [68-70]. Little has been done to explore whether blood-based transcriptional TB signatures may be altered in the setting of such comorbidities. Such studies are needed before these signatures can be accepted as validated diagnostic modalities. For example, it has been shown that the Zak_RISK_16 signature has a lower AUC in the setting of HIV infection [13]. Furthermore, transcriptional profiling of individuals with diabetes and TB demonstrates activation of pathways associated with diabetes complications [24]. It is possible that signature performance in other TB-endemic settings may also be affected by genetic or Mtb strain differences. Additional work is needed to determine the impact of other common comorbidities. The TBSignatureProfiler can play an important role in facilitating future analyses in these different settings.
This work demonstrates that existing signature gene sets can be effectively used on samples from comorbid TB contexts, although the efficacy of the gene sets may vary. While it is unlikely that these gene signatures will be used in clinical practice to distinguish pulmonary TB from LTBI controls, our work does provide the promise that existing gene sets can be used to detect TB in circumstances where existing diagnostics are less effective, e.g., distinguishing extrapulmonary, paucibacillary, and pediatric TB from controls in malnourished individuals. In addition, evaluation of the subtle differences between signature gene set performance, combined with the dissection of the gene set content, may provide insight into potential mechanisms specific to demographics, comorbidities, or other context-related specifics for each patient group under consideration.
We recognize that this study has several limitations. While the study has a large enough sample size to determine the significance of the signature gene sets’ abilities to distinguish between TB and LTBI, the sample size was not large enough to clearly distinguish between the performance of the top-scoring gene sets. Therefore, we can only conclude that many of the gene sets work well, but we cannot determine which is the best gene set in this context. It is possible that our results do not reflect the full spectrum of gene sets in severely malnourished individuals with LTBI, as severe malnutrition may blunt the TST response; however, our previous analyses suggest this is not universally true [71]. In addition, the characteristics of the participants with TB and LTBI differed with regard to demographics (e.g., age) and risk factors (e.g., smoking and alcohol), and we do not have the power to control for these differences in our analysis. While this may lead to the confounding of signature gene set strength differences between TB and LTBI, we point out that differences in demographics and co-morbidities are quite common among the TB and LTBI populations; these data represent the population dynamics of these groups. In addition, several of our signature gene sets were trained in pediatric cohorts [13, 72], but we see no difference in performance between these child/adolescent gene sets and those trained on adults.
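The sample-size limitation can be illustrated with a bootstrap confidence interval for the area under the ROC curve (AUC). The Python sketch below is a generic illustration under stated assumptions, not the analysis performed in this study: the scores are synthetic draws from normal distributions, the group sizes are arbitrary, and the percentile bootstrap is one of several possible interval methods.

```python
import random


def auc(pos, neg):
    """Probability that a positive score exceeds a negative one (ties = 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each group separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic signature scores (assumption: normal draws, arbitrary group sizes).
rng = random.Random(1)
tb_sig1 = [rng.gauss(1.5, 1.0) for _ in range(15)]
ltbi_sig1 = [rng.gauss(0.0, 1.0) for _ in range(15)]
tb_sig2 = [rng.gauss(1.3, 1.0) for _ in range(15)]
ltbi_sig2 = [rng.gauss(0.0, 1.0) for _ in range(15)]

for name, tb, ltbi in [("sig1", tb_sig1, ltbi_sig1), ("sig2", tb_sig2, ltbi_sig2)]:
    lower, upper = bootstrap_auc_ci(tb, ltbi)
    print(f"{name}: AUC={auc(tb, ltbi):.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

When the intervals of two signatures overlap substantially, as tends to happen with small groups, the data support the statement that both perform well but not a ranking between them.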
One final limitation of our TBSignatureProfiler platform is that many existing signature gene sets were trained on different transcriptional profiling platforms (microarrays, RNA-seq) using different machine learning and predictive modeling tools. Gene set scoring methods may not perform as well with a given signature gene set as the original platform or method did; this is an area of further development for the package that is beyond the scope of this paper. However, here we evaluate existing signature gene sets across multiple scoring methods to highlight which gene signature sets of TB are the most robust across platforms and methods, and thus should work well across a variety of predictive modeling approaches and contexts. This approach may also have the benefit of reducing the likelihood of model overfitting for individual signatures trained on specific datasets.