Background
Tuberculosis (Tb), caused primarily by Mycobacterium tuberculosis (Mtb), is a major worldwide disease that affects millions of individuals every year and has high mortality rates. The World Health Organization’s ‘End TB Strategy’ and the United Nations’ Sustainable Development Goals (SDGs) (Goal 3, target 3.3) lay out the roadmap for the global goal of ending the Tb epidemic by 2030. Unmet medical needs, together with the recent emergence of multidrug-resistant (MDR) and extensively drug-resistant (XDR) strains of Mtb [1, 2], continue to be a roadblock to achieving this goal [3‐5]. There are very few drugs for treating MDR/XDR Tb, and the lack of new medicines has several causes, including limited funding of pharmaceutical research and development for neglected diseases. The prohibitive cost of drug development has been attributed to poor target selection; with better target selection, 87% of late-stage failures, which result from poor efficacy and side effects, could be avoided [6]. In addition, the market for Tb drugs is small and unattractive to multinational companies.
In the present situation, understanding the complex biological responses, or the systems biology, of an organism is essential for improving and accelerating drug development by reducing failure rates. Selective chemical tailoring of molecules based on existing lead compounds against Mtb, which can also address emerging resistance, has the potential to fuel the Tb clinical pipeline. To minimize the chances of failure and the cost of Tb drug discovery, innovative approaches for designing new chemical entities, using data-intensive in silico methods built on experimentally validated data, are urgently needed. With this in mind, the Open Source Drug Discovery (OSDD) project was initiated to facilitate data-driven drug discovery [7, 8].
We have previously reported an integrated model based on a systems biology approach, incorporating an extensive genome-wide evaluation and an analysis of the sites of mutations in 1623 genomes of clinical isolates of Mtb, to identify 33 potential non-toxic metabolic targets [9, 10]. That work emphasized the use of a systems biology approach to identify novel non-toxic targets, with the aim of shortening the drug discovery process by exploiting computational methods focused on Mtb. To identify drug targets with the least likelihood of side effects, all 116 in silico essential genes were compared with the human genome and the human microbiome at the sequence level; of these 116 genes, obtained from in silico gene knockout, 104 showed no homology to human genome sequences. For a systems biology approach to identify novel non-toxic targets, each target gene should ideally share no homology with the human genome and minimal homology with the human microbiome, belong to an important metabolic pathway, and be evolutionarily invariant across clinical isolates.
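The four selection criteria above can be read as a conjunctive filter over per-gene annotations. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the `CandidateGene` fields, the 0.3 microbiome-identity cut-off and the gene records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateGene:
    """Hypothetical per-gene summary; field names are illustrative only."""
    name: str
    human_homolog: bool          # significant hit against the human genome
    microbiome_identity: float   # best identity to the human microbiome (0 to 1)
    in_key_pathway: bool         # member of an important metabolic pathway
    invariant_in_isolates: bool  # no mutations across the clinical isolates


def passes_criteria(gene: CandidateGene, max_microbiome_identity: float = 0.3) -> bool:
    """Apply all four criteria; the identity cut-off is an assumed value."""
    return (
        not gene.human_homolog
        and gene.microbiome_identity <= max_microbiome_identity
        and gene.in_key_pathway
        and gene.invariant_in_isolates
    )


def shortlist(genes: list[CandidateGene]) -> list[str]:
    """Names of the genes that satisfy every criterion."""
    return [g.name for g in genes if passes_criteria(g)]


# Hypothetical records, one per failure mode.
genes = [
    CandidateGene("geneA", False, 0.10, True, True),   # passes every criterion
    CandidateGene("geneB", True, 0.05, True, True),    # fails: human homolog
    CandidateGene("geneC", False, 0.60, True, True),   # fails: microbiome identity
    CandidateGene("geneD", False, 0.20, True, False),  # fails: mutated in isolates
]
print(shortlist(genes))  # ['geneA']
```

Because the criteria are combined with a logical AND, a gene that fails any single test is excluded, which mirrors the "all such target genes" wording of the selection rule.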
Of these 33 potential targets, the 15 proteins with available crystal structures were evaluated in the present study for the development of novel inhibitors. None of these targets showed significant homology to human proteins. Here we present an approach that combines a proteome-scale analysis of the sites of mutations with comprehensive structure-based drug design [11] and mines the wealth of available experimental data to generate potential leads against these specific targets.
With the growing volume of data in medicinal chemistry (both computational and synthetic), it has become imperative to use in silico approaches to understand the relationships and patterns within the available data and thereby initiate hypothesis-driven drug discovery [12].
The published results of GlaxoSmithKline’s (GSK) large-scale high-throughput screen of a chemical library against Tb were examined for unique, non-redundant chemical structures. A total of 776 compounds were made available [13, 14]; of these, 426 had a predicted target (based on computational studies), and 177 were identified by the company as potent, non-cytotoxic hits against drug-sensitive Mtb H37Rv.
A detailed chemical analysis of existing small-molecule databases was performed for the current set of targets, including an evaluation of any lead candidates already reported in these databases as Mtb inhibitors. We evaluated our set of 33 potential targets for existing reported GSK inhibitors. Targets were shortlisted (Table 1) based on the availability of a GSK inhibitor in the database, the availability of a Protein Data Bank (PDB) structure, essentiality (experimental/in silico), and membership of the Metabolic Persister Genes (MPGs). The 11 selected targets were subjected to an extensive evaluation with various in silico drug discovery tools, including pharmacophore analysis [15, 16], molecular docking (Glide (Schrodinger) and AutoDock) [17, 18] and, in a few cases, molecular dynamics (MD) simulations [19, 20], using the Schrodinger suite (2015). Polypharmacology [21] studies of these targets, aimed at repositioning [22] and recalibrating old and existing drug families, are also reported here. All targets were pre-screened against the GSK open-access database and the OSDDChem database (http://crdd.osdd.net/osddchem/) to generate new starting leads. Herein, we report the identification of 20 lead molecules, including 4 FDA-approved drugs, as potential candidates for inhibiting the proposed targets in Mtb metabolism.
Table 1
Input and output metabolites for each of the 33 shortlisted genes

| Gene | Input metabolite | Output metabolite |
|---|---|---|
| Targets involved in nucleic acid transactions | | |
| Purine metabolism | | |
| dfrA | 7,8-dihydropteroate | Tetrahydrofolate |
| folB | 7,8-dihydroneopterin | 6-hydroxymethyl-7,8-dihydropterin |
| Pyrimidine metabolism | | |
| pyrF | Phosphoribosyl pyrophosphate | Phosphoribosyl amine |
| tmk | 2′-deoxyuridine 5′-diphosphate/2′-deoxyuridine 5′-phosphate/deoxythymidine 5′-diphosphate/thymidine monophosphate | 2′-deoxyuridine 5′-diphosphate/2′-deoxyuridine 5′-phosphate/deoxythymidine 5′-diphosphate/thymidine monophosphate |
| Nucleotide metabolism | | |
| rpiB | Ribose-5-phosphate/ribulose-5-phosphate | Ribose-5-phosphate/ribulose-5-phosphate |
| dcd | dCTP/dUTP | dCTP/dUTP |
| atpE | ADP | ATP |
| nrdI | met-NrdF (ox) | met-NrdF (red) |
| DNA replication | | |
| nrdF2 | Ribonucleotides | Deoxyribonucleotides |
| RNA pseudouridine synthesis | | |
| Rv1711 | Pseudouridine guide snoRNAs (pseudouridine) | RNA pseudouridine |
| Targets involved in membrane biosynthesis | | |
| Fatty acid metabolism | | |
| fcoT | Acyl-ACP | Fatty acids |
| acpM | FAS-II complex | AcpM (FAS-II complex) |
| desA2 | Stearoyl-CoA (saturated fatty acids) | Oleoyl-CoA (unsaturated fatty acids) |
| echA3 | Δ2-enoyl-CoA | 3-hydroxyacyl-CoA |
| echA18.1 | Δ2-enoyl-CoA | 3-hydroxyacyl-CoA |
| Targets involved in carbohydrate metabolism | | |
| Krebs cycle | | |
| Carbohydrate metabolism | | |
| pntAb | Ethanol/citrate/Fd (red) 2− | Acetyl-CoA/2-oxoglutarate/Fd (ox) |
| nuoA | NADH | NAD+ |
| canB | CO2 | Bicarbonate |
| Electron transport chain | | |
| ctaE | Cytochrome (red) | Cytochrome (ox) |
| Rv0763c | NADP+ reductase (ox) | Ferredoxin NADP+ reductase (red) |
| nrdH | CDP/UDP | dCDP/dUDP |
| Mycothiol biosynthesis | | |
| mca | Mycothiol (MSH)/MS-electrophiles (MSR) | AcCys + GlcN-Ins/AcCySR (N-acetyl-Cys S-conjugate; mercapturic acid) + GlcN-Ins |
| Targets involved in de novo pathways | | |
| Essential cofactors | | |
| kdtB | 4′-phosphopantetheine | 3′-desphospho-coenzyme A |
| Rv2361c | Isopentenyl diphosphate | Decaprenyl diphosphate |
| mog | Molybdopterin | Adenylated molybdopterin |
| moaD2 | Cyclic pyranopterin monophosphate/molybdopterin converting factor | Molybdopterin/molybdenum cofactor |
| Vitamin biosynthesis | | |
| pdxH | Pyridoxamine 5′-phosphate | Pyridoxal 5′-phosphate |
| Amino acid biosynthesis | | |
| prsA | Ribose-5-phosphate | 5-phospho-α-D-ribose 1-diphosphate |
| gap | D-glyceraldehyde 3-phosphate | 3-phospho-D-glyceroyl phosphate |
| Peptide metabolism | | |
| dapE | CysGly + Glu/N-succinyl-LL-2,6-diaminoheptanedioate | Cys + Gly/succinate + LL-2,6-diaminoheptanedioate |
| Carbon, nitrogen and sulfur metabolism | | |
| Rv3600c | Pantothenate | 4′-phosphopantothenate |
The integrated analysis reported here includes in silico toxicity evaluation of both the targets and the molecules and takes drug resistance into account; it therefore has the potential to generate new drug candidates, which can be taken up for in vitro and in vivo screening against H37Rv and MDR strains of Mtb. The study should also serve the wider anti-tuberculosis research community by providing a list of genes and their potential inhibitors that are more likely to be validated for Tb drug discovery and development.
Discussion
We have thoroughly investigated, for detailed structure-based drug discovery, the 15 metabolic genes in Mtb (in silico and experimentally essential genes as well as metabolic persister genes) that are highly invariant across the 1623 available strains, including 1084 MDR strains. These Mtb-specific invariant genes were evaluated for their relevance in drug discovery, as they can form good targets for inhibiting the growth of the organism. Metabolic pathway analysis showed that all 15 genes were crucial candidates for structure-based drug design and that none of them showed any convergence. Each gene acts on a specific input metabolite, suggesting that these metabolites can be further exploited to discover drugs based on specific essential metabolic pathways. The analysis of input and output metabolites for the 15 shortlisted genes revealed that all of them, except Rv0390 (of unknown function), are involved in specific functions, with no interference among their primary metabolites in any of their metabolic pathways. As there was no such interference, all the genes were considered independent, structure-specific drug targets. This makes each gene unique in its action and suggests that a drug designed against any of these essential genes should act highly specifically on the corresponding metabolic pathway of Mtb. The absence of convergence in the mechanistic action of these genes also suggests that such a drug is unlikely to cause collateral damage to other pathways and should be highly selective in its action. We previously used enhanced functional annotations of the Mtb genome, obtained through a crowdsourcing approach, to reconstruct the metabolic network of Mtb in a bottom-up manner [9]. The assumption of pathway independence is limited by the extent to which all pathways and their interconnections are reported in the literature. Because the literature may not be comprehensive and not every interconnection between pathways may be known, there remains a slight possibility that some of these shortlisted genes belong to the same pathway. Using the well-characterized PDB data, these genes were analyzed and subjected to conformational analysis for structure-based drug design.
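The independence argument amounts to checking that no two shortlisted genes share an input or output metabolite. A minimal sketch of that check follows; the metabolite sets are transcribed from Table 1 for four of the targets discussed below, and matching metabolites by exact name is a simplifying assumption (a more robust check would map them to standard compound identifiers first).

```python
from itertools import combinations

# Input/output metabolites for four targets, transcribed from Table 1.
# Exact-name matching is a simplification; real checks would first map
# names to standard identifiers (for example ChEBI or KEGG compound IDs).
METABOLITES = {
    "dfrA": ({"7,8-dihydropteroate"}, {"tetrahydrofolate"}),
    "folB": ({"7,8-dihydroneopterin"}, {"6-hydroxymethyl-7,8-dihydropterin"}),
    "dcd": ({"dCTP", "dUTP"}, {"dCTP", "dUTP"}),
    "mog": ({"molybdopterin"}, {"adenylated molybdopterin"}),
}


def shared_metabolites(table):
    """Return {(gene_a, gene_b): shared metabolites} for every overlapping pair."""
    overlaps = {}
    for (a, (ins_a, outs_a)), (b, (ins_b, outs_b)) in combinations(table.items(), 2):
        common = (ins_a | outs_a) & (ins_b | outs_b)
        if common:
            overlaps[(a, b)] = common
    return overlaps


print(shared_metabolites(METABOLITES))  # {} means no convergence among these four
```

An empty result corresponds to the "no interference" condition; any non-empty entry would flag a pair of genes whose pathways converge on a common metabolite.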
A) Targets involved in DNA transactions
1) Rv2763c (dfrA/folA)
The gene encodes dihydrofolate reductase, which catalyses an essential step in de novo glycine and purine synthesis. In folate biosynthesis, dihydrofolate reductase catalyses the reduction of dihydrofolate to 5,6,7,8-tetrahydrofolate. Molecular docking was carried out on a set of 24 reported GSK inhibitors of folA; SB-439950 in the NAD binding pocket and ChEMBL2098242 in the trimethoprim binding pocket showed docking scores of −10.88 and −10.28, respectively (Table 2). A search based on structural and pharmacophore similarity to the NS and the GSK inhibitor (Additional file 1: Figures S1, S2) yielded a set of 830 molecules, in which ChEMBL432987 showed the best docking score (−12.085) and ChEMBL32039 a docking score of −10.19 (Table 4). Interaction analysis of ChEMBL2098242 revealed that its NH2 and NH groups form hydrogen bonds with Asp27 and Ile94, and that the ligand forms a Pi-stacking interaction with Phe31.
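Glide and AutoDock both report more negative scores for more favourable binding, so the "highest" or "best" score in these sections means the most negative value. The small sketch below ranks the folA compounds by the scores quoted above; the scores are copied from the text, while the dictionary layout and helper name are illustrative.

```python
# Docking scores for Rv2763c (folA) quoted in this section; more negative = better.
FOLA_SCORES = {
    "SB-439950": -10.88,
    "ChEMBL2098242": -10.28,
    "ChEMBL432987": -12.085,
    "ChEMBL32039": -10.19,
}


def rank_by_score(scores, top_n=None):
    """Sort compounds from most to least favourable (most negative score first)."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return ranked if top_n is None else ranked[:top_n]


for name, score in rank_by_score(FOLA_SCORES):
    print(f"{name:15s} {score:8.3f}")
```

Sorting in ascending order is the whole trick: with negative energies, ascending order puts the strongest predicted binder first.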
2) Rv3607c (folB)
The gene is an MPG; it is experimentally essential and is involved in dihydroneopterin/folate biosynthesis. Binding studies were carried out with reference to the NS to understand the binding poses and interactions. Molecular docking was performed for all the GSK molecules, including the reported GSK inhibitor GSK2168465A (docking score = −4.21) (Table 2). A library of ~1200 compounds was generated and evaluated by molecular docking; the best docking score (−7.41) was obtained for CSID 20211002 (Additional file 1: Figure S3; Table 4).
3) Rv3247c (tmk)
The gene encodes thymidylate kinase (dTMP kinase). Molecular docking was carried out with all the GSK molecules as well as the proposed inhibitors (docking score = −2.6) (Table 2). A library of 450 compounds with high structural similarity to the best GSK molecules was generated (Additional file 1: Figures S4, S5).
Four lead compounds, ChEMBL3184131, ChEMBL1467435, ChEMBL20734 and ChEMBL219916, showed strong binding affinity, with docking scores of −11.55, −11.32, −10.67 and −9.17, respectively (Table 4).
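Similarity-based library generation of this kind is commonly done by comparing molecular fingerprints with the Tanimoto coefficient. The sketch below assumes fingerprints are already available as sets of "on" bit indices (in practice a cheminformatics toolkit such as RDKit would compute them); the bit sets, compound names and the 0.7 cut-off are hypothetical and are not the settings used in this study.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as bit sets."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similar_library(query: set[int], candidates: dict[str, set[int]], cutoff: float = 0.7):
    """Keep candidates whose similarity to the query meets the cut-off, best first."""
    hits = {name: tanimoto(query, fp) for name, fp in candidates.items()}
    return sorted(
        ((name, sim) for name, sim in hits.items() if sim >= cutoff),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical fingerprints: a seed GSK hit and three database compounds.
seed = {1, 4, 9, 16, 25, 36, 49, 64}
database = {
    "cmpd_1": {1, 4, 9, 16, 25, 36, 49, 81},  # 7 shared bits of 9 total
    "cmpd_2": {1, 4, 9, 100, 121, 144},       # 3 shared bits of 11 total
    "cmpd_3": {1, 4, 9, 16, 25, 36, 49, 64},  # identical to the seed
}
print(similar_library(seed, database))
```

Raising the cut-off yields a smaller, more conservative library around each seed; lowering it admits more scaffold diversity at the cost of more docking runs.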
4) Rv0321 (dcd)
The gene is involved in the interconversion of dCTP and dUTP and has no reported GSK inhibitor. The OSDDChem database was therefore screened against the target to identify the top 100 compounds with binding energies better than that of the NS (docking score = −9.9). The top-ranked compounds were clustered, and a pharmacophore model with a survival score of 3.43 was generated from them (Additional file 1: Figure S6a). To validate the quality of this pharmacophore model, the clinically approved Tb drug rifampicin was mapped onto it and showed a two-feature mapping with a good fit value of 4.74. A molecular library (~1000 compounds) was generated from various databases based on the best structural and pharmacophore similarities. The best binding affinity was obtained for ChEMBL533912, with a ΔG of −9.3 kcal/mol (Table 4). In this lead compound, the propanamide NH adjacent to the fluorophenyl group forms a hydrogen bond with Tyr162, and a nitrogen atom of the 1,2,4-triazole ring interacts with Ala167 and Ser161, each at an interatomic distance of 3.5 Å.
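Docking programs such as AutoDock Vina report an estimated binding free energy ΔG in kcal/mol. To put a value such as −9.3 kcal/mol in perspective, the standard relation ΔG = RT ln Kd can be inverted to give an approximate dissociation constant. The sketch assumes 298.15 K and treats the docking ΔG as if it were a true binding free energy, which is only a rough approximation for scoring functions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Approximate dissociation constant (mol/L) from dG = RT ln Kd."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


kd = kd_from_dg(-9.3)  # ChEMBL533912 against Rv0321 (dcd)
print(f"Kd approx. {kd:.1e} M")
```

Each additional −1.4 kcal/mol corresponds to roughly a tenfold lower Kd at room temperature, which is a useful yardstick when comparing docking scores that differ by only a few tenths of a unit.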
DNA replication
5) Rv3048c (nrdF2)
The gene is involved in the DNA replication pathway. Its PDB structure has no bound NS. Molecular docking was performed with the reported GSK molecules and, for comparison, with the entire GSK set (Table 2). A library of ~350 compounds was generated based on structural and pharmacophore similarity. ChEMBL2098385 and CSID353848 showed the highest binding affinities, with docking scores of −9.01 and −7.41, respectively (Additional file 1: Figures S7, S8; Table 4).
B) Targets involved in membrane biosynthesis
6) Rv0098 (fcoT)
The gene encodes a long-chain acyl-coenzyme A (CoA) thioesterase that hydrolyses fatty acyl-CoA to fatty acid, and is therefore involved in fatty acid metabolism. Virtual screening of the OSDDChem database identified the top 100 compounds, all with binding energies better than that of the NS (ΔG = −6.9 kcal/mol). A library of ~1000 molecules was generated based on structural and pharmacophore similarity and was further screened against the target. Two lead molecules, ChEMBL3349754 and ChEMBL3037996, showed binding affinities of ΔG = −8.6 and −9.1 kcal/mol, respectively (Table 4; Additional file 1: Figure S9).
Interaction analysis of ChEMBL3349754 revealed that the carbonyl group of the phenyl acetate moiety interacts with the binding-site residue Asn83 at an interatomic distance of 3.1 Å, and that the oxygen atom at position 11 of the trioxatricyclo ring system interacts strongly with the binding-site residues Leu115 and Tyr87, at a distance of 3.4 Å each.
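The interaction distances quoted here (3.1 to 3.5 Å) are Euclidean distances between heavy atoms, and a donor-acceptor distance of about 3.5 Å or less is a common rule of thumb for a hydrogen bond. The sketch below illustrates that geometric test with hypothetical coordinates; the 3.5 Å cut-off is an assumption, and dedicated interaction-analysis tools also check bond angles.

```python
import math


def is_hbond_candidate(donor, acceptor, cutoff=3.5):
    """Distance-only hydrogen-bond test in Å; ignores the D-H...A angle."""
    return math.dist(donor, acceptor) <= cutoff


# Hypothetical coordinates (Å): a ligand carbonyl O and a side-chain N donor.
ligand_o = (10.0, 4.0, 2.0)
residue_n = (12.0, 6.0, 3.0)
print(round(math.dist(residue_n, ligand_o), 2), is_hbond_candidate(residue_n, ligand_o))
```

A purely distance-based check over-counts hydrogen bonds, so it is best treated as a first filter before inspecting poses visually or with an angle-aware tool.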
7) Rv1094 (desA2)
The gene is involved in the conversion of saturated to unsaturated fatty acids: in unsaturated fatty acid biosynthesis, it encodes the acyl-[acyl-carrier-protein] desaturase that catalyses the conversion of stearoyl-CoA to oleoyl-CoA. No NS is reported in its PDB structure. All the GSK molecules and the reported GSK inhibitors were screened against the protein in a binding pocket predicted with the SiteMap tool of Schrodinger. A compound library (180 compounds) was screened against the target. ADMET property prediction (QikProp, Schrodinger) and docking of known drug molecules (identified by structural similarity using QikProp) were also carried out (Additional file 1: Figure S10). ChEMBL3302699 showed a docking score of −6.68, whereas ChEMBL535116 showed a strong binding affinity of −6.79, along with the existing drug Droxidopa, a synthetic amino acid precursor that acts as a prodrug of the neurotransmitter norepinephrine (Table 4). ChEMBL535116 forms hydrogen bonds with Trp32 and Glu29 and Pi-stacking interactions with Trp32 and Arg102.
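QikProp computes a broad set of ADMET descriptors. As a much simpler, swapped-in stand-in, Lipinski's rule of five is a widely used first-pass drug-likeness filter. The sketch below is not QikProp, and the property values are hypothetical placeholders that would normally come from a cheminformatics toolkit.

```python
from typing import NamedTuple


class Props(NamedTuple):
    mol_weight: float  # molecular weight, Da
    clogp: float       # octanol/water partition coefficient
    hbd: int           # hydrogen-bond donors
    hba: int           # hydrogen-bond acceptors


def lipinski_violations(p: Props) -> int:
    """Count rule-of-five violations (MW > 500, logP > 5, HBD > 5, HBA > 10)."""
    return sum([
        p.mol_weight > 500,
        p.clogp > 5,
        p.hbd > 5,
        p.hba > 10,
    ])


def is_drug_like(p: Props, max_violations: int = 1) -> bool:
    """Common convention: allow at most one violation."""
    return lipinski_violations(p) <= max_violations


# Hypothetical property values for two library members.
print(is_drug_like(Props(420.5, 3.2, 2, 6)))   # True
print(is_drug_like(Props(650.0, 6.1, 4, 12)))  # False (three violations)
```

Such a filter is cheap enough to run on every library member before docking, leaving the more expensive ADMET predictions for the shortlisted leads.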
C) Targets involved in de novo pathways (essential cofactors)
8) Rv2965c (kdtB)
The gene is involved in the fourth step of CoA biosynthesis and reversibly transfers an adenylyl group from ATP to 4′-phosphopantetheine, yielding dephospho-CoA (DPCOA) and pyrophosphate. Its PDB structure has no bound NS but does contain CoA. The receptor grid was generated using this CoA together with the binding pocket predicted by SiteMap (Schrodinger). Molecular docking was carried out with the reported GSK molecules and, for comparison, with the entire GSK library (Table 2). A compound library (~50 compounds) was generated based on structural similarity to the GSK molecules (Additional file 1: Figure S11). ChEMBL2097847 showed a docking score of −6.92 (Table 4).
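Defining a receptor grid from a bound ligand usually means centring the docking box on that ligand and padding its extent. The sketch below shows that calculation, assuming the ligand heavy-atom coordinates have already been parsed from the PDB file; the coordinates and the 5 Å padding are hypothetical, and Glide's own grid generation exposes its own parameters. The returned center and size correspond conceptually to the center_x/y/z and size_x/y/z search-box settings of AutoDock Vina.

```python
def docking_box(coords, padding=5.0):
    """Axis-aligned search box around ligand atoms.

    coords: sequence of (x, y, z) tuples in Å.
    Returns (center, size); padding is added on both sides of each axis.
    """
    xs, ys, zs = zip(*coords)
    center = tuple((min(axis) + max(axis)) / 2 for axis in (xs, ys, zs))
    size = tuple((max(axis) - min(axis)) + 2 * padding for axis in (xs, ys, zs))
    return center, size


# Hypothetical heavy-atom coordinates (Å) for part of a bound CoA.
coa_atoms = [(12.0, 8.0, -3.0), (16.0, 10.0, -1.0), (14.0, 12.0, -5.0)]
center, size = docking_box(coa_atoms)
print(center, size)  # (14.0, 10.0, -3.0) (14.0, 14.0, 14.0)
```

Centring on the co-crystallized ligand keeps the search focused on the known binding site; the padding gives larger candidate ligands room to adopt poses that extend beyond the original ligand.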
9) Rv2361c (uppS)
The gene encodes Z-decaprenyl diphosphate synthase, which synthesizes decaprenyl diphosphate, a molecule that plays a critical role in the biosynthesis of most features of the mycobacterial cell wall. The gene is also one of the MPGs. Molecular docking was performed with the NS, and the top 10 poses were generated (Additional file 1: Figure S12). A library of ~800 compounds was generated based on the best-binding molecules from the set of 426 GSK molecules (Table 2). The best docking score, −12.62, was obtained for ChEMBL2098151 (Table 4). The most important interactions of this compound are between its cyclopropyl ester group and residues Arg244, Ser252, Arg292 and Arg250. Interaction analysis also revealed an important Pi interaction between the pyridine ring and Arg127, which markedly increases binding.
10) Rv0865 (mog)
The gene is associated with molybdopterin biosynthesis in Mtb. No NS or other PDB ligand is associated with its crystal structure. The OSDDChem library was computationally screened against the binding pockets of the target protein using AutoDock Vina. The 100 top-scoring ligands showed strong binding affinity (ΔG between −8.5 and −9.9 kcal/mol) and were selected for compound clustering. The clusters generated by fingerprint-based similarity and chemical clustering were used to develop feature models. Pharmacophores were derived for the clustered compounds and for structurally similar compounds (matching the feature model) available in the ChEMBL and ChemSpider databases. Pharmacophores satisfying drug-like properties were then employed for virtual screening. The most favourable binding free energy, ΔG = −9.9 kcal/mol, was obtained for ChEMBL255979 (Table 4). Analysis of the protein-ligand complex revealed that the carboxyl group located between the trimethyldecahydro-3,12-epoxy moiety and the biphenyl ring interacts with Val11 at a distance of 3.6 Å, and that the same carboxyl group forms two hydrogen bonds with Ser13, with bond lengths of 3.1 and 3.3 Å, respectively (Additional file 1: Figure S6b).
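The fingerprint-based clustering step can be illustrated with a simple leader (sphere-exclusion) algorithm: compounds are visited in order of docking score, and each still-unassigned compound becomes a cluster centre that absorbs all unassigned compounds within a similarity threshold. This is a simplified stand-in, not necessarily the clustering method used in the study; the fingerprints, scores and the 0.6 threshold are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_cluster(compounds, threshold=0.6):
    """compounds: list of (name, docking_score, fingerprint_bits).

    Visit compounds from best (most negative) score; each unassigned compound
    starts a cluster and absorbs unassigned compounds with Tanimoto >= threshold.
    """
    ordered = sorted(compounds, key=lambda c: c[1])
    assigned = set()
    clusters = []
    for name, _, fp in ordered:
        if name in assigned:
            continue
        members = [name]
        assigned.add(name)
        for other, _, other_fp in ordered:
            if other not in assigned and tanimoto(fp, other_fp) >= threshold:
                members.append(other)
                assigned.add(other)
        clusters.append(members)
    return clusters


# Hypothetical docked ligands: (name, score, fingerprint bits).
library = [
    ("lig_a", -9.9, {1, 2, 3, 4, 5}),
    ("lig_b", -9.1, {1, 2, 3, 4, 6}),
    ("lig_c", -8.7, {10, 11, 12}),
    ("lig_d", -8.5, {10, 11, 12, 13}),
]
print(leader_cluster(library))  # [['lig_a', 'lig_b'], ['lig_c', 'lig_d']]
```

Because the best-scoring compound in each cluster is its leader, the first member of every cluster is a natural representative for building a feature model.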
D) Targets with unknown function
11) Rv0390
The function of this gene is unknown. A diverse set of 1192 compounds from the OSDDChem database was docked, yielding a series of top-scoring compounds with ΔG of −6.8 kcal/mol or better. Clustering analysis was performed to determine the structural similarity between compounds. Representative structures of the large clusters were used to develop pharmacophore models, and compounds with a survival score of 3.54 were considered active in the set. 3D pharmacophore-based virtual screening retrieved the top 100 ranked compounds. Of these, two lead compounds, ChEMBL217735 and ChEMBL76817, each showed a predicted binding energy of ΔG = −8.0 kcal/mol with acceptable pharmacokinetic properties (Table 4; Additional file 1: Figure S13). The butanoate oxygen of ChEMBL217735 interacts with Ile65 and Asp62 at distances of 3.1 and 3.6 Å, respectively, and a hydrogen bond with a length of 3.1 Å is formed between the carboxylate group and Ala66.
Assays of the in vitro activity of dihydrofolate reductase (dfrA/folA, Rv2763c), dihydroneopterin aldolase (folB, Rv3607c), thymidylate kinase (tmk, Rv3247c) and Z-decaprenyl diphosphate synthase (uppS, Rv2361c), with sets of inhibitors having good IC50 and MIC50 values, have been reported in the literature [30‐34]. We evaluated the structural similarity of these inhibitors (those with the highest reported activity) to the inhibitors of the targets shortlisted in the present study. The shortlisted inhibitors, developed primarily in silico, were docked against their respective targets for comparison. Our studies revealed that the inhibitors proposed for targets 1G3U (tmk, Rv3247c) and 1DG5 (dfrA/folA, Rv2763c) showed better in silico binding affinity than the previously reported inhibitors with experimentally measured activity. The docking scores of the theoretically proposed leads for tmk, 1G3U (Rv3247c, docking score = −7.01), and folA, 1DG5 (Rv2763c, docking score = −9.48), were better than those of the inhibitors with reported IC50 values. It should be noted that many inhibitors that succeed in vitro do not show the desired in vivo activity; similarly, the best in silico inhibitors may not show comparable activity experimentally. Nevertheless, in silico work has the potential to reduce failure rates and increase the chance of success in drug discovery.
As previously reported, these 15 shortlisted targets were further subjected to a ‘druggability’ assessment. Of these, 5 had a single crystal structure and 10 had multiple crystal structures in the PDB. Targets with more than one crystal structure were subjected to multiple sequence alignment to select the best structure for molecular docking. In the process, these targets were observed to show significant deviation in the DS index, suggesting that the sequence quality of the PDB structures used for molecular docking plays a vital role in the validity of a computational study. Comparing the DS index of targets with a single crystal structure showed that structures with maximum sequence coverage had a higher DS index than those with minimal sequence coverage, supporting our approach of selecting potential targets that are also evolutionarily conserved. Therefore, in this systems-level analysis, a PDB structure is relevant only if its sequence matches the invariant sequence of the genomes.
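The structure-selection rule described above, preferring PDB entries whose sequence covers and matches the invariant genomic sequence, can be sketched as follows. The sketch assumes each structure's sequence has already been aligned to the reference (with '-' for gaps), ranks by coverage and then identity, and does not reproduce the DS index; the structure labels and sequences are hypothetical.

```python
def coverage_and_identity(reference: str, aligned: str) -> tuple[float, float]:
    """Coverage: fraction of reference residues that have an aligned residue.
    Identity: fraction of those aligned residues that match the reference.
    Both strings must be pre-aligned to equal length, with '-' for gaps.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    ref_positions = [(r, s) for r, s in zip(reference, aligned) if r != "-"]
    covered = [(r, s) for r, s in ref_positions if s != "-"]
    coverage = len(covered) / len(ref_positions)
    identity = sum(r == s for r, s in covered) / len(covered) if covered else 0.0
    return coverage, identity


def best_structure(reference: str, structures: dict[str, str]) -> str:
    """Pick the structure with the highest coverage, breaking ties on identity."""
    return max(structures, key=lambda sid: coverage_and_identity(reference, structures[sid]))


# Hypothetical invariant reference sequence and aligned structure sequences.
reference = "MKTAYIAKQR"
structures = {
    "struct_1": "MKTAYIA---",  # 70% coverage, 100% identity
    "struct_2": "MKTAYIAKQR",  # full coverage, exact match
    "struct_3": "MKSAYIAKQR",  # full coverage, one mismatch
}
print(best_structure(reference, structures))  # struct_2
```

Ranking on the (coverage, identity) tuple encodes the preference stated in the text: maximal coverage first, and among equally complete structures, the one that matches the invariant sequence most closely.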
We have also reported the possibility of targeting NDH-I with metformin, an existing FDA-approved drug for type II diabetes, as an adjunct therapy for Tb; our previous analysis indicated that NDH-I has a putative role in bacterial persistence [35]. Additionally, similarity searches using the QikProp tool of Schrodinger identified existing drugs with high structural similarity to the docked molecules. For example, structural comparison of the best-docked molecules for target 1G3U (Rv3247c) identified Domperidone and Nemonapride (a selective antagonist of the dopamine D2 and D3 receptors) as probable candidates for repurposing. For target 1DG5 (Rv2763c), similarity studies with the best-docked molecule identified Tetroxoprim (a less commonly used antibacterial and a derivative of trimethoprim) as the closest known drug, which can be taken up for repurposing (docking score = −10.19). In addition, Droxidopa (an analogue of L-DOPA) showed potential inhibitory properties against 1ZA0 (Rv1094) (docking score = −6.68) (Table 4). We also analyzed the effect of protein folding and conformational changes on binding affinity. As an example, Rv2763c (dihydrofolate reductase), represented by its best PDB structure (PDB ID: 1DG5), was compared with its human homolog (PDB ID: 4QHV). The two proteins share very little sequence homology, but structural comparison indicated that they fold in a similar fashion. We observed that the ligand CHEMBL432987, the best-binding molecule (docking score = −12.08), does not bind well to the human homolog (docking score = −8.00). This drastic drop in binding affinity between the two proteins could be attributed to differences in the environments of NADP and of the inhibitor between the Mtb and human structures: residues such as Ala101 and Leu102 near N6 of NADP are distinctly more hydrophobic in the pathogen than in the host [36].
Sequence homology is therefore not a fully reliable indicator of binding-site similarity, and structural comparisons (protein folding) must also be incorporated to assess the homology between two structures.
Conclusion
We therefore propose that these methodologies could generate new drug-like leads with a success rate of 1 in 10, compared with the current rate of 1 in 100 molecules entering clinical trials. These studies are expected to lead to a new anti-Tb drug candidate developed primarily in silico. By short-circuiting the research needed to generate new chemical scaffolds, our comprehensive approach to drug discovery should positively influence the probability of clinical success of a drug candidate. We thus suggest an integrated methodology that can address not only MDR Mtb but also the important persister population of the bacterium.
Authors’ contributions
SKB conceptualized and designed the project. DK, MS, SM, CGSN and AB performed the molecular docking analysis. AKJ analyzed the biochemical pathways. DK, MS, AKJ and SKB wrote the manuscript. All authors read and approved the final manuscript.