Background
Design of the methodology
Confirmation of the objective
The acquisition of chemical compound information
Collection of chemical compound information
Databases | Development organizations | Overview | References |
---|---|---|---|
Alkamid Database | Ghent University, Belgium | Provides the plant sources, biosynthetic pathways and pharmacological information of N-alkylamide compounds in traditional medicinal plants | [18] | |
Asian Anti-Cancer Materia Database | Institute of East-West Medicine, USA | Summarizes information on 700 anti-cancer drugs from Asia, 80% of which are derived from medicinal plants. Provides the commonly used Chinese medicine name, Latin name, medicinal properties, major compounds and other information | [19] | |
Chem-TCM | Institute of Pharmaceutical Science at King’s College, UK | Contains more than 350 TCMs and more than 9500 of their compounds. Records the source plants, chemical properties, common target activities and other information for these compounds | [20] | |
Chinese National Compound Library | National Health and Family Planning Commission of the People’s Republic of China | A library of small-molecule compounds consisting of core libraries and satellite libraries. Contains physical and chemical information on nearly 2 million small molecules | [20] | |
CHMIS-C (A Comprehensive Herbal Medicine Information System for Cancer) | University of Michigan Medical School, USA | Provides 527 anti-cancer herbal prescriptions, 937 components and 9366 small-molecule structures for the clinical treatment of different types of cancer, together with a reference database and a supporting molecular target database | [21] | |
CNPD (Chinese Natural Products Database) | Shanghai Institute of Materia Medica, Chinese Academy of Sciences, China | The CNPD database currently collects more than 57,000 natural products from 37 categories, of which 70% of the molecules are drug-like molecules. The relevant data include the CAS number, name, molecular formula, molecular weight, melting point and other physical and chemical properties of natural products | [21] | |
KEGG Compound Database | Kyoto University, Japan | Contains the name, molecular formula, relative molecular mass, structural formula, CAS number and corresponding chemical reaction or metabolic pathway of more than 17,000 metabolites and other small molecule compounds in the KEGG database | [22] | |
NAPRALERT (Natural Products Alert) | College of Pharmacy, University of Illinois at Chicago, USA | A relational database of natural products built from more than 200,000 references, covering pharmacology, biochemical information and data from various experiments (in vivo, in vitro, clinical, etc.) | [23] | |
NCI | National Cancer Institute, USA | Contains the chemical properties of compounds, such as the molecular formula, CAS number and other common physical and chemical properties, along with anti-HIV activity and other predicted biological activity values | [24] | |
TCM Database@Taiwan | China Medical University, Taiwan | Includes 443 traditional Chinese medicines and 20,000 ingredients, with their physical and chemical properties and 3D structures. Provides the complete contents of the compounds from Chinese medicine and related references | [25] | |
TCMID (Traditional Chinese Medicine Integrated Database) | East China Normal University, China | Provides information on all aspects of Chinese medicine, including formulas, herbs and herbal ingredients. Also collects information on drugs that have been studied in depth by modern pharmacology and biomedical science | [26] | |
TCMSP | Center for Bioinformatics, College of Life Science, Northwest A&F University, China | Contains 499 TCMs recorded in the Chinese Pharmacopoeia and 29,384 compounds from these TCMs | [27] | |
Timtec | Timtec, Russia | Records structural information for more than 13,000 natural products and their derivatives | [28] | |
TradiMed (Traditional Chinese Medicine DB) | TradiMed, Korea | Contains information on 11,810 prescriptions, 3199 Chinese and Korean traditional medicines and 20,012 chemical components | [29] | |
ZINC | University of California, USA | A free compound database for virtual screening that includes 3D structures of more than 35 million commercially available compounds for docking | [30] |
Software and database of medicinal plant compounds
Pre-treatment of chemical compounds
Prediction of drug-like properties
ADME/T selection
Exclusion of false-positive compounds
The concept and performance of virtual screening
Methods | Molecular docking [38] | Pharmacophore model [39] | Small molecule shape similarity [40] |
---|---|---|---|
Theoretical basis | Molecular mechanics, quantum mechanics | Statistics | Graph theory and other mathematical methods |
Overview | Obtain the receptor structure and locate its binding site, then simulate the interaction between the receptor and its ligands | Establish a pharmacophore model and evaluate how well the 3D conformations of ligands match it | Assess the structural similarity between unknown molecules and druggable molecules with known targets |
Advantages | 1. Mature algorithms; 2. A wide choice of software | 1. High accuracy and efficiency; 2. Several commercial pharmacophore databases | 1. Fastest screening; 2. Abundant data resources |
Disadvantages | 1. Relatively heavy computation; 2. Heavy data preparation workload; 3. Time-consuming analysis of results | 1. Affected by the quality of the pharmacophore model; 2. Affected by the number of available protein crystal structures | 1. Low accuracy and coarse results; 2. Requires an operator able to develop chemical software |
Software | Features | References |
---|---|---|
Molecular docking | |||
Affinity | A docking procedure based on simulated annealing, molecular mechanics and molecular dynamics simulation; the calculation is relatively accurate | [44] | |
AutoDock | A well-known molecular docking program developed by the Scripps Research Institute, and one of the most widely used docking programs | [44] | |
DOCK | One of the most widely used molecular docking programs; free and open access | [45] | |
DockVision | A set of docking applets that support multiple algorithms | [46] | |
DockIt | Provides Energy, PLP and PMF evaluation methods | [47] | |
eHiTS | eHiTS is an exhaustive flexible-docking method that systematically covers the part of the conformational and positional search space that avoids severe steric clashes, producing highly accurate docking poses at a speed practical for virtual high-throughput screening | [48] | |
FlexX | The method can be used in the design process of specific protein ligands. It combines an appropriate model of the physico-chemical properties of the docked molecules with efficient methods for sampling the conformational space of the ligand | [49] | |
Glide | Glide approximates a complete systematic search of the conformational, orientational, and positional space of the docked ligand. In this search, an initial rough positioning and scoring phase that dramatically narrows the search space is followed by torsionally flexible energy optimization on an OPLS-AA nonbonded potential grid for a few hundred surviving candidate poses | [50] | |
GoldDock | GOLD is an automated ligand docking program that uses a genetic algorithm to explore the full range of ligand conformational flexibility with partial flexibility of the protein, and satisfies the fundamental requirement that the ligand must displace loosely bound water on binding. Numerous enhancements and modifications have been applied to the original technique resulting in a substantial increase in the reliability and the applicability of the algorithm | [51] | |
SystemsDock | SystemsDock is a web server for network pharmacology-based prediction and analysis, which applies high-precision docking simulation and molecular pathway map to comprehensively characterize the ligand selectivity and to illustrate how a ligand acts on a complex molecular network | [52] | |
ZDOCK | A protein–protein docking procedure based on geometric matching. With the goal of providing an accessible and intuitive interface, ZDOCK provides options for users to guide the scoring and selection of output models, in addition to dynamic visualization of input structures and output docking models | [53] | |
Pharmacophore model | |||
Apex-3D | Activity prediction expert system with 3D-QSAR. Pharmacophore identification based on logical structure analysis | [54] | |
DISCOtech | Distance comparison technique; provides multiple pharmacophore models for database searching and automatically ranks them by priority | [55] | |
Discovery Studio | Powerful pharmacophore identification and database search software | [56] | |
GASP | Uses a genetic algorithm to perform flexible superposition of drug molecules | [57] | |
SEAware | Chemically similar drugs often bind to biologically diverse targets, making it difficult to predict what off-target effects a drug might have by protein structure or sequence alone. The similarity ensemble approach (SEA) addresses this problem using a different strategy; it groups receptors according to the chemical similarity of their ligands, and can identify unknown relationships between ligands and receptors amenable to experimental testing | [58] | |
Small molecule shape similarity | |||
CerberuS | CerBeruS is a method developed for iterative screening. CerBeruS is based on Daylight fingerprints. CerBeruS proposes only highly similar molecules for testing. This strategy results in a high hit rate but is unlikely to identify new scaffolds or lead series | [59] | |
FlexS | FlexS is an incremental construction procedure. The molecules to be superimposed are partitioned into fragments. Starting with placements of a selected anchor fragment, computed by two alternative approaches, the remaining fragments are added iteratively. At each step, flexibility is considered by allowing the respective added fragment to adopt a discrete set of conformations. The mean computing time per test case is about 1:30 min on a common-day workstation | [60] | |
BRUTUS | BRUTUS aligns molecules using field information derived from charge distributions and van der Waals shapes of the compounds. Molecules can have similar biological properties if their charge distributions and shapes are similar, even though they have different chemical structures; that is, BRUTUS can identify compounds possessing similar properties, regardless of their structures | [61] | |
WEGA | The weighted Gaussian algorithm (WEGA) was proposed to improve the accuracy of the first-order approximation. The approach significantly improves the accuracy of molecular volumes and reduces the error of shape similarity calculations by 37%, using the hard-sphere model as the reference. The algorithm also keeps the simplicity and efficiency of FOGA (First Order Gaussian Approximation) | [62] |
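
The fingerprint-based screening idea noted for CerBeruS (proposing only highly similar molecules) can be illustrated with a Tanimoto coefficient over fingerprint bits. The following is a minimal sketch, not the CerBeruS implementation: the bit sets, the candidate names and the `rank_by_similarity` helper are hypothetical, and real Daylight fingerprints or 3D shape methods such as WEGA are considerably more involved.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: Set[int], library: Dict[str, Set[int]], threshold: float
) -> List[Tuple[str, float]]:
    """Return library entries scoring at least `threshold`, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints (hypothetical on-bit indices, not real molecules).
query = {1, 4, 7, 9, 12}
library = {
    "cand_A": {1, 4, 7, 9, 12},  # identical bits to the query
    "cand_B": {1, 4, 7, 9, 15},  # shares four of six distinct bits
    "cand_C": {2, 3, 5},         # shares nothing with the query
}

hits = rank_by_similarity(query, library, threshold=0.6)
for name, score in hits:
    print(f"{name}: {score:.3f}")
```

A high threshold keeps only close analogues of the query, which matches the trade-off described in the table: a high hit rate, but little chance of finding new scaffolds.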
Virtual screening based on pharmacophore model
Validation based on ligand molecule shape similarity
Accurate verification based on molecular docking theory
Analysis of target sets
Analysis and annotation of target information
Databases | Developers | Abstract | References |
---|---|---|---|
Binding DB | Skaggs School of Pharmacy & Pharmaceutical Sciences, USA | BindingDB is a public, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules | [68] | |
BioGRID (Biological General Repository for Interaction Datasets) | BioGRID, Canada | BioGRID is an interaction repository with data compiled through comprehensive curation efforts. Our current index is version 3.4.151 and searches 63,354 publications for 1,493,749 protein and genetic interactions, 27,785 chemical associations and 38,559 post translational modifications from major model organism species. All data are freely provided via our search index and available for download in standardized formats | [69] | |
DAVID (Database for Annotation, Visualization, and Integrated Discovery) | Laboratory of Human Retrovirology and Immunoinformatics, USA | Identifies enriched biological themes, discovers enriched functionally related gene groups, lists interacting proteins and provides other functions | [70] | |
DRUGBANK | Canadian Institutes of Health Research, Canada | The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information. The database contains 9591 drug entries including 2037 FDA-approved small molecule drugs, 241 FDA-approved biotech drugs, 96 nutraceuticals and over 6000 experimental drugs. Additionally, 4661 non-redundant protein sequences are linked to these drug entries | [71] | |
GeneMANIA | University of Toronto, Canada | GeneMANIA finds other genes that are related to a set of input genes, using a very large set of functional association data. Association data include protein and genetic interactions, pathways, co-expression, co-localization and protein domain similarity. You can use GeneMANIA to find new members of a pathway or complex, find additional genes you may have missed in your screen or find new genes with a specific function, such as protein kinases. Your question is defined by the set of genes you input | [72] | |
HPRD (Human Protein Reference Database) | Johns Hopkins University and the Institute of Bioinformatics, USA | HPRD is a database of curated proteomic information pertaining to human proteins. All the information in HPRD has been manually extracted from the literature by expert biologists who read, interpret and analyze the published data | [73] | |
IntAct | European Molecular Biology Laboratory | IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions | [74] | |
KEGG (Kyoto Encyclopedia of Genes and Genomes) | Kyoto University, Japan | KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. KEGG is widely used as a reference knowledge base for integration and interpretation of large-scale datasets generated by genome sequencing and other high-throughput experimental technologies. In addition to maintaining the aspects to support basic research, KEGG is being expanded towards more practical applications integrating human diseases, drugs and other health-related substances | [75] | |
MAS3.0 | CapitalBio, China | MAS (Molecule Annotation System) is a complete data-mining and functional annotation solution for extracting and analyzing relationships among biological molecules from public knowledge bases. The MAS analysis platform is a web client program for interactive navigation of the knowledge base. MAS uses a relational database of biological networks created from millions of individually modeled relationships between genes, proteins, diseases and tissues. MAS allows users to view their data integrated into biological networks according to different biological contexts. This feature results from multiple lines of evidence that are integrated in the MAS database. MAS helps to interpret relationships in gene expression data | [76] | |
MINT (The Molecular INTeraction Database) | Department of Biology, University of Rome, Italy | MINT is a public repository for protein–protein interactions (PPI) reported in peer-reviewed journals. The database has grown steadily over the years and, as of September 2011, contained approximately 235,000 binary interactions captured from over 4750 publications | [77] | |
PharmMapper Server | Shanghai Institute of Materia Medica, China | PharmMapper Server is a freely accessible web server designed to identify potential target candidates for given probe small molecules (drugs, natural products, or other newly discovered compounds with unidentified binding targets) using a pharmacophore mapping approach. Benefiting from a highly efficient and robust mapping method, PharmMapper has high-throughput capability and can identify potential target candidates from the database within a few hours | [78] | |
Potential Drug Target Database (PDTD) | Shanghai Institute of Materia Medica, China | PDTD is a dual function database that associates an informatics database to a structural database of known and potential drug targets. PDTD is a comprehensive, web-accessible database of drug targets, and focuses on those drug targets with known 3D-structures. PDTD contains 1207 entries covering 841 known and potential drug targets with structures from the Protein Data Bank | [79] | |
RCSB PDB (Protein Data Bank) | Research Collaboratory for Structural Bioinformatics: Rutgers and UCSD/SDSC | A global resource for the advancement of research and education in biology and medicine. Along with our Worldwide PDB collaborators, RCSB PDB curates, annotates, and makes publicly available the PDB data deposited by scientists around the globe. The RCSB PDB then provides a window to these data through a rich online resource with powerful searching, reporting, and visualization tools for researchers | [80] | |
Reactome | European Bioinformatics Institute (EMBL-EBI) | The Reactome provides molecular details of signal transduction, transport, DNA replication, metabolism and other cellular processes as an ordered network of molecular transformations—an extended version of a classic metabolic map, in a single consistent data model | [81] | |
STITCH (search tool for interactions of chemicals) | European Molecular Biology Laboratory, Germany | STITCH is a database of known and predicted interactions between chemicals and proteins. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases | [82] | |
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) | Swiss Institute of Bioinformatics, Switzerland | STRING is a database of known and predicted protein–protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases | [83] | |
TTD (Therapeutic Target Database) | Department of Computational Science, National University of Singapore, Singapore | The Therapeutic Target Database (TTD) provides information about known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the corresponding drugs directed at each of these targets. Also included are links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and drug structure, therapeutic class and clinical development status. All information provided is fully referenced | [84] | |
UniProt (Universal Protein Resource) | UniProt Consortium | The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc) | [67] |
Construction of network pharmacology
Tools | Features | References |
---|---|---|
CADLIVE | CADLIVE (Computer-Aided Design of LIVing systEms) is a comprehensive computational tool for constructing large-scale biological network maps, analyzing the topological features of them, and simulating their dynamics | [89] | |
Cytoscape | Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data | [90] | |
Graphviz | Graphviz is open source graph visualization software. Graph visualization is a way of representing structural information as diagrams of abstract graphs and networks. It has important applications in networking, bioinformatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains | [91] | |
Pajek | Pajek is a program package for the analysis and visualization of large networks (networks containing up to one billion vertices; there is no limit, except memory size, on the number of lines). It has been available for 20 years | [92] | |
VANTED | VANTED is a tool for the visualization and analysis of networks with related experimental data. Data from large-scale biochemical experiments is uploaded into the software via a Microsoft Excel-based form. Then it can be mapped on a network that is either drawn with the tool itself, downloaded from the KEGG Pathway database, or imported using standard network exchange formats. Transcript, enzyme, and metabolite data can be presented in the context of their underlying networks | [93] | |
VisANT | VisANT is an application for integrating biomolecular interaction data into a cohesive, graphical interface. This software features a multi-tiered architecture for data flexibility, separating back-end modules for data retrieval from a front-end visualization and analysis package | [94] | |
YANAsquare | YANAsquare provides a software framework for rapid network assembly (flexible pathway browser with local or remote operation mode), network overview (visualization routine and YANAsquare editor) and network performance analysis (calculation of flux modes as well as target and robustness tests) | [95] |
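
As a rough illustration of the kind of input these network tools work with, the sketch below builds a small compound-target network, counts node degrees, and writes the edges in Graphviz's DOT language. The compound and target names are placeholders, and the `node_degrees` and `to_dot` helpers are hypothetical; none of this is taken from the tools listed above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]

# Hypothetical compound-target pairs, e.g. the output of a virtual screening step.
edges: List[Edge] = [
    ("compound_1", "TARGET_A"),
    ("compound_1", "TARGET_B"),
    ("compound_2", "TARGET_A"),
    ("compound_3", "TARGET_A"),
    ("compound_3", "TARGET_C"),
]


def node_degrees(pairs: List[Edge]) -> Dict[str, int]:
    """Count how many edges touch each node (compound or target)."""
    degree: Dict[str, int] = defaultdict(int)
    for compound, target in pairs:
        degree[compound] += 1
        degree[target] += 1
    return dict(degree)


def to_dot(pairs: List[Edge], name: str = "network") -> str:
    """Serialize the edges as an undirected graph in DOT syntax."""
    lines = [f"graph {name} {{"]
    for compound, target in pairs:
        lines.append(f'  "{compound}" -- "{target}";')
    lines.append("}")
    return "\n".join(lines)


degrees = node_degrees(edges)
# The most connected node is a candidate hub of the network.
hub = max(degrees, key=degrees.get)
dot_text = to_dot(edges)
print(hub, degrees[hub])
print(dot_text)
```

The DOT text can be saved to a file and rendered with Graphviz. Node degree is only the simplest topological measure; the tools in the table offer much richer analyses.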