Prediction of CTL epitopes using QM, SVM and ANN techniques

doi:10.1016/j.vaccine.2004.02.005

Vaccine

Volume 22, Issues 23–24, 13 August 2004, Pages 3195-3204

https://doi.org/10.1016/j.vaccine.2004.02.005

Abstract

Cytotoxic T lymphocyte (CTL) epitopes are potential candidates for subunit vaccine design against various diseases. Most existing T cell epitope prediction methods are indirect: they predict MHC class I binders rather than CTL epitopes. In this study, a systematic attempt has been made to develop a direct method for predicting CTL epitopes from an antigenic sequence. The method is based on a quantitative matrix (QM) and on machine learning techniques, namely Support Vector Machine (SVM) and Artificial Neural Network (ANN). It has been trained and tested on a non-redundant dataset of T cell epitopes and non-epitopes that includes 1137 experimentally proven MHC class I restricted T cell epitopes. The performance of these methods was evaluated through Leave One Out Cross-Validation (LOOCV) at a cutoff score where sensitivity and specificity were nearly equal; at this cutoff, the accuracy of the QM-, ANN- and SVM-based methods was 70.0, 72.2 and 75.2%, respectively. Finally, the two machine learning methods were used for consensus and combined prediction of CTL epitopes. The performance of these methods was also evaluated on a blind dataset, where the machine learning-based methods performed better than the QM-based method. We also demonstrated through subgroup analysis that our methods can discriminate between T cell epitopes and MHC binders that are not epitopes. In brief, the method allows prediction of CTL epitopes using QM, SVM and ANN approaches, and it also facilitates prediction of the MHC restriction of predicted T cell epitopes. The method is available at http://www.imtech.res.in/raghava/ctlpred/.

Introduction

T cells are a vital component of protective immunity, acting both directly, by recognizing and eliminating self-altered cells, and indirectly, by controlling antibody production by cells of the B lineage [1]. The former function is carried out by cytotoxic T lymphocytes (CTL) [2]. CTLs recognize proteolytic fragments of proteins presented by MHC class I molecules [3], [4]; these are short peptides of 8–10 amino acids. The interaction of the T cell receptor (TCR) with the MHC–peptide complex can be highly flexible, so a single TCR can recognize a large number of peptides in the context of a single MHC molecule [5]. Identification of CTL epitopes is therefore crucial both for understanding the rules of T cell activation and for designing synthetic vaccines [6], and it has paved the way towards immunotherapy for cancer and many infectious diseases.

In the past, a number of methods have been developed for predicting T cell epitopes from protein sequences. These methods can be classified as direct or indirect. In the 1980s, direct prediction methods based on structural and sequence analysis of T cell epitopes were developed [7], [8], [9], [10]. DeLisi and Berzofsky [7] proposed that the critical requirement for a T cell epitope is its ability to form a stable amphipathic structure. Based on this hypothesis, the program AMPHI was developed [8], [9]. Another algorithm, SOHHA, was based on the assumption that T cell epitopes form a helix of 3–5 helical turns with a narrow strip of hydrophobic residues on one side. These approaches were superseded once X-ray crystallography of MHC–peptide complexes demonstrated that peptides bound in the MHC groove adopt an extended conformation [12].
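To make "amphipathic" concrete, the sketch below computes a hydrophobic moment: each residue's hydrophobicity is treated as a vector on a helical wheel, rotated 100 degrees per residue, and the length of the vector sum measures how strongly hydrophobic residues cluster on one face of a helix. This illustrates the general idea behind amphipathicity-based predictors and is not the published AMPHI or SOHHA procedure; the Kyte-Doolittle scale, the 11-residue window and the function names are assumptions chosen for illustration.

```python
import math

# Kyte-Doolittle hydropathy scale. ASSUMPTION: chosen for illustration only;
# it is not necessarily the scale used by AMPHI or SOHHA.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Length of the summed hydrophobicity vector on a helical wheel.

    Residue i points at angle i * angle_deg; 100 degrees per residue is the
    rotation of an ideal alpha helix. Hydrophobic residues clustered on one
    face add up to a large moment (amphipathic), while a uniform stretch
    largely cancels out.
    """
    delta = math.radians(angle_deg)
    residues = seq.upper()
    x = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta) for i, aa in enumerate(residues))
    y = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta) for i, aa in enumerate(residues))
    return math.hypot(x, y)


def scan_windows(seq: str, window: int = 11, angle_deg: float = 100.0):
    """Score every window of `window` residues; highest moment first.

    The 11-residue window is a hypothetical default, not a published setting.
    """
    scores = [
        (start, hydrophobic_moment(seq[start:start + window], angle_deg))
        for start in range(len(seq) - window + 1)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical test sequence with alternating hydrophobic/charged blocks.
    print(scan_windows("MKLLEELLKKLLEELLKKGS")[:3])
```

A homopolymer of 18 residues at 100 degrees spans exactly five turns, so its moment cancels to zero, which is a quick sanity check that the vector sum behaves as intended.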

Sequence-based models for T cell epitope prediction were also developed; these rely on the occurrence of motifs in the primary sequence rather than on secondary structure [13], [14], [15]. In 1988, Rothbard and Taylor collected about 57 T cell epitopes and, based on their patterns, published a list of motifs [14]. The proposed motifs are 3–4 residues long and consist of a glycine followed by hydrophobic residues. Further, an algorithm was developed based on the association of cysteine-containing T cell epitopes with certain other residues; it searches the peptide sequence for triplets including CAK, CLV, CKL and CGS [13]. In 1995, two computational T cell epitope prediction tools, EpiMer and OptiMer, were developed based on knowledge of MHC binding motifs [11]: OptiMer predicts amphipathic protein segments with high motif density, and EpiMer locates protein segments with high motif density. These direct prediction methods based on structural or sequence models have low accuracy [16], which may be due to insufficient data and the lower specificity of T cell receptors (TCRs).
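The triplet search described above amounts to a simple motif scan. The sketch below locates the four triplets named in the text (CAK, CLV, CKL, CGS) and reports 0-based positions, including overlapping hits. How the original algorithm [13] scores or weights these hits is not described here, so the function only finds them; its name and return format are hypothetical.

```python
# Triplets named in the text for the cysteine-based algorithm [13].
TRIPLETS = ("CAK", "CLV", "CKL", "CGS")


def find_triplets(sequence: str, triplets=TRIPLETS):
    """Return sorted (position, triplet) hits, 0-based, overlaps included.

    Hypothetical helper: it locates motifs only and does not reproduce any
    scoring scheme from the original algorithm.
    """
    seq = sequence.upper()
    hits = []
    for triplet in triplets:
        start = seq.find(triplet)
        while start != -1:
            hits.append((start, triplet))
            # Restart one residue later so overlapping occurrences are kept.
            start = seq.find(triplet, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    print(find_triplets("MCAKLCLVGS"))
```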

In the last decade, a number of indirect methods have been developed that predict MHC binders instead of T cell epitopes. The currently available indirect methods are based on structure, binding motifs, matrices or artificial neural networks (ANNs) [17], [18], [19], [20], [21], [22], [23], [24]. Because the interaction between MHC and peptide is more specific, these methods perform better than direct T cell epitope prediction methods. Their major limitation is that they only predict MHC binders from antigenic sequences and therefore cannot discriminate between T cell epitopes and MHC binders that are not epitopes.

In this study, an attempt has been made to develop a direct method for the prediction of CTL epitopes. Data on CTL epitopes and non-epitopes were obtained from MHCBN version 1.1, a comprehensive database of MHC binders and non-binders [25]. Methods based on QM, SVM and ANN have been developed to discriminate between CTL epitopes and non-epitopes.

The methods based on QM, ANN and SVM achieved accuracies of 70.0, 72.2 and 75.5%, respectively, when evaluated through Leave One Out Cross-Validation (LOOCV). These results show that the machine learning techniques outperform the quantitative matrix on this dataset. The performance of the machine learning techniques was further enhanced by consensus and combined approaches based on SVM and ANN. The combined approach achieved a sensitivity of 79.4%, and the consensus approach a specificity of 88.4%; each is higher than the corresponding value for any individual method.
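One simple way to see why pooling two classifiers can trade sensitivity against specificity is sketched below. It assumes, as an illustration rather than as the authors' exact rule, that the consensus approach calls a peptide an epitope only when SVM and ANN both do (logical AND), and that the combined approach calls it when either does (logical OR). The paper may combine scores differently, for example by averaging; the function names and toy predictions are hypothetical.

```python
def consensus(svm_calls, ann_calls):
    """ASSUMED consensus rule: epitope only if SVM and ANN both agree (AND)."""
    if len(svm_calls) != len(ann_calls):
        raise ValueError("prediction lists must have the same length")
    return [s and a for s, a in zip(svm_calls, ann_calls)]


def combined(svm_calls, ann_calls):
    """ASSUMED combined rule: epitope if either SVM or ANN calls it (OR)."""
    if len(svm_calls) != len(ann_calls):
        raise ValueError("prediction lists must have the same length")
    return [s or a for s, a in zip(svm_calls, ann_calls)]


def sensitivity_specificity(calls, truth):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(c and t for c, t in zip(calls, truth))
    tn = sum((not c) and (not t) for c, t in zip(calls, truth))
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives


if __name__ == "__main__":
    # Hypothetical toy labels: four epitopes followed by four non-epitopes.
    truth = [True] * 4 + [False] * 4
    svm = [True, True, True, False, True, False, False, False]
    ann = [True, True, False, True, False, True, False, False]
    print(sensitivity_specificity(consensus(svm, ann), truth))  # (0.5, 1.0)
    print(sensitivity_specificity(combined(svm, ann), truth))   # (1.0, 0.5)
```

Because AND only keeps calls both models make, its false positives can never exceed either model's, so specificity cannot drop; OR mirrors this for sensitivity. That pattern is consistent with the direction of the results reported above, though not proof that these are the rules used.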

The methods developed in this study were also evaluated on a blind dataset that does not contain any pattern used in training or testing. Their performance was evaluated on two subgroups: (i) CTL epitopes and non-epitope MHC binders, and (ii) CTL epitopes and MHC non-binders. All methods performed fairly well on both subgroups, as shown in Table 6. This demonstrates that the methods developed in this study can discriminate between CTL epitopes and non-epitope MHC binders, which MHC binder prediction methods cannot do.
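The subgroup analysis can be sketched as follows: blind peptides that also occur in the training data are dropped, and the remainder is split into subgroup (i), epitopes plus non-epitope MHC binders, and subgroup (ii), epitopes plus MHC non-binders. The `Peptide` record, its fields and the `predict` callable are hypothetical stand-ins; treating "pattern" as an exact peptide match is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Peptide:
    """Hypothetical record for one blind-set peptide."""
    seq: str
    is_epitope: bool      # experimentally shown to be a CTL epitope
    is_mhc_binder: bool   # known to bind MHC (relevant for non-epitopes)


def blind_subgroups(blind: Iterable[Peptide], training_seqs: set[str]):
    """Drop peptides seen in training (ASSUMED exact match), then build
    subgroup (i): epitopes + non-epitope binders, and
    subgroup (ii): epitopes + MHC non-binders."""
    unseen = [p for p in blind if p.seq not in training_seqs]
    epitopes = [p for p in unseen if p.is_epitope]
    binders = [p for p in unseen if not p.is_epitope and p.is_mhc_binder]
    non_binders = [p for p in unseen if not p.is_epitope and not p.is_mhc_binder]
    return epitopes + binders, epitopes + non_binders


def accuracy(predict: Callable[[str], bool], peptides: list[Peptide]) -> float:
    """Fraction of peptides whose predicted label matches the true label."""
    return sum(predict(p.seq) == p.is_epitope for p in peptides) / len(peptides)
```

Reporting accuracy separately on subgroup (i) is what isolates the harder question: a pure MHC binder predictor would call every peptide in that subgroup a binder and so could not separate its two classes.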

Finally, the MHC restriction of predicted CTL epitopes was examined using a quantitative matrix-based MHC binder prediction method [23], which determines the MHC binding specificity of the predicted T cell epitopes. A schematic view of the prediction method is shown in Fig. 1. In summary, this comprehensive method should help speed up vaccine development for serious diseases such as cancer and AIDS.

Section snippets

Datasets

All peptide sequences of CTL epitopes and non-epitopes were drawn from MHCBN version 1.1 [25]. Initially, 1334 CTL epitopes of 9 amino acids with varying T cell activity were obtained from the database. All duplicate epitopes and epitopes containing unnatural amino acids were removed. The final dataset consisted of 1137 CTL epitopes interacting with nearly 170 MHC class I molecules. A total of 340 CTL non-epitopes of 9 or more amino acids were extracted from MHCBN. They were chopped to obtain
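The epitope clean-up steps described above (removing duplicates and peptides with unnatural amino acids) can be sketched as a single filtering pass. This covers only the epitope filtering; "unnatural" is assumed to mean any letter outside the 20 standard one-letter codes, the 9-residue length check is an added safeguard, and the function name is hypothetical.

```python
# The 20 standard amino acids; ASSUMPTION: anything else counts as "unnatural".
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def clean_epitopes(peptides, length=9):
    """Keep unique 9-mers built only from standard residues, in input order."""
    seen = set()
    cleaned = []
    for pep in peptides:
        pep = pep.strip().upper()
        # Skip wrong-length peptides and those with non-standard residues.
        if len(pep) != length or not set(pep) <= STANDARD_AA:
            continue
        # Skip exact duplicates (case-insensitive after upper-casing).
        if pep in seen:
            continue
        seen.add(pep)
        cleaned.append(pep)
    return cleaned


if __name__ == "__main__":
    raw = ["GILGFVFTL", "gilgfvftl", "SLYNTVATL", "NLVPMVATX", "NLVPMVATV", "SIINFEKL"]
    print(clean_epitopes(raw))  # ['GILGFVFTL', 'SLYNTVATL', 'NLVPMVATV']
```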

Quantitative matrices

In the QM approach, the contribution of each residue at each position of the peptide to T cell activity was quantified. A matrix with weights for each amino acid residue at every position of the peptide was generated using Eq. (1). The QM is shown in Table 1. The effect of each residue on the T cell activity of a peptide can be read directly from this matrix. The QM-based method classified the data with 70.0% accuracy at the default threshold, where the sensitivity and specificity of prediction were nearly equal. The
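Eq. (1) is not reproduced in this snippet, so the sketch below uses an assumed log-odds weighting: for each position, the weight of a residue is the log ratio of its (pseudocount-smoothed) frequency in epitopes to that in non-epitopes. A peptide's score is the sum of its position-specific weights, and the threshold is chosen where sensitivity and specificity are closest. Names, the pseudocount and the weighting formula are illustrative assumptions, not the authors' method; for brevity the threshold is picked on the training data itself, whereas LOOCV would rebuild the matrix with each peptide held out.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(epitopes, non_epitopes, length=9, pseudocount=1.0):
    """ASSUMED weighting: log(freq in epitopes / freq in non-epitopes) per position."""
    matrix = []
    pos_total = len(epitopes) + pseudocount * len(AMINO_ACIDS)
    neg_total = len(non_epitopes) + pseudocount * len(AMINO_ACIDS)
    for pos in range(length):
        pos_counts = Counter(p[pos] for p in epitopes)
        neg_counts = Counter(p[pos] for p in non_epitopes)
        matrix.append({
            aa: math.log(((pos_counts[aa] + pseudocount) / pos_total)
                         / ((neg_counts[aa] + pseudocount) / neg_total))
            for aa in AMINO_ACIDS
        })
    return matrix


def score(matrix, peptide):
    """Sum of the position-specific weights of the peptide's residues."""
    return sum(weights[aa] for weights, aa in zip(matrix, peptide))


def equal_sn_sp_threshold(matrix, epitopes, non_epitopes):
    """Return (threshold, sensitivity, specificity) where |Sn - Sp| is smallest."""
    pos_scores = [score(matrix, p) for p in epitopes]
    neg_scores = [score(matrix, p) for p in non_epitopes]
    best = None
    for t in sorted(set(pos_scores + neg_scores)):
        sn = sum(s >= t for s in pos_scores) / len(pos_scores)
        sp = sum(s < t for s in neg_scores) / len(neg_scores)
        if best is None or abs(sn - sp) < best[0]:
            best = (abs(sn - sp), t, sn, sp)
    return best[1:]
```

Scanning only observed scores as candidate cutoffs keeps the search exact and cheap; with real data Sn and Sp rarely meet exactly, which is why the text speaks of them being "nearly equal".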

Discussion and conclusions

In the mid-1990s it was observed that the performance of all previously published T cell epitope prediction methods was quite poor [16]; they were not even significantly better than random prediction. A lack of sufficient data about T cell epitopes may be the prime cause of this poor performance [16]. The success of a prediction method depends on the quality and quantity of data. To predict T cell epitopes with fair accuracy, a large number of MHC binders

Acknowledgements

The authors are grateful to Sanjoy Paul and Amrita Lama for carefully reading the manuscript, and to the Council of Scientific and Industrial Research (CSIR) and the Department of Biotechnology (DBT), Govt. of India, for financial assistance. Manoj Bhasin is a recipient of a fellowship from CSIR. This report has IMTECH communication No. 016/2003.

References (34)

E.O. Long et al.
Pathways of viral antigen processing and presentation to CTL: defined by the mode of virus entry?
Immunol. Today
(1989)
S. Buus
Description and prediction of peptide-MHC binding: the ‘human MHC project’
Curr. Opin. Immunol
(1999)
C.J. Stille et al.
Hydrophobic strip of helix algorithm for selection of T cell-presented peptides
Mol. Immunol
(1987)
G.E. Meister et al.
Two novel T cell epitope prediction algorithms based on MHC-binding motifs; comparison of predicted and published epitopes from Mycobacterium tuberculosis and HIV protein sequences
Vaccine
(1995)
A.J. Deavin et al.
Statistical comparison of established T cell epitope predictors against a large database of human and murine antigens
Mol. Immunol
(1996)
K. Gulukota et al.
Two complementary methods for predicting peptides binding major histocompatibility complex molecules
J. Mol. Biol
(1997)
H.P. Adams et al.
Prediction of binding to MHC class I molecules
J. Immunol. Methods
(1995)
P.A. Reche et al.
Prediction of MHC class I binding peptides using profile motifs
Human Immunol
(2002)
A.S. De Groot et al.
Immuno-informatics: mining genomes for vaccine components
Immunol. Cell Biol
(2002)
G.J. Hammerling et al.
Antigen processing and presentation—towards the millennium
Immunol. Rev
(1999)

C. Watts et al.
Pathways of antigen processing and presentation
Rev. Immunogenet
(1999)
S. Brunak et al.
Identifying cytotoxic T cell epitopes from genomic and proteomic information: “The human MHC project”
Rev. Immunogenet
(2000)
C. DeLisi et al.
T-cell antigenic sites tend to be amphipathic structures
Proc. Natl. Acad. Sci. U.S.A
(1985)
J.L. Cornette et al.
The amphipathic helix as a structural feature involved in T cell...
J.L. Spouge et al.
Strong conformational propensities enhance T cell antigenicity
J. Immunol
(1987)
L.J. Stern et al.
Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide
Nature
(1994)
S. Mouritsen et al.
T-helper-cell determinants in protein antigens are preferentially located in cysteine-rich antigen segments resistant to proteolytic cleavage by cathepsin B, L, and D
Scand. J. Immunol
(1991)

☆ Supplementary data associated with this article can be found at doi: 10.1016/j.vaccine.2004.02.005.
