Background
Phylogenetic analysis indicates that the human immunodeficiency virus type 1 (HIV-1) originated from simian immunodeficiency virus infecting chimpanzees (SIVcpz) through a chimpanzee-to-human zoonotic transmission [1‐4]. Until recently [5], the natural host of the virus, the chimpanzee, was thought to remain asymptomatic throughout infection despite high viral loads [6‐8]. In humans, however, an increase in viral load is usually associated with progression to the acquired immunodeficiency syndrome (AIDS) and subsequently death [9‐13]. The difference in disease progression may be caused by differences between the hosts, by differences between the HIV-1 and SIVcpz viruses, or by both.
A zoonotic (i.e. cross-species) event is expected to be accompanied by mutations that enable the pathogen to adapt to the new host environment (e.g. as observed in a study by Baric et al. [14]). Indeed, sequence changes have been identified in HIV-1 that are evidence of selective pressure associated with the genetics of the human host [15‐17]. In particular, the human cytotoxic T-lymphocyte (CTL) immune response directed against foreign antigens plays a major role in exerting selective pressure on antigenic proteins, including those of HIV-1. The activation and characteristics of the immune responses against the virus have been found to differ remarkably between humans and chimpanzees [7, 18‐20]: an elevated anti-HIV immune response upon infection is characteristic of humans, whereas the chimpanzee generally maintains a low level of immune activation. The human immune response may therefore exert higher selective pressure on the virus sequence than the immune response of the natural host. However, the virus is capable of overcoming the immune response, leading to AIDS.
The CTL immune response is mediated by Human Leukocyte Antigen (HLA) molecules that bind to endogenous antigenic peptides known as epitopes and transport them to the surface of the infected cell for recognition by CTLs, resulting in killing of the infected cell [21]. The HLA genes are highly polymorphic and each HLA molecule binds to peptides that contain specific sequence motif patterns (known as anchor residue motifs) [22, 23]. For a peptide to bind to the HLA binding groove, only limited amino acid variation is allowed at the main anchor positions of the peptide [21, 24, 25]. Successful binding, efficient transport and presentation of a peptide to a CTL depend on the presence of the appropriate anchor residue motif and the overall affinity between the HLA binding groove and the epitope [26, 27]. The strength of selective pressure varies between specific CTL immune responses directed by different HLA alleles [28]. Some HLA molecules have been associated with immune escape mutations at anchor sites, which enable the virus to adapt to the host and thus increase viral load [8, 29‐31].
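The idea of an anchor residue motif can be illustrated with a minimal sketch that scans a protein sequence for 9-mers whose anchor positions carry permitted residues. The motif used here (anchors at positions 2 and 9, with the listed residues) is a simplified assumption for illustration only, and the function name and toy sequence are hypothetical; this is not the structure-based prediction method used in this study, and a motif match alone says nothing about overall binding affinity.

```python
def find_motif_peptides(protein, anchors, length=9):
    """Return (start_index, peptide) for each window whose anchor positions match.

    `anchors` maps a 1-based peptide position to the residues allowed there.
    """
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        # Every anchor position must hold one of its allowed residues.
        if all(peptide[pos - 1] in allowed for pos, allowed in anchors.items()):
            hits.append((start, peptide))
    return hits


# Assumed, simplified motif: anchors at positions 2 and 9 of a 9-mer.
ILLUSTRATIVE_MOTIF = {2: "LM", 9: "VL"}

toy_protein = "MSLYNTVATLGAR"  # short toy sequence used only for illustration
hits = find_motif_peptides(toy_protein, ILLUSTRATIVE_MOTIF)
print(hits)
```

A substitution at either anchor position of a matching window would remove it from the list of hits, which is the sense in which anchor-site mutations can abolish presentation by a given HLA molecule.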
Investigation of the evolutionary dynamics of immune escape has focussed primarily on escape mutations that incur a fitness cost and consequently revert to wild type upon transmission to a host that mounts different immune responses. This can result in a pattern of toggling between escape and wild-type amino acids that is detectable using evolutionary modelling [32]. In this study the focus is on escape mutations that do not incur a cost in terms of viral fitness. Such escape mutations do not experience selection pressure to revert to the wild-type state following transmission to a new host. Consequently, they are associated with episodic selection, rather than the ongoing rapid evolution associated with escape and reversion. Upon transmission to humans, SIV is likely to have experienced selective pressure to escape from common human immune responses. Some of these escape mutations would not have had a significant effect on the fitness of the virus and thus would not have experienced strong selection to revert. Consequently, we hypothesized that the branch of the SIV-HIV-1 phylogenetic tree leading to the ancestor of the HIV-1 sequences would include evidence of episodic selection to escape from responses directed by common HLA alleles.
To investigate the evidence of episodic selection for CTL escape along this branch, we predicted epitopes for HLA alleles, using the SIV consensus sequence to approximate the sequence that was transmitted to humans. For the epitope prediction, we used a structure-based method that estimates the strength of binding between a viral amino acid sequence and an HLA molecule from amino acid pairwise potentials. We selected regions in which known anchor residue motifs were present and which had high binding affinity, and limited our analysis of selective pressure to these regions. Finally, we used models of codon sequence evolution to quantify the selective pressure, inferring positive selection from the ratio of the nonsynonymous substitution rate (dN) to the synonymous substitution rate (dS) for individual branches in a phylogeny. Branch-specific analysis of selective pressure enabled us to investigate selective pressure along the branch ancestral to the HIV-1 sequences, and hence to study how HIV-1 adapted to the human host upon transmission from chimpanzee.
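The distinction between the two substitution classes that underlie dN and dS can be made concrete with a small sketch that classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous. This is only raw counting under the standard genetic code: it does not normalise by the number of synonymous and nonsynonymous sites, correct for multiple hits, or use a phylogeny, so it is not the codon-model estimate of dN/dS described above. The function name and example sequences are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq_a, seq_b):
    """Count codons that differ synonymously and nonsynonymously.

    Both sequences must be in frame, gap-free and of equal length.
    Returns a (synonymous, nonsynonymous) tuple of codon counts.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid: silent change
        else:
            nonsynonymous += 1   # amino acid replaced
    return synonymous, nonsynonymous


# TTT -> TTC keeps Phe (synonymous); GGA -> GAA changes Gly to Glu.
print(count_codon_differences("ATGTTTGGA", "ATGTTCGAA"))
```

An excess of nonsynonymous over synonymous change, once properly normalised, is what a ratio dN/dS > 1 expresses.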
Discussion
The PREDEP program provides binding predictions for a limited number of HLA molecules with solved crystal structures and preferred binding anchor residue motifs that were predicted from HLA-peptide structural conformations. Only six such HLA alleles known to mediate cytotoxic T-lymphocyte immune responses are available for analysis. Among these, only HLA A*0201, A*6801 and B*2705 were observed to bind strongly to some regions of the consensus SIVcpz genome. Our analysis was therefore restricted to the selective pressure potentially exerted by each of these three alleles following the chimpanzee-to-human zoonosis event that gave rise to HIV-1.
It is interesting that neither the a priori nor the GABranch analysis found evidence for positive selection in the HLA B*2705 alignment, whether along the ancestral HIV-1 branch or any other branch. This is surprising because B27 alleles have been associated with delayed progression to AIDS in HIV-1 infected individuals [40], which in turn is associated with persistent strong positive selection at specific sites [41]. Moreover, delayed progression is a result of reduced viral replication, which indicates that these sites are important for the fitness of the virus. One possible explanation for the HLA B*2705 result is that the allele may have caused positive selection on only a few sites. Such selection is hard to detect because ω is averaged over all sites of the sequence. Selection may also have been weak because this is a rare HLA allele (frequency of 1%) [42].
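The dilution effect can be illustrated with simple arithmetic. The sketch below assumes hypothetical per-site ω values (2 strongly selected sites among 100, the rest under purifying selection) and takes a plain arithmetic mean; a branch model estimates a single ω by maximum likelihood rather than by averaging site values, so this is only an illustration of why a few selected sites can be masked.

```python
def mean_omega(site_omegas):
    """Arithmetic mean of per-site omega (dN/dS) values."""
    return sum(site_omegas) / len(site_omegas)


# Hypothetical values: 2 sites with omega = 5, 98 sites with omega = 0.2.
site_omegas = [5.0] * 2 + [0.2] * 98
average = mean_omega(site_omegas)

# The mean stays well below 1 although two sites are under strong
# positive selection.
print(round(average, 3))
```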
In the HLA A*0201 dataset, both the a priori and the GABranch analyses inferred positive selection on the HIV-1 ancestral branch, with very high support for dN>dS. Of the alleles available for analysis in this study, HLA A*0201 is the most frequent in African populations (see Table 1). It is also the most frequent HLA allele in Caucasian populations and many studies have been carried out to determine its effect on HIV disease progression [42]. Even though the allele recognizes immunodominant peptide regions of the HIV-1 sequence, it fails to exert strong selective pressure on some virus peptides [43]. Some studies have also shown that the outcome of an immune response depends not only on the HLA molecule but also on the specific peptide sequences that are targeted [44‐48]. Our results suggest that immune escape mutations affecting HLA A*0201-mediated CTL responses may have been selected for in the period immediately following zoonosis. If these adaptations subsequently became fixed in the viral population, they would no longer be under diversifying selection today.
HLA A*6801 (another common allele in African populations) appears to have exerted strong selective pressure on the HIV-1 ancestral branch compared to the rest of the tree. High support (99.6% of the tested models) for ω > 1 was observed at the ancestral HIV-1 branch. This allele has anchor residue motif restrictions that are shared within the HLA A3 supertype, the second most frequent supertype in the human population [49]. The HLA A*6801 allele itself targets the Tat protein, which is expressed in the early stages of the HIV-1 lifecycle, and CTL responses to this protein cause a significant reduction in disease progression rate [50]. Escape mutations from the CTL immune response have also been identified within Tat at the population level, causing reduced viral load [51, 52]. The virus may have adapted well to the A*6801 responses early after the cross-species transmission event at sites that do not affect the replication of the virus. The recently observed association with a reduction in viral load indicates that A*6801 epitopes also contained functionally important sites, which would have made it difficult for these regions to adapt to the immune response.
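One common way to summarise support for ω > 1 on a branch across many candidate branch models is to weight each model by its Akaike weight and sum the weights of the models that place ω > 1 on that branch. The sketch below shows that calculation with hypothetical AIC scores and model flags; whether it matches exactly how the support value reported above was computed is an assumption, and the function names are illustrative.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into normalised Akaike weights."""
    best = min(aic_scores)
    raw = [math.exp(-0.5 * (aic - best)) for aic in aic_scores]
    total = sum(raw)
    return [value / total for value in raw]


def model_averaged_support(aic_scores, branch_has_omega_gt_1):
    """Sum the Akaike weights of models that assign omega > 1 to the branch."""
    weights = akaike_weights(aic_scores)
    return sum(w for w, flag in zip(weights, branch_has_omega_gt_1) if flag)


# Hypothetical example: three candidate models; the two best-fitting ones
# place omega > 1 on the branch of interest.
aic_scores = [100.0, 102.0, 110.0]
flags = [True, True, False]
support = model_averaged_support(aic_scores, flags)
print(round(support, 3))
```

Under this scheme a poorly fitting model contributes little to the total, so support close to 1 means that nearly all of the weight lies with models in which the branch has ω > 1.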
Conclusion
This is the first study that analyses HLA-associated selective pressure following the transmission from chimpanzee to human across all potential target sites of the HIV-1 genome. Using the chimpanzee consensus sequence, we identified regions of the HIV-1 sequence that were initially targeted by the CTL immune response immediately after the cross-species transmission of HIV-1 from chimpanzee to human. Of the six HLA alleles with crystal structures available for analysis, we found strong binding regions, which could imply successful immune responses in vivo, for HLA A*0201, A*6801 and B*2705. We determined the average extent of selective pressure exerted by each HLA allele along the branch leading to the HIV-1 sequences. This branch represents the sequences that first encountered selective pressure directed by the human immune response immediately following the zoonosis event. Our results suggest that HIV-1 adapted to CTL responses directed by HLA A*6801 and A*0201, which are amongst the most common HLA genotypes in African populations (Table 1). It is therefore likely that the virus was frequently exposed to selective pressure exerted by common immune responses during initial exposure to the human host following transmission of the virus from chimpanzees. We did not find evidence for strong selective pressure exerted by HLA B*2705, which has extremely low frequencies in African populations (Table 1) [53, 54].
In this study we focussed specifically on epitopes that we infer were likely to have been present in the viral sequence that first infected humans. We propose that the selection we observe at these positions along the branch of the phylogenetic tree leading to all of the HIV-1 sequences reflects episodic selection to evade human cytotoxic immune responses. Episodic selection has been proposed to be an important aspect of cross-species pathogen transmission and has, in fact, been observed previously in a laboratory setting [14]. However, this is the first time, to our knowledge, that evidence has been presented of transient positive selection associated with human immune responses against unconstrained regions of the virus shortly after transmission to humans.
Acknowledgements
This study was funded by the South African National Bioinformatics Network. NKN was supported by a training grant under the Stanford-South Africa Biomedical Informatics Training Program, which is supported by the Fogarty International Center, part of the National Institutes of Health (grant no. 5D43 TW006993).
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
NKN performed the analysis, interpreted the results and wrote the manuscript. CS conceived and supervised the study and edited the manuscript. KS supervised and co-wrote the manuscript. All authors read and approved the final manuscript.