Background
The HIV-1 integrase catalyses the integration of the reverse transcribed viral DNA into the host genomic DNA via a two-step process. In its active form the integrase forms a tetramer. The monomeric enzyme consists of 288 amino acids (aa) and contains three functional domains: the N-terminal zinc-binding domain (NTD, aa 1–49), the central catalytic core domain (CCD, aa 50–212), and the C-terminal DNA-binding domain (CTD, aa 213–288) [1–4]. Each region comprises motifs essential for the proper function of the enzyme, e.g. the zinc finger motif H12-H16-C40-C43 in the NTD, the active site D64-D116-E153 in the CCD, and the minimal nonspecific DNA-binding region ranging from I220 to D270 in the CTD [2–6].
Raltegravir was the first integrase strand transfer inhibitor (INSTI) to be approved in Europe in 2007, followed by elvitegravir in 2012 and dolutegravir in 2014. Bictegravir [7] and cabotegravir [8] are in clinical development. INSTIs target the CCD, thereby inhibiting the strand transfer of the double-stranded viral DNA into the host genome [1]. Various allosteric inhibitors of integrase (ALLINIs), which modulate integrase multimerisation [9] and interfere with the cellular transcription factor LEDGF/p75 [10], are in development but have not yet progressed beyond Phase I clinical trials [11].
Due to its high rate of replication, mutation, and recombination, HIV is a virus of high genetic variability. The viability of virus variants in turn is limited by structural and functional constraints. At the same time, variants in the viral quasispecies can be selected by the pressure of the human immune system [12, 13] or antiretroviral treatment (ART) [14, 15]. In general, the HIV diversity at the intra-patient level increases during the course of infection [16], driven by both drift and selection [17]. During transmission to a new host, several stochastic and selective bottlenecks reduce the viral diversity to a few variants [18], and the factors that contribute to shaping the HIV diversity at the inter-patient level have been extensively discussed [19–24].
Naturally occurring polymorphisms can affect the genetic barrier to drug resistance by influencing the selection of resistance mutations, enzymatic activity, and replicative capacity [22, 25, 26]. Epistatic interactions between polymorphisms can further modulate viral fitness and the development of drug resistance in complex ways and have been shown to play an important role in the HIV-1 protease and reverse transcriptase [27–30]. Thus, to understand the selection of resistant variants in the presence of INSTIs, it is important to investigate the evolutionary dynamics of the polymorphic sites in the integrase.
The prevalence of HIV-1 integrase polymorphisms and INSTI resistance mutations has been investigated before in INSTI-naïve individuals [26, 31–35] and ART-naïve individuals [35–40]. However, time trends and covariation of complex mutation patterns preceding the availability of INSTIs have not yet been analysed. The aim of this study was to investigate covarying clusters of naturally occurring resistance- and non-resistance-associated amino acid variants and their frequencies over time at the inter-patient level to consider their potential relevance for INSTI resistance. To this end, HIV-1 integrase sequences were obtained from samples of ART-naïve individuals newly diagnosed with HIV-1 between 1986 and 2006, a 20-year period prior to the first approval of an INSTI in Germany [41].
Discussion
Our analyses confirm and considerably extend previously published results based on INSTI-naïve [26, 31–35] or ART-naïve [35–40] study populations that either focused on HIV-1 subtype B [31, 36, 38] or included various HIV-1 group M subtypes [26, 32–35, 37, 39, 40]. We restricted our analyses to HIV-1 subtype B strains because these are predominant in Germany [43–45] and because different HIV-1 subtypes have different consensus amino acids at some sites, which can bias the degree of variability when compared to consensus B [33, 35].
We found the highest amino acid variability, as determined by entropy, time trends, and direct coupling analysis, within the CCD, within the NTD, and between the CCD and NTD. Sites important for enzymatic activity were in general conserved; however, some positions involved in binding the cellular cofactor LEDGF/p75 (sites 125, 165) and within the minimal nonspecific DNA-binding region (sites 220, 230, 232, 234, 256, 265) were polymorphic with an overall variability ≥5%. Covariation between positions 125, 165, 256, or 265 and other sites was observed, and the DNA-binding site 234 covaried with the DNA-binding site 253. The most frequent substitutions were T125A, V165I, I220L, S230N, D232E, L234I, D256E, and A265V. All of them occurred within the same biochemical class of amino acids, with the exception of T125A, which represents a switch from a hydrophilic to a hydrophobic amino acid. The effect of this switch is not known and should be investigated experimentally. A time trend in frequency was observed for variants T125A, D256E, and A265V. This knowledge about the variability of the integrase should be taken into account in the design of genotypic resistance assays.
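To illustrate how per-site variability of this kind can be quantified, the following minimal sketch computes the Shannon entropy and the fraction of non-majority residues for each column of a toy alignment. The toy sequences, the function names, the handling of gaps and ambiguous residues, and the definition of variability as one minus the frequency of the most common residue are all assumptions made for illustration and are not taken from the study's methods.

```python
import math
from collections import Counter

SKIP = set("-X")  # assumption: gaps and ambiguous residues are ignored


def residue_counts(column):
    """Count the unambiguous residues in one alignment column."""
    return Counter(aa for aa in column if aa not in SKIP)


def shannon_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    counts = residue_counts(column)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def variability(column):
    """Fraction of residues that differ from the most common residue."""
    counts = residue_counts(column)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - max(counts.values()) / total


# Toy alignment of four three-residue sequences (hypothetical data).
alignment = ["TVD", "AVD", "TVE", "TVD"]
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]

entropies = [shannon_entropy(col) for col in columns]
# Columns with at least 5% non-majority residues are flagged as polymorphic.
polymorphic = [i for i, col in enumerate(columns) if variability(col) >= 0.05]
```

In this toy example the second column is fully conserved and therefore has zero entropy, whereas the first and third columns are flagged as polymorphic.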
Most time trends were based on an exchange of two amino acids; however, a general diversification was observed at sites 119 and 124. Seventeen of the 22 amino acid variants with increasing or decreasing frequency covaried with each other. In general, we observed a concordant time trend for pairs with a positive direct coupling term and a discordant time trend for pairs with a negative direct coupling term. Exceptions to this rule were couplings between positions 154–265 and 201–256 (Table 4). The time trends for the individual variants 154I-A265, M154-265V, and V201-256E were discordant despite positive direct coupling terms. The reason for this may be unidirectional coupling, i.e. 154I requires the presence of A265, but not vice versa. Likewise, the time trends of M154-A265, 154I-265V, and 201I-256E were concordant despite negative direct coupling terms.
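The rule described above (positive coupling goes with concordant trends, negative coupling with discordant trends) can be expressed as a simple check. The sketch below is illustrative only: the function name is hypothetical, and the slope and coupling magnitudes are placeholders. Only the signs for the two 154/265 pairs follow the text; the third pair is an invented example of a pair that obeys the rule.

```python
def trend_matches_coupling(slope_a, slope_b, coupling):
    """Return True if the pair follows the rule: concordant time trends
    with a positive coupling term, discordant trends with a negative one."""
    concordant = (slope_a > 0) == (slope_b > 0)
    return concordant == (coupling > 0)


# (slope of variant A, slope of variant B, direct coupling term)
# Magnitudes are placeholders; only the signs carry meaning here.
pairs = {
    ("154I", "A265"): (-0.01, +0.01, +0.5),   # discordant despite positive coupling
    ("154I", "265V"): (-0.01, -0.01, -0.5),   # concordant despite negative coupling
    ("toy_a", "toy_b"): (+0.02, +0.03, +0.4), # invented pair that obeys the rule
}

exceptions = [
    pair for pair, (a, b, c) in pairs.items()
    if not trend_matches_coupling(a, b, c)
]
```

Here `exceptions` contains the two 154/265 pairs, mirroring the exceptions reported for these positions.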
The prevailing concordance of significant time trends and significant couplings in our study suggests the selection of coevolving epistatic clusters. However, due to the transmission bottlenecks [18], genetic drift may be another viable explanation for the observed time trends in the frequency of certain amino acid variants. The role of genetic drift in HIV evolution is debated and has been quantified to some extent at the level of intra-patient evolution and for known transmission pairs rather than at the inter-patient level [60, 61]. Genetic drift at the inter-patient level requires inheritance of selectively neutral substitutions. Large parts of the integrase may be under negative selection to maintain enzymatic functionality; nevertheless, particular positions and certain substitutions of the integrase may be selectively neutral. Therefore, we considered the possibility of genetic drift for all time-trending substitutions by assessing whether there was evidence for (i) inheritance of the time-trending substitutions and (ii) selective neutrality of these substitutions. Regarding (i): We statistically compared the mean patristic distance of random sequences with the mean patristic distance of sequences carrying a specific time-trending substitution to investigate whether time-trending substitutions appeared more frequently in phylogenetically closely related sequences (Additional file 1: Table S1). Sequences carrying the time-trending substitutions 72V, 154I, 165I, and 265V had significantly smaller mean patristic distances than random sequences, which indicates inheritance. Interestingly, all of these substitutions decreased in frequency over time. Regarding (ii): First, we performed Tajima’s D test [60, 61] with a result of D = −1.44, which indicates negative selection. Next, we calculated the ratio of nonsynonymous to synonymous mutations (dN/dS ratio) [60, 61], finding that most regions of the integrase were under strong negative selection, including sites 72, 154, 165, and 265 (Additional file 2: Figure S1). In summary, we could not observe a clear contribution of genetic drift to the time trends of the examined substitutions.
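One way to compare the mean patristic distance of substitution carriers against random sequences is a permutation test, sketched below. This is an illustrative assumption rather than the statistical procedure used in the study: the function names, the toy distance matrix, the number of permutations, and the one-sided p-value definition are all hypothetical choices.

```python
import itertools
import random


def mean_pairwise(dist, indices):
    """Mean distance over all pairs of the given sequence indices."""
    pairs = list(itertools.combinations(indices, 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def permutation_test(dist, carriers, n_perm=2000, seed=0):
    """One-sided test: are carriers more closely related than random
    subsets of the same size? Returns (observed mean, p-value)."""
    rng = random.Random(seed)
    observed = mean_pairwise(dist, carriers)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(len(dist)), len(carriers))
        if mean_pairwise(dist, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


def toy_matrix():
    """Hypothetical patristic distances: two clades of three sequences,
    0.01 within a clade and 0.10 between clades."""
    n = 6
    dist = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        same_clade = (i < 3) == (j < 3)
        dist[i][j] = dist[j][i] = 0.01 if same_clade else 0.10
    return dist


dist = toy_matrix()
observed, p_value = permutation_test(dist, carriers=[0, 1, 2])
```

With only six toy sequences the test has little power; in practice the distance matrix would be derived from the branch lengths of a phylogenetic tree of the full dataset.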
By using Sanger sequencing of bulk RT-PCR products with a sensitivity of approximately 30% [62, 63] and excluding ambiguous amino acids in our analyses, we could only investigate the major virus variant from each patient sample. Minor variants and linkage between minor variants can only be investigated by using more sensitive techniques such as single genome sequencing (SGS) or next-generation sequencing (NGS) [63–65]. To minimise the probability that technical errors during RT-PCR and Sanger sequencing lead to false positive predictions with regard to coupling terms, we combined our direct coupling analysis with a power analysis, which essentially requires that an amino acid pair be present in multiple sequences to be repeatedly identified by direct coupling analysis. Recently, the use of covariation methods as a measure of coevolution has been questioned by Talavera et al. [66]. Based on a computational study, the authors point out that a strong covariation signal is caused by a low evolutionary rate. We therefore assessed our results accordingly but could not find a relation between the rarity of pairwise substitutions and high coupling terms or the occurrence of single substitutions in couplings (Additional file 3: Figure S2).
Sixteen minor INSTI resistance mutations and 11 INSTI-selected mutations were observed as naturally occurring in our ART-naïve study population, which originated from the time prior to INSTI approval. Among these resistance-associated variants, three increased in frequency over time and seven covaried with non-resistance-associated variants. The complex interdependent evolution of these mutations might control enzymatic activity and replication capacity independently of selective pressure through INSTIs at the inter-patient level. Indeed, accessory drug resistance mutations that compensate viral fitness are often already polymorphic in drug-sensitive HIV-1, suggesting that these mutations may naturally enhance viral fitness and virulence with progression of the HIV-1 epidemic [21, 22]. INSTI-independent linkage between non-resistance-associated sites and resistance-associated sites or sites targeted by INSTIs can affect the selection of resistance mutations in the presence of INSTIs. This knowledge should be taken into account for the improvement of resistance prediction algorithms as well as for the development and preclinical evaluation of new INSTIs and ALLINIs. Deeper analyses of the observed resistance-associated variants are needed to evaluate their clinical relevance. In particular, those with naturally increasing frequencies that were linked to covariation should be investigated, i.e. L101I, T122I, and V201I. The absence of major INSTI resistance mutations in our ART-naïve study population underscores the suitability of INSTIs for first-line antiretroviral regimens.
Because the analysed dataset was rather small (n = 337), our results may require further validation from the analysis of larger, independent datasets. Due to the relatively small number of samples, some of our results might not have reached statistical significance, e.g. the temporal trends of T122 and D256. Generally, given the small sample size, overrepresentation of almost identical sequences (i.e. from transmission chains) may profoundly bias any downstream analysis of time trends and covariation patterns. To assess whether our analyses were affected by such sampling bias, we additionally performed them using a reduced dataset in which each cluster of closely related sequences was replaced by a single representative. The results obtained from the reduced dataset confirmed all results obtained from the full dataset.
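The reduction step can be pictured as clustering sequences below a distance cutoff and keeping one member per cluster. The sketch below is a hypothetical illustration only: the use of uncorrected p-distances, single-linkage clustering, the cutoff value, the choice of the first cluster member as representative, and the toy sequences are assumptions and do not describe how the reduced dataset was actually built.

```python
def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def representatives(seqs, cutoff):
    """Single-linkage clustering at the given cutoff; returns the index
    of the first sequence in each cluster."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= cutoff:
                parent[find(j)] = find(i)

    seen, keep = set(), []
    for i in range(len(seqs)):
        root = find(i)
        if root not in seen:
            seen.add(root)
            keep.append(i)
    return keep


# Toy aligned nucleotide sequences (hypothetical data).
seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACGGAC", "ACGTACGTAC"]
kept = representatives(seqs, cutoff=0.15)  # cutoff is an assumed value
```

In this toy example the first, second, and fourth sequences fall into one cluster, so only the first and third sequences are retained.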