The study participants (referred to as patients 5A and 10) were recruited at the Centre Hospitalier de l'Université de Montréal (CHUM). Both were Caucasian individuals infected with HIV-1 subtype B who had been on successful antiretroviral therapy (ART) for more than 4 years. For each patient, a sigmoid biopsy was collected during colonoscopy 4 years after initiation of successful ART and processed as previously described [15]. Matched peripheral blood was collected on the same day and immediately processed by Ficoll density-gradient separation to isolate peripheral blood mononuclear cells (PBMC), in parallel with cell extraction from the biopsy tissue.

DNA was extracted from both compartments and used for next-generation sequencing (NGS) of the HIV-1 provirus. Fragment B, i.e., the polymerase (Pol) region including reverse transcriptase (RT) and integrase, was amplified using a previously published method [16,17]. The PCR products were purified and quantified, and libraries were prepared using the Nextera XT DNA Sample Preparation kit; each library was then sequenced on an Illumina MiSeq platform. Raw data (FASTQ files) were submitted to the SmartGene® NGS HIV-1 module to generate a BAM file for each patient, and each sample was processed for further analysis [18]. Only the RT portion of the Pol sequences obtained was used in this study.

Using Galaxy and Clustal software, RT gene sequences from the two compartments (gut-associated lymphoid tissue, GALT, and PBMC) were selected for neighbor-joining analysis. Distance matrices were calculated after gap-stripping of the alignments using the Kimura two-parameter model, with bootstrap analysis. For this purpose, an alignment was generated that included only reads longer than 400 bp corresponding to a given region of RT (variable according to the patient, between amino acids 39 and 202). This length restriction explains why the number of reads used for this analysis is small compared with the total number of reads covering this region. Phylogenetic trees were visualized using Interactive Tree of Life (iTOL) software [19].

Single genome sequencing (SGS) was carried out according to the method of Palmer et al. [20]. Total DNA extracted from each compartment was diluted in TE buffer to an endpoint at which three out of 10 PCRs yielded a product. At this dilution, according to the Poisson distribution, a positive PCR contains a single template copy about 80% of the time. Two rounds of PCR for RT amplification were followed by visualization of the PCR products. The 1:9 dilution was found to be optimal for Sanger sequencing, and the PBMC and GALT sequences obtained (assuming no mixture of populations) were aligned with Clustal to build a neighbor-joining tree.

To evaluate evolutionary divergence, the median, mean and range of the number of base substitutions per site between RT sequences were calculated. Analyses were conducted using the Maximum Composite Likelihood model [21]. The analysis involved 4100 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions containing gaps and missing data were eliminated, leaving a total of 376 positions in the final dataset. Evolutionary analyses were conducted in Molecular Evolutionary Genetics Analysis (MEGA) 7 software [22].
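
The Poisson argument behind the SGS endpoint dilution can be checked numerically. The following is a minimal sketch, not part of the original workflow: it assumes that template copies are distributed across reactions according to a Poisson distribution and that every reaction containing at least one copy amplifies. The function name `single_copy_fraction` is hypothetical and introduced only for illustration.

```python
import math


def single_copy_fraction(positive_fraction: float) -> float:
    """Probability that a positive PCR started from exactly one template.

    Assumes a Poisson distribution of templates per reaction and perfect
    amplification of any reaction that contains at least one template.
    """
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must lie strictly between 0 and 1")
    # P(no template) = exp(-lam) = 1 - positive_fraction
    lam = -math.log(1.0 - positive_fraction)
    # P(exactly one template) = lam * exp(-lam)
    p_one = lam * math.exp(-lam)
    # Condition on the reaction being positive
    return p_one / positive_fraction


if __name__ == "__main__":
    # Endpoint used in the text: 3 positive reactions out of 10
    print(f"{single_copy_fraction(0.3):.3f}")
```

With three positive reactions out of 10, the mean number of templates per reaction is about 0.36, and roughly 83% of positive reactions are expected to derive from a single template, consistent with the "about 80%" figure quoted for this dilution.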
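
For readers unfamiliar with the Kimura two-parameter model used for the neighbor-joining distance matrices, the pairwise distance can be sketched as follows. This is an illustrative implementation only, not the code used by the Galaxy or Clustal tools in the study. The function name `k2p_distance` is hypothetical, and sites with gaps or ambiguous bases are skipped pairwise here, which is a simplification of gap-stripping the whole alignment.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases (pairwise simplification)
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    # One transition (A to G) in ten sites
    print(f"{k2p_distance('AAAAAAAAAA', 'GAAAAAAAAA'):.4f}")
```

The correction weights transitions and transversions separately, so two sequence pairs with the same raw proportion of differences can yield slightly different distances depending on the type of substitution.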