Background
Hepatitis B Virus (HBV) is considered to be a major global health problem with more than 250 million chronic HBV (CHB) carriers and more than one million HBV-associated human deaths per year [
1]. In the nationwide investigation in 2006, Hepatitis B surface antigen (HBsAg) was identified in 7.2% of the whole population in China [
2]. In Qinghai-Tibet Plateau, due to high altitude environment, religion issue, delayed vaccine inoculation or other unknown reasons, the HBV prevalence is over 10% according to our recent study (unpublished), which is much higher than in other areas of China. These carriers of HBV are at increased risk of developing liver cirrhosis (LC) and hepatocellular carcinoma (HCC) [
3].
Qinghai-Tibet Plateau covers more than 2.5 million square kilometers, which is the second- largest plateau in the world and connecting south and northeast Asian [
4]. In this area, a special recombinant of HBV genotype C and genotype D (HBV/CD) was reported [
4]. However, the nature and origin of the recombinant are still poorly studied. At the same time, the coexistence of HBsAg and anti-HBs (HBsAb) in CHB patients has been occasionally reported [
5,
6]. Due to the low incidence of HBsAg/ HBsAb coexistence in HBV carriers, the sample sizes in many previous studies [
5,
7] were small (less than 20) and lack of adequate control subjects [
8]. The mechanism underlying the emergence of anti-HBs in CHB patients remains unclear and related information has not been reported in the indigenous population or in recombinant HBV sequences.
To gain deeper insights into HBV genomic diversity of the special recombinants, 1263 HBsAg positive serum samples were obtained from eight areas of Qinghai-Tibet Plateau and 183 complete genome sequences of HBV isolates were under further analysis. Twenty-seven HBsAg/HBsAb coexistence serum samples were identified and studied with genome variation.
Methods
Procedures to detect point mutations and recombination of HBV by PCR and direct sequencing analysis
Sera samples from Qinghai-Tibet plateau → HBV infection serological markers detection→ DNA purification → HBV complete genome amplification → PCR-DNA purification → Automatic sequencing → Sequences assembling → Comparing with reference sequences.
Subject sample collection
During 2014–2017, the subjects were collected from community-based populations from eight regions of Qinghai-Tibet Plateau based on the population density and covered most of the indigenous habitations, including Hainan, Lhasa, Shannan, Nyingchi, Ali, Nakqu, Chamdo and Rikaze. In this study, the multistage random sampling method was used to ensure the sample representativeness of the whole area. Firstly, two or three counties were selected at random from eight areas, respectively. Secondly, two villages were selected from every county. Thirdly, populations of 18–59 years old were selected from every village. Basic information was recorded on the questionnaire prepared beforehand, including name, gender, birth date, address, and medical information, and 5 mL of venous blood was taken from each participant. HBV infection markers including hepatitis B surface antigen (HbsAg), anti-hepatitis B surface antibody (HBsAb), anti-hepatitis B core antibody (HBcAb), hepatitis B e antigen (HBeAg) and anti-hepatitis B e antibody (HBeAb) were detected by chemiluminescent assays (AXSYM; Abbott Laboratories, North Chicago, IL, USA).
HBV DNA was extracted from 200 μL serum using QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). Full-length HBV DNA (about 3.2 kb) was amplified by Nested PCR finally performed in seven fragments. Based on the previously reported methods [
9] used in our preliminary study [
10], the primers and thermal profile were optimized in this study to adapt to HBV/CD isolates and identify more complete genomes. The basal core promoter (BCP) region was generated by two rounds. First round was conducted using the primer combination of BcpF1 and BcpR1 in a 25 μL reaction volume containing 5 μL extracted DNA and 12.5 μL premix Taq polymerase. The whole- genome (without BCP region) consisting of six fragments was also generated by two rounds of PCR. First round was conducted using the primer combination of HBV1799FLong and HBV1801RLong in a 50 μL reaction volume containing 15 μL extracted DNA and 25 μL premix Taq polymerase. All the primers and thermal profiles were listed in Table
1.
Table 1
List of primers used to amplify the different regions of the HBV genome and their respective thermal profile
BCP/Precore region | First | BcpF1(1254–1273) | TCCTCTGCCGATCCATACTG | 80 °C(3 min); 95 °C (40 s), 63 °C (1 min), 71 °C (2.5 min)/40 cycle; 72 °C(7 min) |
BcpR1(2416–2439) | TTCCCGAGATTGAGATCTTCTGCG |
Second | BcpF2(1606–1625) | GCATGGAGACCACCGTGAAC | 95 °C(5 min); 95 °C (40 s), 50 °C (30 s), 72 °C (1 min)/35 cycle; 72 °C(7 min) |
BcpR2(1955–1974) | GGAAAGAAGTCAGAAGGCAA |
Complete genome (without BCP) | First | HBV1799FLong(1799–1826) | CTGCGCACCAGCACCATGCAACTTTTTC | 80 °C (3 min); 95 °C (40 s), 58 °C (1.5 min), 68 °C (4 min)/45 cycle; 72 °C(7 min) |
HBV1801RLong(1774–1801) | CAGACCAATTTATGCCTACAGCCTCCTA |
Second | | | |
1 | HBV1847FS (1847–1867) | TGTTCATGTCCCACTGTTCAA | 95 °C(5 min); 95 °C (40 s), 63 °C (30 s), 72 °C (1 min)/35 cycle; 72 °C(7 min) |
HBV2394RS (2394–2408) | GGCGAGGGAGTTCTT |
2 | HBV2298FS (2298–2315) | GACCACCAAATGCCCCTAT |
HBV2933RS (2933–2954) | TCGGGAAAGAATCCCAGAGGAT |
3 | HBV2821FS (2821–2840) | GGTCACCATATTCTTGGGAAC |
HBV272RS (272–291) | TGAGAGAAGTCCACCACGAGT |
4 | HBV179FS (179–197) | CTAGGACCCCTGCTCGTGTT |
HBV704RS (704–725) | CGAACCACTGAACAAATGGCACT |
5 | HBV0599FS(599–619) | GTATTCCCATCCCATCATCCTG |
HBV1286RS (1286–1305) | GCTAGGAGTTCCGCAGTATGG |
6 | HBV1175FS (1175–1190) | GCCAAGTGTTTGCTGA |
HBV1788RS(1788–1802) | GCCTACAGCCTCCTA |
SP6 | CGATTTAGGTGACACTATAGAAGAGAGGCT | |
T7 | CAGTAATACGACTCACTATAGGGAGAAGGCT | |
After purification of the PCR products with a QIA Gel Extraction Kit (Qiagen, Valencia, CA), the sequences were determined using the Sanger dideoxy terminator sequencing method with DNA sequencer ABI 3700 (PE Applied Biosystem). Then the sequences were assembled using SeqMan II software (DNAStar Inc.), and the correct nucleotide position of the complete HBV sequence was revised through alignment with the reference sequence in the GenBank database.
The whole-genome nucleotide sequences reported in this article have been deposited in the National Center for Biotechnology Information GenBank database with accession numbers MN683570- MN683729, MN657315- MN657318, KX660674- KX660690.
Recombination and serological analysis
Genotypes and recombination in genome sequences of HBV were investigated using SimPlot v3.5.1 software and JPHMM (jumping profile Hidden Markov Model) method [
11]. MEGA 7.0 software was used to calculate genetic distance and pairwise distance comparisons. The HBsAg serotypes were deduced from the sequences of the S gene region by the identification of amino acids at positions 122 (Lys-Arg for d-y determinants), 160 (Lys-Arg for w-r determinants), 127 (Pro-Thr-Leu/Ile for w2-w3-w4), and in the case of Arg122 Pro127 Lys160, also at positions 159 (Ala-not Ala for ayw1-ayw2 and ayw4) and 140 (not Ser-Ser for ayw2-ayw4) [
12].
Mutation analysis
Generally, the mutation definition and analysis were performed as previously described [
13]. The whole genome, including S gene, PreC/C gene, reverse transcriptase (RT), X gene and basal core promoter (BCP) were under nucleotide mutation analysis. Base on the structure of HBV/CD recombinants, genotype C and genotype D reference sequences were used as parental sequences for mutation detection [
14,
15]. Nucleotide sequences of datasets in this study and reference HBV sequences were analyzed using the MEGA 7.0 software and Mutation Reporter Tool [
16].
The nucleotide PreS/HBsAg sequences obtained were translated into amino acid sequences, aligned and compared with reference sequences. Amino acid variability was defined as the frequencies of residue substitutions at each position [
17]. For analysis, HBsAg was divided into subregions corresponding to structural and/or functional domains: the N- terminal region (aa 1 to 99), the major hydrophilic region (MHR, aa 100 to 169) and the C- terminal region (aa 170 to 226). The “a” determinant in MHR (aa 124 to 147), the first loop of “a” determinant (aa 124 to 137) and second loop of “a” determinant (aa 139 to 147) were also analyzed respectively.
Statistical analysis
Statistical analyses were performed using SPSS 22.0 (IBM Corp., Armonk, NY, USA). Chi-squared test and two-tailed Student’s t-test were used for analysis, as appropriate. A P-Value of ≤0.05 was considered as statistically significant.
Ethical approval
The study protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Ethics Committee of the Chinese Center for Disease Control and Prevention. The purpose of the study and the right to information were explained to the participants by research staff. Written informed consent was obtained from each participant before the interview and venous blood collection.
Discussion
HBV genotypes are related to the severity of liver disease and response to clinical therapy [
18]. Compare to other genotypes, HBV genotype C and genotype D carry a higher lifetime risk of liver cirrhosis and hepatocellular carcinoma development [
19]. It is believed that recombination can exert an influence on clinical important properties more dramatically than the steady accumulation of natural mutations, which suggests the potential pathogen significance of the HBV/CD recombinants [
20]. As far as we know, there was no detailed molecular epidemiology or genetic variability study carried out based on a large number of HBV/CD recombinant complete genome sequences.
In this study, HBV/CD recombinant was the main genotype (179/183, 97.81%) isolated in the plateau. This result is different from recent but smaller sample size reports in Tibet that genotype C and genotype D were the most dominant HBV genotypes [
21,
22]. The HBV genotypes showed a distinct geographical distribution all over the world [
19]. In both HBV CD1 and CD2 recombinant genomes, the ‘C fragment’ and ‘D fragment’ were genetically close to subgenotype C2 and D4, respectively. HBV/D4 isolates were mainly reported in South America and the Pacific islands [
14]. The only report of HBV/D4 in Asia was in northeast India [
23]. Specifically, subtype C2 was the main genotype in central and north China [
24], so Qinghai-Tibet Plateau was found at the presumed geographical junction to the distribution of these two HBV subgenotypes (Fig.
2). In this study, recombinants CD1 and CD2 had significant different geographic distribution in the plateau. It could also partly explain the different distribution of seromarkers in Tibet and Qinghai. HBV/CD2 isolates were mainly identified in two regions (Shannan & Rikaze) which are close to the boundary of northeast India (Fig.
2). Genotype D fragment was in around nt10–800 of HBV/CD1 and around nt10–1500 of HBV/CD2 (Fig.
1). Then both CD1 and CD2 have the same initial recombination site at nt10. Based on the facts mentioned above, it can be inferred that these two types of recombinations might have a mixed China (subgenotype C2) and Indian (subgenotype D4) origin.
Though lots of HBV genotype D or genotype C isolates were reported in India [
23,
25‐
27], no HBV/CD recombinant strain was reported in south Asia until now. Actually, nearly all the HBV/CD recombinants in this study and in previous studies scattered in indigenous of high altitude areas, such as Qinghai-Tibet Plateau, Yunnan-Kweichow Plateau [
28], Loess Plateau [
29] and Mongolia Plateau (Accession Number: AB270534, AB270534). A possible reason for this phenomenon might be due to the genetic background or migration history of highlanders.
As the diagnosis information in these rural areas is still not detailed enough to analyze the clinical feature of these HBV recombinants, the complete genome features in this study offer us a way to estimate the clinical prognosis of these HBV strains in the indigenous population. Serotype awy2 was the predominant serotype of HBV/CD in this study, as same as the results of genotype D [
30]. It indicates that the main serological character had not dramatically changed after the recombination. Previous studies reported that A1762T/G1764A double mutation in BCP (nt.1742–1849) was the strongest viral factor associated with the development of liver disease [
20,
31]. Other mutations, such as T53C, G1613A, C1653T, T1753C, A2189C, T3098C and PreS deletions were also reported associated with clinical progress [
32‐
38]. In this study, the A1762T/G1764A double mutations were observed in 27.93% in HBV CD recombinant sequences. These frequencies were less than previous reports of genotype C in CHB carriers [
39], which indicates a lower risk of HCC in the population. The same result was also found in frequencies of mutation A2189C (12.85%), G1613A (8.94%), T1753C (8.38%), C1653T (5.59%), T53C (4.47%), T3098C (1.68%) and PreS deletion (2.23%, 4/179), which were also lower than mutation frequencies of genotype C in CHB patients in Asia [
32,
34‐
40]. This indicates the clinical progress of CD recombinant seems genetically more temperate than genotype C which had caused the most liver disease and related death in China.
There were several mutations frequently identified in PreC/C and PreS/S regions, which potentially influence the clinical prognosis. However, many of these prevalent nucleotide mutations or amino acid substitutions were unique in the CD recombinants, which were not previously reported in genotype C or other genotypes of HBV genome. The function and influence of these mutations need further analysis.
The prevalence of anti-HBs coexistence was 2.14% in the population of HBsAg-positive patients. The coexistence frequency is lower than previous reports [
5‐
7,
41], suggesting possible geographical variability or special recombinant characteristics in these statistical differences. HBeAg status and DNA level of the serum were the focus of the argument about comparability of the HBsAg/HBsAb coexistence studies [
8]. In this study, no differences were identified in the distribution of DNA level, HBeAg positivity or recombination types between Group I and Group II. And the DNA level of serum samples in Group I was all above 5 log10 copies/mL. The aa variability of S region in Group I was significantly higher than that in Group II (Table
3,
P < 0.001), which was consistent with previous reports [
41]. In further analysis of each segment in the S region, the aa variability in Group I was significantly increased in N-terminal, MHR and C-terminal, compared with Group II (
P < 0.001,
P < 0.001,
P = 0.013, Fig.
3).
The “a” determinant in MHR is the hot zone of HBsAg/HBsAb coexistence studies, which is located in the hydrophilic region between aa124 and aa147 and act as the most important antigen- binding site in the S region of each HBV serotypes [
17,
41]. In this study, compared with Group II, the aa variability in the “a” determinant of Group I increased significantly (1.54% vs 0.06%,
P < 0.001). This region is composed of two stem- loop structures, which act as two binding sites of monoclonal antibodies and are of great significance for the effectiveness of hepatitis B vaccine and the clinical detection of antigens. In this study, the aa variation of the first and second loop of the “a” determinant were both statistically correlated with HBsAg/HBsAb coexistence, which consistent with previous studies [
5,
17]. There were different opinions about the effect of the second loop [
42] or even the entire S protein variablity [
7] on the HBsAg/HBsAb coexistence. However, former studies were about genotype B and genotype C [
42], while HBV/CD recombinant was the main genotype in this study and the “a” determinant was located in genotype D fragment (Fig.
1). It may be due to the amino acid variation background in different genotypes. Moreover, due to the low incidence of HBsAg/ HBsAb coexistence in all HBV genotypes, the sample sizes in previous studies were less than 20 [
5,
7], and the insufficient number of samples may affect the stability of statistical results.
The association between aa variation of S protein and HBsAg/HBsAb coexistence has been reported in many studies [
5,
6,
17,
41,
42]. However, the aa mutations in any part of S protein is not a sufficient and necessary condition for the explanation of HBsAg/HBsAb coexistence enigma [
8], so there should be at least one auxiliary or secondary condition in the emergence of the coexistence. Previous studies suggested that mutations or deletions in the PreS region were the reason for the coexistence of HBsAg/ HBsAb [
43]. The significant difference was also found in the distribution of deletions in PreS region between Group I and Group II (Table
3), suggesting the change of PreS region is also associated with coexistence. Interestingly, the frequency of PreS deletion in HBV/CD recombinants was 2.23%, which is lower than the frequency in other genotypes (4.9%) [
7]. Combine with the fact that the frequency of HBsAg/HBsAb coexistence is 2.14% in this study, which is also lower than other genotypes [
5‐
7,
41]. PreS deletion may act as a genetic variation background which occasionally affects the interaction between antigens and antibodies, together with the aa mutation in MHR to cause the coexistence.
This study has several limitations. First of all, multiple infections, minor populations of immune escape variants in viral quasispecies may not be identified in this study by PCR product sequencing. Secondly, the combined action of PreS deletion and aa variation in MHR need further support. Finally, the result would be more convincing if we could have a larger sample size of HBsAg/HBsAb coexistence, though 27 HBsAg/HBsAb coexistence in this study is more than most of the previous studies.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.