Background
Human astroviruses (HAstVs) are one of the most common causes of acute gastroenteritis in children worldwide [1-3]. HAstV was first identified in 1975 during an outbreak of gastroenteritis among hospitalized infants [4]. Its name derives from its distinctive star-shaped appearance under electron microscopy (EM). Molecular analyses show that HAstVs are non-enveloped viruses with a 6-8 kb single-stranded, positive-sense RNA genome comprising three overlapping open reading frames (ORF1a, ORF1b, and ORF2) flanked by 5' and 3' nontranslated regions (NTRs) [5]. ORF1a encodes a serine protease, ORF1b encodes an RNA-dependent RNA polymerase, and ORF2 encodes a capsid precursor protein [5].
HAstVs have been grouped into eight known serotypes (HAstV-1 through HAstV-8) based on their reactivity with polyclonal antibodies, as assessed by immunofluorescence assays, neutralization assays, and immunoelectron microscopy (IEM) [5-7]. Phylogenetic analyses of HAstV nucleotide sequences have defined eight genotypes, and further studies have shown a strong correlation between genotypes and serotypes [8]. Consequently, genotyping is frequently used to type HAstVs.
Genomic characterization studies are important for understanding the origin, molecular evolution, and phylogenetic relationships of HAstV genotypes. The full-length genome sequence of an HAstV (HAstV-2) was first determined in 1993 [9]. Subsequently, the complete genome sequences of five more genotypes (HAstV-1, HAstV-3, HAstV-4, HAstV-5, and HAstV-8) were reported [9-12]. Because the dominant disease-causing HAstV type and strain often fluctuate with time and geographic location, characterizing the complete genome sequences of all known genotypes is critical for controlling and preventing future epidemics [13]. Sequence information for HAstV-6, however, remains limited: only partial genome sequences have been reported [14,15], even though this genotype has been identified as a cause of sporadic and large-scale outbreaks of acute gastroenteritis worldwide [16,17].
In 2007, we identified a case of HAstV-6 infection in Beijing, China, suggesting that this genotype might be more epidemiologically relevant than previously recognized [18]. Here we sequenced and analyzed the complete genome of this strain, HAstV-6 192-BJ07, and describe its genetic characteristics by comparing its sequence with those of other known HAstV genotypes. Whole genome characterization of HAstV-6 provides critical insight into the genetics of this virus as well as valuable information for the control and prevention of HAstV-induced gastroenteritis.
Discussion
In this study, we report the whole genome sequence of HAstV-6 based on a strain (192-BJ07) identified in an etiological investigation of viral gastroenteritis in Beijing [18]. Sequence analysis shows that 192-BJ07 has a typical astrovirus genome organization, with three ORFs (ORF1a, ORF1b, and ORF2), an 80-85 nt 5'-NTR, and an 80-85 nt 3'-NTR. Phylogenetic and homology analyses of the ORF2 region indicate that 192-BJ07 shares 95.9% amino acid identity with the previously documented HAstV-6 strain (GenBank accession number Z46658) but <75% amino acid identity with other HAstV genotypes.
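Pairwise amino acid identity, as behind the 95.9% and <75% figures, is the fraction of aligned positions at which two sequences carry the same residue. The text does not state how gaps were treated, so the minimal Python sketch below assumes gapped columns are excluded from the denominator (one common convention among several). `percent_identity` is a hypothetical helper, and the sequences in the usage line are toy strings, not HAstV data.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (an assumed
    convention; other tools count gapped columns differently).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # exclude gapped columns from the comparison
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy example: 7 of the 8 ungapped columns match.
print(percent_identity("MASKSD-KQ", "MASRSDNKQ"))  # 87.5
```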
Consistent with previous reports on other HAstV genotypes, our results show three potential cleavage sites in HAstV-6 ORF2, at Lys 71, Arg 361, and Arg 395 [3,19,20]. Cleavage at Lys 71 is thought to generate the 79-kDa capsid protein [19], which can be further converted into three smaller peptides, VP34, VP29, and VP26, a conversion that enhances HAstV infectivity [19]. The presence of these three residues in 192-BJ07 is consistent with their critical role in HAstV replication and pathogenesis.
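Confirming such sites in a new strain amounts to checking that the expected residue occupies each position of the translated ORF2. The sketch below assumes 1-based numbering from the first ORF2 methionine (a common convention not spelled out here); `check_sites` is a hypothetical helper, and it only confirms residue identity, not that cleavage actually occurs.

```python
def check_sites(protein: str, sites: dict) -> dict:
    """For each 1-based position in `sites`, report whether the protein carries
    the expected one-letter residue there (False if the position is out of range)."""
    seq = protein.upper()
    report = {}
    for position, expected in sites.items():
        in_range = 1 <= position <= len(seq)
        report[position] = in_range and seq[position - 1] == expected.upper()
    return report


# Positions reported for HAstV-6 ORF2 in this study (assumed 1-based from Met 1).
ORF2_SITES = {71: "K", 361: "R", 395: "R"}
```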
We also found two inserted residues, Arg 757 and Lys 758, in ORF1a. How these hydrophilic amino acids affect the properties or function of the virus is currently unknown and should be addressed in further functional studies.
Phylogenetic analysis of whole genome sequences suggests that HAstV-6 may be ancestral to the other HAstV genotypes (Fig. 4A), an observation further supported by phylogenetic analysis of the ORF1a protein region (Fig. 4B). Detailed analysis of the ORF1b amino acid sequences of all genotypes likewise indicates that HAstV-6 and HAstV-3 may represent the common ancestor of the other HAstV genotypes (Fig. 4C). In contrast, analysis of ORF2 suggests that HAstV-8 and HAstV-4 may have been the common ancestor of the other genotypes (Fig. 4D). Different evolutionary and selective pressures acting on different genomic regions may account for this discrepancy in evolutionary relationships [25].
Secondary structure predictions indicate that stem-loop structures in the 5'- and 3'-NTRs are not conserved across known HAstV genotypes. This difference may underlie differences in replication and/or transcription among HAstV genotypes. In contrast, the 5' end of the 5'-NTR, the 3'-NTR, and a 52-nt region at the ORF1b/ORF2 junction are highly conserved, suggesting that they play critical roles in interactions with the viral replication or transcription machinery. Variation in the 3' end of the 5'-NTR may influence the efficiency of viral genome replication or transcription, resulting in differences in replication ability or virulence among genotypes or strains [5].
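Conserved regions such as the 52-nt ORF1b/ORF2 junction can be located by scanning a multiple alignment for columns in which every genome carries the same, ungapped base. The Python sketch below illustrates that idea under stated assumptions; it is not necessarily the procedure used in this study, and `conserved_runs` and its `min_len` threshold are hypothetical.

```python
def conserved_runs(alignment: list, min_len: int = 1) -> list:
    """Return 1-based inclusive (start, end) column ranges in which every
    sequence has the same, ungapped base, keeping runs of >= min_len columns."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    runs = []
    start = None  # 0-based start of the current conserved run
    for i in range(length):
        column = {s[i].upper() for s in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    if start is not None and length - start >= min_len:
        runs.append((start + 1, length))  # run extends to the last column
    return runs


toy = ["ACGTACGT", "ACGAACGT", "ACGTACGA"]  # toy alignment, not HAstV data
print(conserved_runs(toy))  # [(1, 3), (5, 7)]
```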
Translation of the astrovirus genome depends critically on -1 ribosomal frameshifting [22], which requires two cis-acting signals: a "shifty" heptamer sequence (AAAAAAC) and a potential stem-loop structure [10,26]. The HAstV-6 192-BJ07 strain also carries these cis-acting elements, further demonstrating their conservation among HAstV genotypes [5].
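Locating the first of these signals is a plain string search. A minimal sketch, assuming the genome is supplied as a DNA or RNA string; `find_heptamer` is a hypothetical helper, overlapping matches are reported, and the second signal (the stem-loop) would still need an RNA folding tool such as RNAstructure.

```python
SHIFTY_HEPTAMER = "AAAAAAC"


def find_heptamer(genome: str, motif: str = SHIFTY_HEPTAMER) -> list:
    """Return 1-based start positions of every (possibly overlapping) match."""
    seq = genome.upper().replace("U", "T")  # accept RNA or DNA input
    motif = motif.upper().replace("U", "T")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)
        start = seq.find(motif, start + 1)  # advance by one to allow overlaps
    return hits


print(find_heptamer("GGAAAAAACGGAAAAAAC"))  # [3, 12]
```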
The mechanisms underlying HAstV variation remain unclear. One study has suggested that recombination may contribute to HAstV variation [24], but a broad role for recombination has not been established [25,27]. In agreement with most reports, similarity plot analysis revealed no clear evidence of recombination between the 192-BJ07 strain and other HAstV genotypes. Diversification of HAstV amino acid sequences may instead be attributable to the accumulation of single nucleotide mutations. This mechanism resembles antigenic drift in other viruses, such as influenza viruses [28,29], and could allow HAstVs to escape existing host immunity and lead to the emergence of new epidemic strains [30]. Additional studies, such as large-scale whole genome sequencing, are needed to clarify the evolutionary patterns of HAstVs.
Methods
A stool sample (termed 192-BJ07) that tested positive for HAstV-6 by RT-PCR was collected in 2007 from a 2-year-old boy who visited Beijing Children's Hospital with acute diarrhea [18]. Viral RNA was extracted from the stool supernatant using TRIzol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions.
ORF2 amplification
The primers ORF2-F (5'-atggctagcaagtctgacaagcagg-3') and ORF2-R (5'-gaagctgtaccctcgatcctactc-3'), targeting ORF2 of 192-BJ07, were designed based on the only HAstV-6 sequence available in GenBank (GenBank accession number Z46658). For reverse transcription (RT), cDNA was synthesized with the SuperScript™ III RT kit (Invitrogen, Carlsbad, CA) using random primers (TaKaRa, Dalian, China) according to the manufacturer's protocol. PCR was performed as follows: 94°C for 3 minutes; 35 cycles of 94°C for 30 seconds, 50°C for 30 seconds, and 72°C for 3 minutes; and a final extension at 72°C for 10 minutes. PCR products were analyzed by 1.0% agarose gel electrophoresis and stained with ethidium bromide.
Genome amplification and sequencing
Rapid amplification of cDNA ends (RACE) was performed to obtain the entire viral genome sequence using the 5' and 3' RACE System for Rapid Amplification of cDNA Ends kits (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol. The ORF2 sequence obtained above served as the starting point for amplification. PCR-amplified products were cloned into the pMD18-T vector (TaKaRa, Dalian, China) and transformed into chemically competent E. coli DH5α cells. Plasmid DNA was sequenced on an ABI 3730 DNA Analyzer (Applied Biosystems). The complete genome sequence of HAstV-6 192-BJ07 has been deposited in GenBank (GenBank accession number GQ495608).
ORF prediction and RNA structure analysis
ORF1a and ORF2 of HAstV-6 192-BJ07 were predicted using the DNAStar ORF search program. ORF1b was predicted based on the "shifty" heptanucleotide (AAAAAAC) found in other HAstVs [9]. RNA secondary structures were predicted using RNAstructure 4.5 software.
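The DNAStar ORF search program is proprietary, so the sketch below only illustrates the general idea of ORF prediction: scan each forward reading frame (the genome is positive-sense) for an ATG followed by an in-frame stop codon. `find_orfs` and its `min_codons` cutoff are hypothetical. Because ORF1b is expressed through -1 frameshifting rather than independent initiation, an ATG-based scan like this is not the right tool for it, which is why ORF1b was instead placed using the heptanucleotide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    start/end are 1-based inclusive, end includes the stop codon, and
    min_codons counts sense codons from ATG up to (not including) the stop.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until an in-frame stop codon.
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop before the end: no complete ORF left in this frame
            if (j - i) // 3 >= min_codons:
                orfs.append((frame, i + 1, j + 3))
            i = j + 3  # resume after the stop codon (nested ATGs are skipped)
    return sorted(orfs, key=lambda orf: orf[1])


toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"  # toy sequence, not HAstV data
print(find_orfs(toy, min_codons=4))  # [(2, 3, 17)]
```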
Phylogenetic analysis
Multiple sequence alignments were performed with MegAlign in the DNAStar software package. HAstV phylogenies were constructed by the neighbor-joining method with the Kimura two-parameter model in MEGA version 4.0, with 1000 bootstrap replicates [31].
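The Kimura two-parameter (K2P) model treats transitions (A<->G, C<->T) and transversions (purine<->pyrimidine) separately. With P and Q the proportions of compared sites that differ by a transition and a transversion, respectively, the distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). MEGA computes this internally; the Python sketch below only illustrates the formula, assuming pairwise deletion of gaps and ambiguous bases. `kimura_2p` and `distance_matrix` are hypothetical helpers, and the neighbor-joining and bootstrap steps are not shown.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a: str, seq_b: str) -> float:
    """K2P distance between two aligned nucleotide sequences.

    Gaps and ambiguous bases are skipped pairwise (an assumption; MEGA
    lets the user choose how such sites are handled)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper().replace("U", "T"), seq_b.upper().replace("U", "T")):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base in either sequence
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise K2P distances for {name: aligned_sequence}, as tree-building input."""
    names = list(aligned)
    return {(x, y): kimura_2p(aligned[x], aligned[y]) for x in names for y in names}
```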
Similarity analysis
SimPlot software version 3.5.1 [32] was used to analyze the relationships among the aligned HAstV genome sequences. The complete genome sequences of 192-BJ07, HAstV-1 (GenBank accession number L23513), HAstV-2 (GenBank accession number L13745), HAstV-3 (GenBank accession number AF141381), HAstV-4 (GenBank accession number AY720891), HAstV-5 (GenBank accession number DQ028633), and HAstV-8 (GenBank accession number AF260508) were first aligned with ClustalW in MEGA 4, and 192-BJ07 was then used as the query sequence for the similarity analysis. Similarity was calculated in each 200-bp window using the Kimura two-parameter method.
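A SimPlot-style scan slides a fixed window along the alignment and reports the similarity between the query and a reference genome in each window. The sketch below is a simplified illustration, not SimPlot's implementation: it assumes similarity is reported as 1 minus the K2P distance and uses a hypothetical 20-bp step, since the step size is not stated here. `sliding_similarity` and `k2p_or_none` are hypothetical helpers.

```python
import math


def k2p_or_none(a: str, b: str):
    """K2P distance for one window of aligned DNA; None if no comparable
    sites remain or the K2P correction is undefined (saturation)."""
    ts = tv = n = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if x != y:
            if {x, y} in ({"A", "G"}, {"C", "T"}):
                ts += 1  # transition
            else:
                tv += 1  # transversion
    if n == 0:
        return None
    p, q = ts / n, tv / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return None
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def sliding_similarity(query: str, reference: str, window: int = 200, step: int = 20) -> list:
    """Return (approximate 1-based window centre, similarity) pairs.

    Similarity is taken as 1 - K2P distance (an assumed definition);
    step=20 is a hypothetical choice."""
    if len(query) != len(reference):
        raise ValueError("query and reference must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_or_none(query[start:start + window], reference[start:start + window])
        points.append((start + window // 2, None if d is None else 1.0 - d))
    return points
```

Running the scan once per reference genome, with 192-BJ07 as the query, yields one similarity curve per genotype.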
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JW, RG, GPB, and GV conceived the study. LG and JW designed the experiments. LG and WW carried out the experiments and analysis. YL participated in sequence analysis. LG, RG, and JW wrote the manuscript. All authors critically read and approved the final manuscript.