Introduction
Rotavirus A (RVA) is the most common cause of acute gastroenteritis in children less than five years of age and is responsible for approximately 128,500 deaths annually worldwide, mostly in low- and middle-income countries. India recorded 21,357 deaths from rotaviral disease in 2015, accounting for 20% of all global rotaviral deaths in children under five years of age [1, 2]. Rotavirus remains the leading cause of diarrheal hospitalizations in low- and middle-income countries despite the large-scale use of rotavirus vaccines globally [3].
Rotaviruses have a three-layered structure with icosahedral symmetry, and have 11 segments of double-stranded RNA (dsRNA), which encode six structural proteins (VP1-VP4, VP6 and VP7) and six non-structural proteins (NSP1-NSP6) [
4]. A binary classification has been widely used since the 1990s to classify RVA into G (VP7) and P (VP4) genotypes. Due to the segmented nature of the dsRNA genome, the genes encoding VP7 and VP4 can segregate independently leading to different combinations of G- and P- types [
5]. Recently the binary classification system was extended to include the other nine genome segments based on nucleotide identity cut-off values to describe RVA strains and study RV diversity. The classification nomenclature for the structural and non-structural proteins of RVA is Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx, representing VP7-VP4-VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5 genes respectively, where x indicates the number of corresponding genotypes [
6,
7]. At present, 42 G, 58 P, 32 I, 28 R, 24 C, 24 M, 39 A, 28 N, 28 T, 32 E and 28 H genotypes have been described [8]. The majority of human rotaviruses possess either the Wa-like (Gx-P[x]-I1-R1-C1-M1-A1-N1-T1-E1-H1) constellation, which is of porcine origin, or the DS-1-like (Gx-P[x]-I2-R2-C2-M2-A2-N2-T2-E2-H2) constellation, which is of bovine origin [7]. In addition, a few strains belong to the AU-1-like (Gx-P[x]-I3-R3-C3-M3-A3-N3-T3-E3-H3) constellation, which is of feline origin [6].
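Because each position in the constellation notation corresponds to one genome segment, the notation can be handled programmatically. The following is a minimal Python sketch and not part of the study's pipeline; the function name `parse_constellation` and the handling of fused G/P types (for example `G1P[8]`) are illustrative assumptions, and annotated constellations such as `I2(RV5)` are not handled.

```python
import re

# Segment order fixed by the Gx-P[x]-Ix-Rx-Cx-Mx-Ax-Nx-Tx-Ex-Hx nomenclature.
LETTER_TO_GENE = [
    ("G", "VP7"), ("P", "VP4"), ("I", "VP6"), ("R", "VP1"),
    ("C", "VP2"), ("M", "VP3"), ("A", "NSP1"), ("N", "NSP2"),
    ("T", "NSP3"), ("E", "NSP4"), ("H", "NSP5"),
]


def parse_constellation(text):
    """Return a dict mapping each gene to its genotype label, e.g. 'VP6' -> 'I1'.

    Hypothetical helper: assumes one genotype per segment and no annotations
    such as '(WT)' or '(AN)'.
    """
    # G and P types are often written fused ("G1P[8]"); insert the missing hyphen.
    text = re.sub(r"(?<=[0-9xX])P\[", "-P[", text.strip(), count=1)
    parts = text.split("-")
    if len(parts) != len(LETTER_TO_GENE):
        raise ValueError(f"expected 11 genotype labels, found {len(parts)}")
    result = {}
    for label, (letter, gene) in zip(parts, LETTER_TO_GENE):
        if not label.startswith(letter):
            raise ValueError(f"{label!r} is not a {letter} genotype ({gene})")
        result[gene] = label
    return result


wa_like = parse_constellation("G1P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1")
print(wa_like["VP7"], wa_like["VP4"], wa_like["NSP4"])  # G1 P[8] E1
```

Keeping the letter-to-gene order in a single table means the same list drives both validation and lookup.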
The whole genome characterisation of all 11 genes of RVA has provided valuable insights into RVA diversity that results from the accumulation of point mutations, genetic reassortment, and intragenic recombination [
9‐
13]. In order to trace the evolutionary pattern of various strains, full genome characterization is required for interpretation of the origin of each segment of the RV genome [
6]. Global studies on RVA strain surveillance and characterization have revealed high diversity of human RVA strains, including the possibility of gene reassortment between Wa-like and DS-1-like strains [
14‐
19].
Surveillance from a single location over time can be valuable in understanding viral diversity and evolution. G1P[8] was the predominant strain in India prior to vaccination (2012–2016), followed by G2P[4], G9 and G12 [20]. In Vellore, RVA accounted for 32% of diarrheal hospitalizations from 2005 to 2016. G2P[4] was the predominant strain in the initial years and was gradually replaced by G1P[8]. The emergence of G9P[4], replacing G9P[8], and the emergence of G12 strains were also documented [21]. Molecular characterization of Indian strains has been performed mainly for the VP7 and VP4 genes, and data on the remaining nine genes are limited. In this study we report the whole genome analysis of common RVA strains circulating in Vellore, India, for the years 2002–2017.
Discussion
This study reports 199 complete genome sequences of common RV strains from Vellore, India, selected over a period of 15 years (2002–2017). These new data will significantly increase the number of publicly available whole genome sequences from Indian settings. Genotypes G1P[8], G2P[4] and G12P[6]/P[8] are detected at high frequencies in India [24–32] and in other parts of the world [33–37]. Eighty-eight G1P[8], thirty G12 and sixty-seven G2P[4] strains were sequenced and characterized, along with small numbers of other strains. Notably, most strains had a classical Wa-like (92% of Wa-like strains) or DS-1-like (86% of DS-1-like strains) backbone, while a minority (8% of Wa-like and 14% of DS-1-like strains) bore a reassortant backbone. Among the G1P[8] strains, 7% were reassortant, while 4% of the G2P[4] strains were reassortant. Similar data were reported from studies conducted in the United States and Africa, where the majority of G1P[8] and G2P[4] sequences had the consensus Wa-like and DS-1-like backbones and only a small proportion were reassortants [
38‐
42]. This supports the view that these common circulating viruses carry a very stable backbone: genotype 1 is linked with G1 specificity and genotype 2 with G2 specificity in the majority of cases, and reassortment is a rare event. The reassortant strains had one to nine reassortant genes. One DS-1-like G1P[8] strain (RVA/Human-wt/IND/TN020260/2017/G1P[8]) was detected with the rare E6-NSP4 genotype (G1P[8]-I2-R2-C2-Mx-A2-N2-T2-E6-H2) and appears to have originated from three to four reassortment events between human DS-1-like G2P[4] and Wa-like G1P[8] strains. DS-1-like G1P[8] was first reported from South Africa in 2008 and has since been reported in studies from Vietnam, the Philippines, Japan, the United States and Malawi [
15,
18,
39,
43,
44]. This is the first report of DS-1-like G1P[8] in combination with the E6 genotype from an Indian setting. Notably, among the G12 strains, none of the G12P[8] strains were reassortant, whereas 19% of the G12P[6] strains were. Among the few sequenced G9 strains (N = 9), 44% were reassortant. An unusual reassortant G9 strain, RVA/Human-wt/IND/CM-1261/2015/G9P[8], was identified with the constellation G9P[8]-I2-R2-C2-Mx-A2-N2-T1-E1-H2, in which six genes appear to have reassorted from a DS-1-like strain into a Wa-like G9P[8] strain. Three atypical G9 strains from 2011 with the constellation G9P[4]-I2-R2-C2-M2-A2-N2-T2-E6-H2 carried the rare E6-NSP4 gene. Similar reassortant G9P[4] strains have previously been reported from India, Latin America and, in 2016, Benin [
41,
45‐
47]. In our study the rare E6 genotype was also observed with G1P[8] and G2P[4] strains, in 2017 and 2012 respectively. E6-NSP4 was first reported from Bangladesh in combination with G12P[6] in 2000 [
37]. During the same year, it was reported from New Delhi, India in combination with G8P[6] [
48]. Later in 2017, it was reported with G2P[4] strains from Pune, India, in samples from 2009 to 2013 [
49]. E6-NSP4 has so far been reported only from human hosts and is emerging in different geographical locations, mostly among DS-1-like strains and on rare occasions among Wa-like strains. The unusual reassortant strains characterized in this study also include two strains (RVA/Human-wt/IND/IN1004655_CMC_00025/2012/G2P[8] and RVA/Human-wt/IND/IN1005086_CMC_00027/2012/G2P[8]) with the genetic constellation G2P[8]-I2-R2-C2-M2-A2-N2-T2-E2-H2, detected in 2012, which appear to be VP4 reassortants between Wa-like and DS-1-like RV. Strains with a similar reassortant constellation were identified previously in the USA [
39].
A few vaccine reassortants were also detected in our setting. A RotaTeq (RV5) vaccine VP6 reassortant strain, RVA/Human-wt/IND/IRID-2631/2009/G1P[8], was identified with the constellation G1(WT)P[8](WT)-I2(RV5)-R1(WT)-C1(WT)-Mx-A1(WT)-N1(WT)-T1(WT)-E1(WT)-H1(WT); the VP6 gene from the RV5 strain could have reassorted into a Wa-like strain. Another RotaTeq-derived human strain (RVA/Human-wt/IND/C-1351/2010/G4P[8]) was identified with the constellation G4(WT)P[8](WT)-I1(WT)-R1(WT)-C1(WT)-M1(RV5)-A1(WT)-N1(WT)-T1(WT)-E1(WT)-H1(WT), in which the M1-VP3 gene appears to be of vaccine origin and to have reassorted into a Wa-like strain. RotaTeq was not included in the national immunization program at this time but was licensed for private use. It is quite possible that shed vaccine strains could circulate and reassort. Such a strain looks like a classical Wa-like RV from its constellation, and only in-depth whole genome analysis reveals its true origin. Whole genome analysis is therefore an important tool for fully characterizing circulating strains and understanding their origin. This observation is consistent with the findings of research groups from Brazil, Nicaragua and the United States, who have reported that such reassortment events are expected considering the attenuation of the RotaTeq vaccine and the segmented nature of the RVA genome [
50‐
53]. A vaccine reassortant strain with partial Rotarix and partial RotaTeq genes in its backbone was previously reported from the USA [39]. A triple reassortant strain, RVA/Human-wt/IND/RVGE-505/2012/G12P[6], was identified with the constellation G12P[6]-I1-R1-C1-M1-A1-N1-T1-E1-H1, in which the VP6, NSP3 and NSP5 genes clustered closely with the 116E strain and the rest of the genes clustered with other Indian neonatal strains. 116E is a natural reassortant neonatal RV strain that was identified in 1985 in New Delhi, India, and was later developed into the Rotavac vaccine strain [54]. The vaccine was licensed in 2014 and included in the National Immunization Programme in 2016. Our strain was isolated in 2012, prior to licensure and introduction of the 116E vaccine. This strain therefore does not appear to be of vaccine origin; it could instead have originated from 116E-like wild-type strains that may have been circulating at that time.
Rotaviruses are generally species specific but cross species transmission is possible and has been demonstrated frequently. Surveillance of circulating rotaviruses in the human population has revealed the presence of several uncommon genotypes. Many of these have been found in animals, and it is possible that they arose in the human population through zoonotic transmission [
55‐
58]. Six study strains showed multiple genes of animal origin in their backbones, and six other strains showed single-gene reassortment from an animal source. Strain RVA/Human-wt/IND/IN1003238_CMC_00038/2011/G4P[6], with the genetic constellation G4(AN)P[6]-I1(AN)-R1(AN)-C1(AN)-M1(AN)-Ax-N1(AN)-T1(AN)-E1(AN)-H1(AN), appears to be a human-porcine reassortant with the majority of its genes clustering with porcine strains. Two P[14] strains, RVA/Human-wt/IND/IN1004413_CMC_00022/2012/GxP[14] and RVA/Human-wt/IND/TN020204/2017/G8P[14], showed the constellations GxP[14]-I2(AN)-R2(AN)-C2(AN)-M2(AN)-Ax-N2(AN)-T6(AN)-E2(HU)-H3(AN) and G8P[14]-I2(AN)-R2(HU)-C2(AN)-Mx-A11(AN)-N2(AN)-T6(AN)-E2(HU)-H3(AN) respectively. Both P[14] strains appear to be human-animal reassortants. Notably, these strains carry the unusual T6-NSP3 and H3-NSP5 genes, which are mostly seen in animal rotaviruses. Several P[14] strains with a DS-1-like backbone bearing T6-NSP3, H3-NSP5 and A11-NSP1 genes have previously been characterized from humans and were closely related to RV strains from sheep, goats and cattle [
59]. Two other strains, RVA/Human-wt/IND/RO1-14518/2010/G2P[4] + P[8] with the constellation G2P[4] + P[8]-I2(AN)-R2(AN)-C2(AN)-M2(AN)-A2(HU)-N2(AN)-T2(HU)T6(AN)-E2(HU)-H2(HU)H3(AN) and RVA/Human-wt/IND/IN1003535_CMC_00014/2011/G6P[x] with the constellation G6P[x]-I2(AN)-Rx-C2(AN)-Mx-Ax-N2-T6(AN)-E2(AN)-H3(AN), showed multiple genes of animal origin, including the unusual T6-NSP3 and H3-NSP5. G6 strains in combination with P[1], P[5], P[7] and P[14] are mostly of animal origin and have previously been reported from cows, sheep, antelopes and horses [
60]. Hence, our G6 study strain appears to be a purely animal RV strain transmitted to a human host. An untyped Wa-like strain, RVA/Human-wt/IND/IN1000458_CMC_00052/2010/GXPX, with the constellation GxP[x]-I1(AN)-RX-C1(AN)-MX-AX-N1(AN)-T1(AN)-E1(AN)-H1(AN), appears to be a porcine-derived RV in which all six characterized genes were of porcine origin. Four strains with a typical DS-1-like backbone (CM-0180, CM-0170, CM-0423 and CM-0002) carried a single animal-derived N2-NSP2 gene. One strain (C-65) with a typical DS-1-like backbone had a single animal-derived E2-NSP4 gene, and one strain (CM-0059) with a typical Wa-like constellation had a single animal-derived I2-VP6 gene. This suggests that not only unusual genotypes but also common genotypes with classical constellations can harbour genes of animal origin as a result of reassortment, which goes undetected in genotyping assays and becomes evident only when sequences are subjected to phylogenetic analysis.
Gene-specific analyses of the sequenced strains reflect the heterogeneity and diversity within each genotype, as indicated by multiple clusters of sequences for each gene. Among the Wa-like strains, the VP7, VP4, VP1, VP3, NSP1 and NSP5/6 genes appear less diverse than the VP6, VP2, NSP2, NSP3 and NSP4 genes, whereas among the DS-1-like strains, the VP4, VP7, VP1, VP3, NSP3 and NSP5 genes appear less diverse than the VP6, VP2, NSP1, NSP2 and NSP4 genes. In the DS-1-like strains, the G2-VP7, P[4]-VP4, T2-NSP3 and H2-NSP5 genotypes showed only one sub-genotype circulating over the past 15 years, suggesting that these genotypes are highly conserved, while genes of other genotypes showed at least two circulating sub-genotypes. Among the Wa-like genotypes, only the A1-NSP1 gene appeared to be highly conserved, with only one human sub-genotype circulating over the past 15 years. Previous comparative studies have reported circulation of multiple sub-genotypes, referred to as alleles, for each RVA gene, which also differ between geographical settings [
38‐
42]. Each RV gene and genotype has varying mutation/substitution rates, as reported by some studies [
12,
61‐
65]. This could be one of the reasons for the varying diversity observed across RV genes and genotypes. Notably, six strains with OP-354-like P[8]-VP4 sequences were circulating in our setting. OP-354-like P[8], a new divergent form of P[8], originated in South East Asia and has spread rapidly across the continent [22]. OP-354-like P[8] strains are known to exist with various G types. In our study, OP-354-like P[8] was seen in combination with G1 and G9 strains. A few reports suggest that OP-354-like P[8] strains are associated with severe forms of diarrhea [
66,
67], and it would be interesting to analyse this association in our setting in future. Equine-like rotavirus strains are now being reported in various studies, in which an equine-like DS-1 backbone has been observed in combination with G3P[8], G3P[6] and G3P[4] [
68‐
70]. In our study, five equine-like R2-VP1 sequences and 22 equine-like N2-NSP2 sequences were detected in G2P[4] strains. This result is quite unusual, and G2P[4] strains with equine-like genes have not been reported earlier. These VP1 and NSP2 genes may have reassorted from equine-like human strains; however, more analysis is required to support this hypothesis.
Our study has a few limitations. We were not able to sequence all the strains isolated during 2002–2017, and only a few strains from each year were randomly selected for characterization. Other important prevalent strains such as G3P[8], G1P[6] and G9P[8]/P[4], whose genetic surveillance is equally important, were not included. Very few unusual strains (in which evidence of reassortment and zoonotic transmission is more likely) were sequenced, and characterizing more such strains would likely add to the knowledge of rotavirus diversity in Indian settings. For some of the study strains, a few genes failed to be sequenced (especially for the unusual strains), and better sequencing protocols are required to meet such challenges.
Materials and methods
Ethics statement and sample collection
This study was approved by the Institutional Review Board of Christian Medical College (CMC), Vellore. Rotavirus-positive stool samples were selected from the biorepository of stool samples already collected and stored for different community- and hospital-based rotavirus studies at the Wellcome Trust Research Laboratory, CMC, Vellore. These diarrheal samples were collected from children under 5 years of age as part of rotavirus surveillance studies conducted in both hospital and community settings. The stool samples were first screened for rotavirus particles by ELISA, and the rotavirus-positive samples were genotyped by RT-PCR or Sanger sequencing.
Selection of samples for whole genome sequencing
The VP7 and VP4 genotype and Sanger sequence data from the laboratory database were used to select a subset of stool samples for next generation sequencing (NGS). At least 3 strains each of the G1P[8], G2P[4] and G12 genotypes were randomly selected from each year from 2002 to 2017. A small number of G4, G6, G8 and G9 strains and strains of uncharacterised genotype were also included. A total of 127 Wa-like strains (88 G1P[8], 16 G12P[6], 12 G12P[8], 1 G12P[4], 2 G4, 2 G1P[x], 4 G9P[8], 1 G12P[x] and 1 G1 + G12P[8]) and 80 DS-1-like strains (67 G2P[4], 2 G2P[8], 2 G2P[X], 1 G2P[4] + P[8], 4 G9P[4], 1 G8P[14], 1 G6P[X], 1 G9P[X] and 1 GXP[14]) were sequenced by NGS.
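A per-genotype, per-year random selection of this kind can be sketched as stratified sampling. The Python example below is only an illustration: the text states that at least 3 strains per genotype per year were chosen at random, and everything else here (the record layout, the fixed seed, the function name `select_per_stratum`, and taking exactly `per_stratum` samples where available) is an assumption.

```python
import random
from collections import defaultdict


def select_per_stratum(records, per_stratum=3, seed=0):
    """Randomly pick up to `per_stratum` sample IDs per (genotype, year).

    `records` is an iterable of (sample_id, genotype, year) tuples; this
    layout is hypothetical and not the laboratory database schema.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    strata = defaultdict(list)
    for sample_id, genotype, year in records:
        strata[(genotype, year)].append(sample_id)
    selected = {}
    for key in sorted(strata):
        ids = sorted(strata[key])
        # Strata with fewer samples than requested are taken in full.
        selected[key] = rng.sample(ids, min(per_stratum, len(ids)))
    return selected


records = [
    ("S1", "G1P[8]", 2002), ("S2", "G1P[8]", 2002), ("S3", "G1P[8]", 2002),
    ("S4", "G1P[8]", 2002), ("S5", "G2P[4]", 2002), ("S6", "G2P[4]", 2002),
]
picked = select_per_stratum(records, per_stratum=3, seed=1)
print({key: len(ids) for key, ids in picked.items()})
# {('G1P[8]', 2002): 3, ('G2P[4]', 2002): 2}
```

Sorting the strata and the IDs before sampling keeps the result independent of the input order for a given seed.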
Viral nucleic acid was extracted from 20% (v/v) fecal extract using the QIAamp 96 Virus QIAcube HT kit on the QIAcube HT automated platform (Qiagen, Germany) according to the manufacturer’s instructions.
Amplification of RV genome segments and NGS
Genotype confirmation
Complementary DNA (cDNA), synthesized by reverse transcription using Moloney murine leukemia virus reverse transcriptase (SuperScript II MMLV-RT, Invitrogen, United States) and random primers (Invitrogen, United States), was used as the template for VP7 and VP4 typing by hemi-nested multiplex PCR using published primers [
72,
73]. The untyped samples and unusual strains were sequenced by Sanger sequencing.
Amplification of 11 segments
The protocol used in this study for amplification of the 11 RVA gene segments was developed and standardized through a collaboration with the J. Craig Venter Institute, La Jolla, CA, USA. Briefly, one-step reverse transcription-polymerase chain reaction (RT-PCR) was used to amplify the 11 genes from extracted RNA in four multiplex reactions using the SuperScript IV One-Step RT-PCR kit (Invitrogen, United States) and gene-specific primers (see Table S1). The SuperScript IV reaction uses SuperScript IV reverse transcriptase for cDNA synthesis and a super fidelity Taq polymerase enzyme for amplification of DNA in consecutive steps. Amplification of the gene segments was confirmed by running the amplified products on a 1% agarose gel. The PCR products were purified using AMPure XP magnetic beads (Beckman Coulter, United States) according to the manufacturer’s specifications.
Next generation sequencing
Sequencing libraries were prepared using the Illumina Nextera DNA Flex library preparation kit (Illumina, USA) following the manufacturer’s guidelines. The quality of the libraries was assessed using the TapeStation D1000 ScreenTape automated electrophoresis system (Agilent, USA) and a Qubit fluorometric assay (Invitrogen, USA). Libraries were diluted to 12 pM and spiked with 5% PhiX sequencing control (Illumina, USA). NGS was carried out on an Illumina MiSeq sequencing platform (Illumina, USA) using the MiSeq Reagent Kit v3 (600 cycles) with 300 bp paired-end reads.
Whole genome sequence assembly
Reads from 45 samples were assembled using CLC Genomics Workbench (Qiagen) [74], and the remainder were assembled using Geneious R11/Prime software (Dotmatics, United States) [75]. Reads were first assembled de novo, followed by reference-based assembly to obtain full-length sequences with a depth of coverage greater than 100X. The lengths of the open reading frames (ORFs) of the gene segments, in base pairs, were: VP7 (981), VP4 (2328), VP1 (Wa-like 3283/DS-1-like 3267), VP2 (Wa-like 2685/DS-1-like 2640), VP3 (2508), VP6 (1194), NSP1 (1461), NSP2 (954), NSP3 (Wa-like 933/DS-1-like 942), NSP4 (528) and NSP5 (Wa-like 594/DS-1-like 603).
Accession numbers
The sequences of all the study strains were deposited in GenBank under the accession numbers listed in Supplementary File 10.
Genotype assignment
A genotype was assigned to each gene sequence using the ViPR (Virus Pathogen Resource) Rotavirus A genotype determination tool (this web tool has since been upgraded to the RIVM Rotavirus A genotyping tool version 1.0; manuscript in preparation) [76]. After genotype assignment, the strains were classified into genogroups, Wa-like (genogroup 1) or DS-1-like (genogroup 2), for further analysis.
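The exact rule used to place a strain in a genogroup is not spelled out here. As a rough illustration only, the Python sketch below assumes a simple majority rule over the nine typed backbone genes (VP6 through NSP5), ignores untyped genes, and reports the labels that disagree with the majority as candidate reassortant genes. The function name, the mapping of genotype numbers to genogroup names and the majority rule itself are all assumptions, and the rule would not reproduce classifications that also take the G/P type into account.

```python
from collections import Counter

# Assumed mapping from backbone genotype number to genogroup name.
GENOGROUP_NAMES = {1: "Wa-like", 2: "DS-1-like", 3: "AU-1-like"}


def classify_backbone(backbone):
    """Classify backbone labels (I, R, C, M, A, N, T, E, H) by majority.

    Returns (genogroup_name, discordant_labels). Untyped labels such as 'Mx'
    are ignored. The majority rule is an assumption made for illustration.
    """
    typed = [(label, int(label[1:])) for label in backbone if label[1:].isdigit()]
    if not typed:
        return "untyped", []
    counts = Counter(number for _, number in typed)
    majority, _ = counts.most_common(1)[0]  # ties resolve to the first seen
    name = GENOGROUP_NAMES.get(majority, f"genotype-{majority}-like")
    discordant = [label for label, number in typed if number != majority]
    return name, discordant


# Backbone of the DS-1-like G1P[8] strain carrying E6 described in the Discussion.
print(classify_backbone(["I2", "R2", "C2", "Mx", "A2", "N2", "T2", "E6", "H2"]))
# ('DS-1-like', ['E6'])
```

Returning the discordant labels alongside the genogroup makes single-gene reassortants easy to flag for closer phylogenetic inspection.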
Distance and phylogenetic analysis
Preparation of datasets
FASTA files of all the sequenced study strains were used for analysis. Pre-existing RV sequences, including the primordial RV strains from around the globe (referred to as human classical rotavirus strains; details in Table S4), were downloaded from the GenBank database. Sequence similarity was analysed using the p-distance method, and genetic distances were calculated for all genes. Similarities were calculated within the study sequences (Tables S5-S6) and in comparison with several RVA reference sequences from GenBank (Tables S7-S9). To compare Wa-like study sequences, human classical strains of genotypes G1P[8] (Wa, KU, AM06-1, Dhaka16), G12P[6] (Dhaka12, Matlab13, GER172), G12P[8] (B4633, GER126, Dhaka25), G3P[8] (P), G4P[8] (DC-2241), G9P[8] (WI61), and G4P[6] (ST3) were used. Vaccine strain sequences from 116E and Rotarix were also included in the analysis. To compare DS-1-like study sequences, human classical G2P[4] strain sequences (116E3D and DS-1), vaccine strain sequences (RotaTeq), DS-1-like G1P[8] sequences (UFS-1971 and UFS-1973), classical equine-like strain sequences (IS1078, S13-30, S13-45) and recent strains (BEN-7194 and BEN-7196) were included. Likewise, Wa-like and DS-1-like human strains circulating in India (referred to as Indian reference strains) were included in the distance analysis. For the phylogenetic analysis, apart from the human classical rotavirus strains and Indian reference strains, a few strains of animal origin from GenBank were included.
Sequence alignment was carried out in AliView software using the MUSCLE algorithm [77]. Nucleotide and deduced amino acid sequence identities among strains were calculated for each gene using the p-distance algorithm in MEGA 7 software [78]. The ModelFinder program was used to identify the substitution model that best fit each sequence dataset according to the Bayesian Information Criterion (BIC) [79]. Maximum likelihood trees for each gene were constructed with the best-fitting substitution model using IQ-TREE 2 software with 1000 bootstrap replicates [80, 81]. The trees were visualized using the iTOL online server, and the sub-genotype clusters were defined based on branching patterns and bootstrap values [82]. A sub-genotype cluster was defined as a distinct phylogenetic branch within a genotype containing sequences that share high similarity (> 95%) and supported by a bootstrap value of > 70%. Only branches with bootstrap values higher than 70 were considered for inference. Clusters with a large number of leaves were collapsed for easy visualization. The human sub-genotypic clusters of each genotype were labelled HC-1 to HC-N, in order of appearance in the tree from top to bottom. Sequences that do not group with any reference or other study strain sequences appear as singleton leaves. For comparison, various classical strains, vaccine strains, reassortant wild-type human strains and wild-type animal strains were included (see Table S4).
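For readers unfamiliar with the p-distance, it is the proportion of compared aligned sites at which two sequences differ, so percent identity is 100 × (1 − p). The Python sketch below is not the MEGA 7 implementation; it assumes pre-aligned nucleotide sequences and pairwise deletion of gap (`-`) and `N` sites, and it interprets the > 95% similarity criterion as applying to every pair in a candidate cluster. The function names and both of these choices are assumptions.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, ignore="-N"):
    """Proportion of compared aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in ignore or b in ignore:
            continue  # pairwise deletion of gap/ambiguous sites (assumption)
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def percent_identity(seq_a, seq_b):
    """Percent identity derived from the p-distance."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


def meets_cluster_criterion(sequences, bootstrap, min_identity=95.0, min_bootstrap=70.0):
    """True if all pairs exceed min_identity and the branch exceeds min_bootstrap."""
    pairs_ok = all(
        percent_identity(a, b) > min_identity for a, b in combinations(sequences, 2)
    )
    return pairs_ok and bootstrap > min_bootstrap


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
```

The nucleotide-specific `ignore` default would need changing for amino acid alignments, where `N` denotes asparagine.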