Background
The family Closteroviridae, to which Citrus tristeza virus (CTV) belongs, comprises members with single-stranded (ss), positive-polarity RNA genomes that are the largest among plant-infecting viruses. The approximately 19.3-kilobase (kb) genome of CTV exists as a monopartite RNA component and consists of 12 genes potentially encoding 19 protein products [1, 2]. Based on the analysis of complete genome sequences using a > 7.5% average nucleotide (nt) variation as a guide for delineating genotypes, at least six major genotypes, T36, T30, RB, VT, T3, and T68, exist among CTV strains [3, 4]. Among the different CTV genotypes, the frequency of variations in the 5′ half of the genome is generally higher than that in the 3′ half [5, 6]. Within each genotype, as with all RNA viruses, CTV isolates exist as a population with one predominant consensus sequence accompanied by a pool of genetic variants called quasispecies [7–10]. The 5′ and 3′ terminal regions in the genomes of RNA viruses play important roles in many biological processes, such as translation, replication, virion assembly and pathogenesis [11–16]. Therefore, detailed knowledge of the nt heterogeneity at the ends of the CTV genome would contribute an indispensable perspective to our understanding of CTV functions and applications.
The current work is an important prelude to the construction of infectious cDNA clones of California (CA) strains of CTV to be used as a reverse genetics platform for studying viral gene functions, and for molecular biotechnology applications that are specific to the needs of California. Specifically, knowledge of the nt variations within each CTV genotype would guide the design of molecular cloning strategies aimed at incorporating these nts into the infectious cDNA clones. To this end, we have focused on determining the heterogeneity at the extreme termini of the genomes of two prevalent strains of CTV with the T36 and T30 genotypes (hereafter referred to as T36-CA and T30-CA, respectively) that are widely distributed throughout California. Symptoms associated with T30-CA and T36-CA infection vary among citrus hosts and may include leaf cupping, vein clearing, stem pitting, seedling yellows, and quick decline. In citrus scions grafted on most commercially grown CTV-tolerant rootstocks, T30-CA and T36-CA are associated with relatively mild to sometimes asymptomatic infection. While data on polymorphism within the 5′ and 3′ untranslated regions (UTRs) of the CTV genome exist [5, 17, 18], no stringent and comprehensive assessment of the heterogeneity at the extreme ends of CTV genomes has been published. Furthermore, while the complete sequence of CTV with the T30 genotype isolated from Citrus reticulata Blanco (Murcott mandarin) in Fillmore, CA became available recently [4], whole genome sequence information for other California isolates with the T30 genotype, as well as for T36-CA, has not been determined. Thus, knowledge of the heterogeneity of the genomes and genome ends of California CTV strains exhibiting these two genotypes remains limited.
In the first part of this study, we analyzed the nt heterogeneity at the extreme genome ends of T36-CA and T30-CA. This information facilitated the design of specific strategies to clone the predominant genomic sequences of T36-CA and T30-CA. In the second part of this study, the complete nt sequences of T36-CA and T30-CA were assembled and compared with reference sequences of the same genotype and of different genotypes. To our knowledge, T36-CA and T30-CA are the first T36-genotype and the second T30-genotype isolates, respectively, originating from California to be completely sequenced and analyzed.
Discussion
This study has identified the predominant consensus nts and the nt heterogeneity at the extreme 5′ and 3′ termini of two prevalent CTV genotypes, T36 and T30, from California, as well as the nt variants present in the genomes of these viruses. The T36-CA infected samples harbored two predominant groups of 5′ terminal consensus nts, exemplified by 5′ variants 1 and 4 (Table 1), i.e. “AATTTCTCAA” and the unique “AATTTCAAA”, respectively, that are identical to (e.g. isolate FS674 [KC517485] and isolate FS703 [KC517487]), or similar to (e.g. isolate FS02–2 [EU937521] and T36-FL [AY170468]) those of other CTV isolates with the T36 genotype. In contrast, the T30-CA infected samples contained only one predominant set of 5′ terminal consensus nts (5′ variant 1) (Table 2) that is identical to one of two sets of T30 consensus nts in GenBank. This is consistent with a previous report showing a high degree of nt conservation of T30 strains from different geographic regions separated by more than a hundred years [5]. Some of the nt variants of T36-CA and T30-CA were found to contain an extra “C” upstream of the consensus nts (5′ variants 3 and 6 [Table 1] and 5′ variant 2 [Table 2]). The presence of an unpaired “G” at the extreme 3′ end of the (−)-RNA of the CTV dsRNA (i.e. with no complementary “C” on the [+]-RNA) has been observed previously, and this is postulated to be a common feature of the alphavirus-like superfamily of viruses, to which CTV belongs [1, 17]. This feature was also seen in T36-CA and T30-CA when we analyzed the (−)-DNA sequences (Tables 1 and 2). Interestingly, an extra “C” was found in the (+)-DNA sequences of T36-CA and T30-CA, suggesting that the (+)-strand of CTV dsRNA, with an extra “C” incorporated upstream of the consensus nts, exists in the sequence population. It is unclear whether the “C” at the 5′ end of the (+)-RNA serves as a template for the “G” at the 3′ end of the (−)-RNA during replication.
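As a minimal illustration of how predominant terminal sequences can be ranked among sequenced clones, the Python sketch below tallies a list of 5′ terminal sequences and flags variants that equal a consensus with one extra upstream “C”. The function names and the clone counts are hypothetical and do not reproduce the data in Tables 1 and 2; only the two T36-CA consensus strings are taken from the text.

```python
from collections import Counter


def rank_variants(termini):
    """Rank terminal sequences by how often they occur among the clones.

    Returns a list of (sequence, count, fraction) tuples, most frequent first.
    """
    counts = Counter(t.upper() for t in termini)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


def has_extra_upstream_c(variant, consensus_set):
    """True if the variant is a known consensus preceded by one extra 'C'."""
    return variant.startswith("C") and variant[1:] in consensus_set


# Hypothetical clone counts, for illustration only (not the published data).
clones = ["AATTTCTCAA"] * 5 + ["AATTTCAAA"] * 4 + ["CAATTTCTCAA"] * 1
consensus = {"AATTTCTCAA", "AATTTCAAA"}

for seq, n, frac in rank_variants(clones):
    note = " (extra upstream C)" if has_extra_upstream_c(seq, consensus) else ""
    print(f"{seq}\t{n}\t{frac:.0%}{note}")
```

A tally of this kind only ranks what was cloned and sequenced; it says nothing about whether a minor variant is biologically meaningful or an artifact of the procedure.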
Nucleotide variations at the extreme 3′ end of the CTV genomic RNA are fewer than those at the extreme 5′ end, whether among or within CTV genotypes. For example, many CTV genomic sequences in GenBank have the highly conserved “AGGTCCA” at their 3′ ends. In this study, most of the 3′ (+)-DNA clones of both T36-CA and T30-CA sequenced were found to end with “AGGTCC” (Tables 1 and 2). Because the 3′ RACE procedure incorporates a poly “A” tail immediately downstream of the final “C”, any “As” that might exist downstream of it were indistinguishable from those of the poly “A” tail. Taking this caveat into consideration, it is likely that “AGGUCCA” are the predominant consensus nts at the 3′ termini of the (+)-RNA of both T36-CA and T30-CA. It might have been possible to determine the 3′ end nts of the genomic RNA by performing RLM-RACE using the (−)-RNA. However, in our hands, this procedure did not yield any cDNA products corresponding to the 5′ end of the (−)-RNA. Another nt variant (AGGTCCAT) at the 3′ ends of T36-CA (3′ variant 2) (Table 1) was also observed in many sequences, and this suggested that an additional “U” could be incorporated downstream of the consensus 3′ end ribonucleotides. For CTV, several studies reported that nt variants with non-template nts incorporated downstream of the 3′ end consensus nts in the (+)-RNA were identified when the DNA products generated from viral dsRNA were sequenced [1, 17, 32]. However, there were disparities among these reports. For example, in one study, a non-template “U” was identified downstream of the last “CCA” in the (+)-RNA of a FL strain of T36 [1]. In another study, the T36 genome was found to end with “CC” and it was suggested that an “A” might be incorporated downstream of “CC” in the (+)-RNA as a non-template nt [32]. Our results suggested that the T36-CA population contains both “CCA” and “CCAU” at the extreme 3′ end of the (+)-RNA (3′ variants 1 and 2) (Table 1). The “CCA” has been shown to be important for virus replication [14], but the additional “U” after the “A” appears to be required for neither virus replication nor systemic infection [13, 30, 31].
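The poly “A” caveat described above can be made concrete with a short Python sketch. It assumes, purely for illustration, that a 3′ RACE read is the genomic 3′ terminus followed by an added tail of “A” residues; the function names and the tail length of 20 are hypothetical and are not part of the procedure used in this study.

```python
def strip_poly_a(read):
    """Remove all trailing 'A' residues from a 3' RACE read.

    Returns the trimmed sequence and the number of residues removed.
    """
    read = read.upper()
    trimmed = read.rstrip("A")
    return trimmed, len(read) - len(trimmed)


def to_rna(dna):
    """Convert a (+)-DNA sequence to the corresponding RNA notation."""
    return dna.upper().replace("T", "U")


tail = "A" * 20  # hypothetical tail length added during 3' RACE

# A genome ending in "CC" and one ending in "CCA" give reads that trim
# to the same sequence, so the terminal "A" cannot be assigned.
print(strip_poly_a("AGGTCC" + tail)[0])    # AGGTCC
print(strip_poly_a("AGGTCCA" + tail)[0])   # AGGTCC

# A terminal "T" interrupts the run of "A" residues, so this variant
# remains distinguishable after trimming.
print(strip_poly_a("AGGTCCAT" + tail)[0])  # AGGTCCAT
print(to_rna("AGGTCCAT"))                  # AGGUCCAU
```

The first two reads collapse to the same trimmed sequence, which is why “AGGUCCA” can only be inferred as the likely terminus, whereas the “AGGTCCAT” variant is read unambiguously.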
The determination of nts at the extreme ends of the CTV genome by sequencing cDNA clones derived from viral dsRNA was previously done for other T36 and T30 isolates obtained from various sources and locations outside California, including Florida, USA and Spain [1, 5, 6, 17, 18]. However, none of these studies included any comprehensive analysis of the nt heterogeneity at the extreme ends of the CTV genomes. Our study contrasts with that of Lopez et al. [17] and others [1, 5, 6, 18] in two regards. First, the nt variations seen in our results are specific to California strains of T36 and T30, and little or no information on nt heterogeneity in the genome ends is available for CTV of any genotype. Furthermore, one of the predominant sets of T36-CA 5′ end consensus nts (AATTTCAAA) is unique and has never been documented. Second, the information on the extreme 5′ end nts was derived from the (+)- and the (−)-RNA using 5′ RLM-RACE and 3′ RACE, respectively, thus allowing us to more thoroughly analyze the nts in the regions being queried. In contrast, polyadenylated (+)- or (−)-RNA of the viral dsRNA was used to determine the sequence information of the genome ends of CTV in all of the studies reported in the literature [1, 5, 6, 13, 17, 18]. Our sequencing results identified nt heterogeneity at the extreme 5′ end of T36-CA (e.g. 5′ variants 2, 4, 5 and 7 [Table 1]) that has not been reported before for other isolates with the T36 genotype, and this information would likely have been missed had polyadenylated (+)- or (−)-RNA alone been used.
The above findings have provided useful guidelines for strategies to clone the predominant CTV sequences within their respective populations. For example, using an oligo primer containing the nts of 5′ variant 4 (Table 1), we were able to incorporate the unique consensus “AATTTCAAA” into the RT-PCR amplified T36-CA 5′ cDNA fragment. In addition, identification of the consensus nts (AAUUUCGAUU) at the 5′ end of the T30-CA RNA population also allowed us to design the appropriate oligo primers to incorporate these nts into the T30-CA 5′ cDNA fragment. Knowledge of the additional 5′ and 3′ end nts or nt substitutions seen in some of the variants would give us the option of incorporating them into the full-length infectious cDNA clones of T36-CA and T30-CA in the future. These infectious clones can be used to investigate whether the nt variants at the extreme ends of the CTV genome are involved in any biological functions.
Recently, the complete genome sequence of T30-AT4, a CTV with the T30 genotype originating from Fillmore, CA, was determined using small (s) RNA deep sequencing [4]. Here, our analyses have shown that the T30-CA genome shares 99.4% identity with that of T30-AT4. Although the T30-CA isolate also originated from Fillmore, CA, it was isolated from Citrus aurantifolia (Mexican lime), while T30-AT4 was isolated from Citrus reticulata Blanco (Murcott mandarin). This suggests that the host species may have some influence on the distribution of the major nt variant sequences in a CTV population [33]. However, whether or not the two different host species could influence the nt heterogeneity at the extreme ends of the T30-CA and T30-AT4 genomes remains unknown, since information on nt heterogeneity at the termini of T30-AT4 is not available.
Pairwise comparisons of the complete genome sequences of T36-CA or T30-CA with those of other isolates from the same genotypes (Tables 3 and 4; data not shown) consistently showed a high degree of genetic conservation (> 97.3% nt identity for T36 and > 98.9% nt identity for T30) between the sequences [3, 5]. In contrast, the nt identity between T36-CA and T30-CA is only 81.7%, and both sequences also show higher nt diversity when compared with the reference sequences of the different genotypes (e.g. SY568, RB-AT25 and VT-AT39) found in California (Tables 3 and 4). Collectively, these results are consistent with our knowledge of the genotypic diversity within CTV. The determination of the complete genome sequences of the two genotypes, T36-CA and T30-CA, prevalent in California will pave the way for ongoing studies aimed at engineering a CTV-based vector suitable for specific molecular biotechnology applications.
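The pairwise figures above can be related to the > 7.5% nt variation guide for delineating genotypes with a small Python sketch. It assumes two sequences that have already been aligned to equal length with “-” marking gaps, and it skips gapped columns; that gap handling, the function names, and the use of a single pairwise value in place of an average are illustrative assumptions and not the method used to produce Tables 3 and 4.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent nt identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: columns with a gap are not scored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def within_genotype(identity_pct, max_variation_pct=7.5):
    """True if the nt variation does not exceed the genotype guide value."""
    return (100.0 - identity_pct) <= max_variation_pct


# Identity values reported in the text.
print(within_genotype(99.4))  # T30-CA vs T30-AT4 -> True
print(within_genotype(81.7))  # T36-CA vs T30-CA  -> False
```

Under these assumptions, the 99.4% identity between T30-CA and T30-AT4 corresponds to 0.6% variation, well inside the guide value, whereas the 81.7% identity between T36-CA and T30-CA corresponds to 18.3% variation and places the two isolates in different genotypes.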