Background
The members of the family
Geminiviridae, one of the two largest natural groups of plant viruses, are characterized by a circular, single-stranded DNA (ssDNA) genome encapsidated within virions whose morphology is unique in the known virosphere, consisting of two joined, incomplete T = 1 icosahedra [
1,
2]. Geminiviruses are classified into four genera, based on their genome organization, plant host range, and insect vector. Members of the most diversified genus,
Begomovirus, are transmitted by the whitefly
Bemisia tabaci (Hemiptera; Aleyrodidae), infect a wide range of dicotyledonous plant species, and have either monopartite or bipartite genomes [
3]. In recent decades, these viruses have emerged as major threats to food and fiber crop production throughout the world, apparently as a result of a great increase in vector population densities, expansion of crop monocultures, transport of plant materials between geographically distant regions, and introduction of foreign whitefly biotypes [
4,
5].
Approximately 200 species of begomoviruses are currently known, grouped into two major lineages based on their genomic sequences: the Old World (OW; Europe, Africa, the Indian subcontinent, Asia, and Australasia) and the New World (NW; the Americas) begomoviruses [
6,
7]. The OW begomoviruses have either monopartite or bipartite genomes, while all NW begomoviruses (for simplicity, NW-Beg) have two genomic components, known as DNA-A and DNA-B. The DNA-A component of NW-Beg has one open reading frame in the virion sense (
AV1 or
cp gene) encoding the coat protein, and four overlapped ORFs in the complementary sense (
AC1 or
rep gene,
AC2 or
trap gene,
AC3 or
ren gene, and
AC4) that encode proteins involved in DNA replication, regulation of viral gene expression and suppression of host-defense responses [
1,
8]. The DNA-B component contains only two ORFs, one in the virion sense (
BV1 or
nsp gene) and the other in the complementary sense (
BC1 or
mp gene), encoding proteins involved in intra- and intercellular movement of the virus [
9,
10]. The two genomic components are very different in overall nucleotide sequence, with the exception of a ~180-nt segment of the intergenic region (IR) displaying high sequence identity, termed the "common region" (CR). This region includes several repeated sequences (5 to 8 nt in length) called "iterons", which are closely associated with a ~30-nt conserved element that has the potential to form a hairpin structure harboring at its apex the invariant nonanucleotide 5'-TAATATTAC-3' [
1]. Both the iterons and the conserved nonanucleotide in the hairpin element are functional targets for Rep, the virus-encoded protein that initiates DNA replication by a rolling-circle replication (RCR) mechanism. Rep recognizes and binds specifically to the iterons and subsequently introduces a nick into the invariant nonanucleotide to initiate the RCR process [
11,
12].
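As a minimal illustration of the sequence features just described, the following Python sketch locates the invariant nonanucleotide in a circular genome component and lists short repeated words (5 to 8 nt) in a chosen region. The function names and the simple k-mer counting are assumptions made for illustration only; they are not the procedure used in this study, and the counter does not consider inverted repeats or the position of repeats relative to the hairpin, both of which matter when iterons are actually assigned.

```python
import re
from collections import Counter

NONANUCLEOTIDE = "TAATATTAC"  # invariant motif named in the text


def find_nonanucleotide(seq: str) -> list[int]:
    """Return 0-based start positions of the nonanucleotide in a circular sequence."""
    seq = seq.upper()
    # Append the first 8 bases so that a motif spanning the origin is also found.
    extended = seq + seq[: len(NONANUCLEOTIDE) - 1]
    # A lookahead is used so that overlapping occurrences would not be missed.
    return [m.start() for m in re.finditer(f"(?={NONANUCLEOTIDE})", extended)]


def repeated_kmers(region: str, k_min: int = 5, k_max: int = 8,
                   min_count: int = 2) -> dict[str, int]:
    """Count words of length k_min..k_max that occur at least min_count times.

    The region is treated as linear and only direct repeats are counted
    (a simplifying assumption of this sketch).
    """
    region = region.upper()
    counts = Counter(
        region[i:i + k]
        for k in range(k_min, k_max + 1)
        for i in range(len(region) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_count}
```

Candidate repeats returned by such a scan would still have to be checked by eye against the known arrangement of iterons around the hairpin element.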
The NW-Beg have radiated to a great extent since their arrival on the American continent, and several secondary lineages or "clades" have been identified in phylogenetic studies [
6,
13,
14]. The most atypical of the NW-Beg clades is the one named after the
Squash leaf curl virus (SLCV), which encompasses more than 15 viral species distributed from the southern USA to Brazil [
7,
13]. Members of the SLCV clade are differentiated from other NW-Beg by two main features: 1) the distinctive number and arrangement of the iterons in their replication origin, and 2) the N-terminal domain (i.e., residues 1 to 150) of their Rep proteins, which displays low aa sequence identity (< 50%) with that of proteins encoded by typical NW-Beg and lacks several amino acid motifs that are conserved in both NW and OW begomovirus Rep proteins [[
15‐
17]; unpublished data].
Among the earliest recorded members of the SLCV-clade is
Euphorbia mosaic virus (EuMV), which has been associated with symptomatic
Euphorbia heterophylla plants throughout the Caribbean basin and the tropical Americas since the 1970s [
18,
19]. However, its molecular characterization was not carried out until 2007, when the complete genome sequence of EuMV-YP, the isolate associated with the former plant host in the Yucatan Peninsula of Mexico, was reported [
20]. Complete DNA-A sequences from two additional EuMV isolates were available at GenBank at that time, one from Puerto Rico (EuMV-PR) and the isolate whose complete sequence is now reported here, from Jalisco, Mexico (EuMV-Jal). According to their full-length DNA-A sequence identity, the EuMV isolates were classified into two different strains, simply termed "A" and "B". The first strain was represented by EuMV-YP and EuMV-PR, while EuMV-Jal was the only member of the "B-strain" [
7]. However, the recently described EuMV-JM, from Jamaica [
21], displays a very similar sequence identity to both EuMV-PR (A-strain, 95% identity) and EuMV-Jal (B-strain, 95.4% identity). Therefore, the relationship between EuMV isolates belonging to supposedly distinct strains should be experimentally addressed.
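The identity values quoted above are pairwise comparisons of full-length DNA-A sequences. As a hedged sketch of how such a figure can be obtained, the Python function below computes the percentage of identical columns between two sequences that have already been aligned (gaps written as "-"). The function name and the convention of counting a gap against a base as a mismatch are assumptions of this example; published values depend on the alignment program and gap treatment actually used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a gap opposite
    a base is counted as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared
```

Applied to every pair of isolates, a function of this kind yields the identity matrix on which strain assignments such as "A" and "B" are based.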
In this work we report the complete molecular characterization of EuMV-Jal, which was found infecting peppers and weeds in Jalisco, Mexico, and was shown to be incompatible in replication with EuMV-YP in reassortment experiments. The genomic analysis of this novel EuMV strain led to the unforeseen discovery of an assemblage of DNA-A homologous sequences in the intergenic region of its DNA-B, whose position and arrangement are conserved in several begomovirus species, suggesting the intriguing possibility of a functional role for those atypical sequences in the infective cycle of EuMV and its relatives.
Discussion
In this study, we describe the molecular and biological characterization of a novel strain of
Euphorbia mosaic virus that was isolated from pepper plants in the state of Jalisco, Mexico, near the Pacific coast. This virus displays 92% sequence identity with EuMV-YP, which was isolated in the same country but in a distant region, close to the Atlantic coastline [
20]. These viruses differ in two important features of their DNA-A replication origin region: the nucleotide sequence of their iterons, and the presence or absence of a G-box element, a
cis-acting sequence which is critical for
Rep promoter activity in some NW-Beg [
46]. The differences observed in the predicted Rep-binding sites of EuMV-Jal and EuMV-YP prompted us to explore experimentally their ability to form viable reassortants in pseudorecombination tests. The results of these experiments confirmed the presumption of replication incompatibility between EuMV-YP and EuMV-Jal, thus demonstrating that the latter is a new, biologically-defined strain exhibiting different replication specificity.
The finding of begomovirus strains that are not able to form viable reassortants is somewhat puzzling because the common definition of a virus species is "A...class of viruses that constitutes a replicating lineage and occupies a particular ecological niche." [
47,
48]. Accordingly, it is not expected that strains of a virus species would be incompatible in replication because that implies that they do not constitute an actual replicating lineage. Nonetheless, it is generally recognized that several strains of begomoviruses probably are not complementary in replication because they display different putative
cis- and
trans-acting replication specificity determinants [
7,
17]. There is at least one report of strains belonging to a bipartite begomovirus that are not equivalent in replication functions (the "severe" and "mild" strains of
Tomato leaf curl New Delhi virus, ToLCNDV) [
49]. However, that case is different from the one examined here because the "mild" phenotype of one ToLCNDV strain seems to be related to an inefficient
trans-replication of the "cognate" DNA-B, which displays Rep-binding sites different from those of the associated DNA-A [
49,
50].
The case of the EuMV strains is significant because it is paradigmatic of an apparently common theme in begomovirus evolution, i.e., the sudden change of virus replication specificity determinants by intermolecular recombination between co-infecting viruses [
27,
51]. Indeed, the recombination analysis of EuMV isolates indicates that viruses of the EuMV A-strain probably evolved through an intermolecular DNA exchange event involving a member of the EuMV B-strain and a virus related to CpGMV, which donated a ~210-bp DNA segment encompassing the region of the virus replication origin and the first 44 nucleotides of the
rep gene. If this hypothetical scenario is accurate, then the recombination event would have changed simultaneously both the iterons and the Rep aa residues interacting with them, thus maintaining the proper matching of
cis- and
trans-acting replication determinants in the recombinant DNA-A component.
Several studies have identified the sequences encompassing the viral strand replication origin and the
rep gene segment encoding the Rep N-terminal domain as the regions of geminivirus genomes most frequently exchanged during recombination [
28,
51‐
53]. This is consistent with the known genome localization of the Rep-binding sites and the coding sequence of the Rep domain that contains the putative DNA-binding specificity determinants of this protein, which have been theoretically mapped into the first 75 aa residues [
17,
54]. Consequently, a recombination event involving a genome portion as small as 200 to 360 bp might confer a completely different replication phenotype on begomoviruses involved in mixed infections, as is presumably the case for the EuMV strains.
Since intermolecular recombination has been, and remains, a major force in the evolution of geminiviruses, the concepts of both "species" and "strains" should be adapted to the peculiar nature of these entities, which are genetic mosaics in continual change and differ qualitatively from cellular organisms. In fact, it is altogether possible that a significant part of the currently recognized begomovirus species do not constitute "replicating lineages" in a strict sense, as would be the case for EuMV according to our experimental data. For instance, a thorough sequence analysis entailing the identification of the putative
cis- and
trans-acting Replication Specificity Determinants (RSDs) of the 182 recognized begomovirus species summarized by Fauquet et al. in 2008 [
7] revealed the existence of 34 species that include at least two groups of viruses exhibiting distinct putative RSDs, analogous to the strains A and B of EuMV. Furthermore, some ICTV-accepted species, such as
Ageratum yellow vein Hualian virus,
Honeysuckle yellow vein virus,
Tomato leaf curl Bangalore virus,
Tomato leaf curl Philippines virus, Tomato leaf curl Taiwan virus, and ToLCNDV, include three classes of viruses differing in their putative RSDs, and one viral species,
Ageratum yellow vein virus, comprises four types of viruses harboring distinct replication modules, plausibly acquired through independent episodes of intermolecular recombination (Arguello-Astorga, unpublished data). In view of the significant number of begomovirus species with variants that are seemingly analogous to the strains of EuMV, it would be important to establish a formal distinction between strains with similar RSDs, which represent actual replicating lineages, and replication-incompatible strains, which apparently do not.
What is the function of the DNA-B sRep HS elements?
During the analysis of the intergenic region of EuMV-Jal DNA-B we discovered a short DNA stretch identical to a segment of the
rep gene encoded in the cognate DNA-A. It was subsequently found that analogous s
Rep HS elements exist in the DNA-B IR of at least five begomovirus species, all of them from the New World: EuMV from Mexico and the Caribbean basin, TMYLCAV from Venezuela, and EuYMV, ClLCrV and TGMV from Brazil. With the exception of the short
rep homologous sequence in the DNA-B IR of TGMV (which seems to be evolutionarily unrelated), the s
Rep HS elements of begomoviruses have several characteristics in common. All of them: (1) are short sequences, ranging from 35 to 51 nucleotides in length; (2) are 100% identical in nucleotide sequence to a segment of their cognate
rep gene; (3) have the opposite polarity to the
rep gene; (4) are located 65 to 80 nt downstream of a putative internal promoter highly similar to
CP promoters of viruses of the SLCV clade (ClLCrV being an exception); (5) are positioned 7 to 9 nt downstream of a 23-bp partly palindromic element with a repeated motif similar to the CLE; and (6) are situated 115 to 145 nt upstream of the
BV1 gene. In contrast, the s
Rep HS elements of viruses that are distantly related, like EuMV, EuYMV and ClLCrV, have entirely different nucleotide sequences (see Figure
5), because the coding sequence represented in those elements corresponds to distinct sections of the cognate
rep gene.
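Features (1) to (3) above can be expressed as a simple sequence search: look in the DNA-B intergenic region for a stretch that is identical to part of the cognate rep gene, in either orientation. The Python sketch below does this with a longest-common-substring scan. It is an illustrative assumption, not the method used in this study; the function names, the 30-nt reporting threshold, and the reading of "opposite polarity" as a reverse-complement match between the two input sequences are all hypothetical choices.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (characters other than A, C, G, T are kept)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def longest_shared_segment(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest exact common substring."""
    a, b = a.upper(), b.upper()
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous position of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


def find_rep_homolog(ir_seq: str, rep_seq: str, min_len: int = 30) -> list[dict]:
    """Report the longest IR segment identical to rep in each orientation.

    "+" means the IR segment matches rep as given; "-" means it matches the
    reverse complement of rep. min_len is an arbitrary illustrative cutoff.
    """
    hits = []
    for strand, target in (("+", rep_seq), ("-", reverse_complement(rep_seq))):
        ir_start, _, length = longest_shared_segment(ir_seq, target)
        if length >= min_len:
            hits.append({
                "strand": strand,
                "ir_start": ir_start,
                "length": length,
                "segment": ir_seq[ir_start:ir_start + length].upper(),
            })
    return hits
```

The quadratic scan is adequate for sequences of this size (an intergenic region of a few hundred nucleotides against a rep gene of roughly one kilobase); the remaining features, such as the distance to the putative promoter, would have to be checked separately.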
An intriguing observation is that the identified sRep HS elements reproduce sequences encoding conserved aa motifs that are critical for Rep functions. For example, the sRep HS of EuMV strains and TMYLCAV correspond to the coding sequence of RCR Motif 1; the equivalent element of ClLCrV encodes RCR Motif 3, and the analogous sRep HS of TGMV duplicates the rep sequence encoding the Walker B motif of ATPases/helicases. An apparent exception is the sRep HS of EuYMV, which displays the coding sequence of a conserved Rep motif of unknown function. The evolutionary conservation of sRep HS elements and the associated sequence motifs suggests that those atypical elements play a definite but hitherto unknown function in the viral infective cycle. In the absence of any factual data, it is only possible to speculate about the function(s) of the sRep HS on the basis of their common characteristics.
Certainly, the most remarkable feature of the sRep HS elements is their complete identity in nucleotide sequence with a specific segment of the rep gene in the cognate DNA-A component. The evolutionary preservation of such an absolute match between specific segments of distinct, physically separated DNA molecules should involve very strong selective pressures against mutations that diminish the identity between these DNA sequences. Therefore, the function of the sRep HS elements is most likely related to a process that requires perfect or very high complementarity between DNA and/or RNA molecules, such as gene regulation by microRNAs (miRNAs).
The miRNAs are ~22-nt-long noncoding RNAs that posttranscriptionally regulate gene expression by binding to specific mRNAs, thus repressing their translation and/or inducing their degradation [
55]. Several DNA viruses (i.e., herpesviruses, adenoviruses, ascoviruses and polyomaviruses) encode miRNAs which participate in the regulation of some processes of the viral infection cycle [
56,
57]. For example, simian virus 40 (SV40) encodes a single miRNA that lies antisense to the viral mRNA encoding the T-antigen, a multifunctional protein essential for virus replication. This miRNA is expressed late in infection, thereby promoting T-antigen mRNA degradation and downregulating the synthesis of this protein at late stages of the SV40 replication cycle [
58]. In close analogy with SV40 miRNA, the s
Rep HS elements of begomoviruses are single, discrete noncoding DNA sequences highly similar to a specific segment of the gene encoding the viral replication protein. Further analogies between those heterologous viral sequences are the following: (1) The genomic location of the miRNA, but not its nucleotide sequence, is conserved among polyomaviruses (i.e., SV40, Merkel cell virus, human BK virus, JC virus, and mouse polyomavirus) [
59‐
61]; similarly, the location of s
Rep HS elements within the DNA-B intergenic region, but not their specific sequence, is conserved among begomoviruses (data from this study); (2) The temporal expression of the SV40 miRNA, which is restricted to the late stage of infection, is similar among all the examined polyomaviruses [
57,
59]; likewise, although the temporal expression of begomovirus transcripts including the s
Rep HS region (if any) is unknown, it is plausible that they would be expressed late, because the hypothetical promoter that drives their transcription is similar to begomovirus
CP promoters, which are typically active at the late phase of the viral infection cycle [
1,
36]; (3) Like the polyomavirus pre-miRNAs, the DNA-B sequences encompassing s
Rep HS and the neighboring sequences have the potential to form extensive hairpin structures susceptible to cleavage by RNase III enzymes (i.e., Drosha and Dicer) involved in the processing of pre-miRNAs (data not shown). Taken together, these lines of indirect evidence suggest a potential function of the s
Rep HS elements in the posttranscriptional regulation of Rep expression, a hypothesis that must be experimentally examined.
Conclusions
The evidence gathered in this study indicates that EuMV-YP and EuMV-Jal, which are members of strains A and B of Euphorbia mosaic virus, respectively, are actually incompatible in replication, implying that these viruses probably represent distinct replicating lineages in natural ecosystems. The scenario we propose for the origin of the EuMV A-strain viruses involves a recombination event that substituted the DNA-A core replication module of an EuMV B-strain virus with the analogous genomic region of a virus related to CpGMV. This intermolecular exchange suddenly changed the replication specificity of the recombinant DNA-A, thus triggering the process that led to the evolutionary differentiation of EuMV into two distinct strains. The fact that more than 30 recognized begomovirus species include two or more classes of viruses with distinct putative RSDs (i.e., analogous to the EuMV strains) suggests that intermolecular recombination events involving the virion-strand origin of replication and the first part of the rep gene are quite common in this group of ssDNA viruses, as has been previously pointed out [51‐53]. Another relevant result from this study is the discovery of atypical sequences within the intergenic region of the DNA-B component of some NW begomoviruses, mostly related to EuMV. These sequences include short fragments of the cognate rep gene located downstream from a potential internal promoter very similar in modular organization to CP promoters of viruses of the SLCV clade. Even though we do not know the actual function of these sRep HS elements, several lines of indirect evidence suggest their participation in the posttranscriptional regulation of Rep expression, an intriguing possibility that is currently being examined in our laboratory.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
JGJ generated the infectious clones of EuMV-Jal, performed plant infection tests, carried out the phylogenetic analysis, and helped to prepare the manuscript. ABA collected isolates, cloned and sequenced the viruses, analyzed the field data, and performed plant infection tests. BBH carried out the pseudorecombination experiments and analyzed the experimentally infected plants. AAS helped in comparative sequence analyses, provided partial funding for the project's execution, and offered ideas and comments during manuscript preparation. CHZ carried out the recombination analysis and helped in plant infection tests. OMV provided the EuMV-YP clones and helped in plant infection tests with this virus. GFT collected isolates and helped to analyze the field data. GAA coordinated the project, carried out the comparative sequence analyses, secured funding for the project's execution, and prepared the manuscript. All authors read and approved the final manuscript.