Background
The Theobroma cacao (L.) tree, or cacao, is the source of cocoa beans, the raw material for chocolate. The species is endemic to the Amazon Basin in South America and was introduced into West Africa during the 1880s [1]; West Africa now produces 70% of the world’s bulk cocoa supply. Soon after commercial cacao plantations were established, virus-like symptoms were reported in cacao trees, consisting of foliar vein-banding, swellings on vegetative shoots (chupons), and reduced pod size, pod number, and bean quality [2, 3]. Three to five years after symptoms develop, infected trees decline and eventually die [4, 5]. Graft transmission was reported by Steven in 1936 [3], leading to the hypothesis that swollen shoot disease was caused by a plant virus endemic to West Africa, referred to as Cacao swollen shoot virus (CSSV) [4, 6]. The virus is transmitted in a semi-persistent manner by 14 mealybug species [7] and is not known to be seed-transmitted in cacao [8]. Efforts to develop virus-resistant cacao genotypes have been largely unsuccessful, and over time crop losses have consequently been substantial in Cote d’Ivoire [9, 10], Ghana [11–13], Nigeria [14, 15], Sierra Leone [16], and Togo [17, 18]. Until now, CSSV and two other cacao-infecting virus species, Cacao swollen shoot CD virus (CSSCDV) and Cacao swollen shoot Togo A virus (CSSTAV), have been associated with cacao swollen shoot disease (CSSD) [19]; all three are members of the genus Badnavirus (family Caulimoviridae) [20–23], also referred to as pararetroviruses.
Badnaviruses have a circular, double-stranded DNA genome 7.0–9.2 kilobase pairs (kbp) in size, encapsidated in a non-enveloped bacilliform particle [22, 23]. Replication is mediated by a virus-encoded reverse transcriptase (RT) and proceeds through an RNA intermediate [24]. The three CSSD-associated genomes encode four to six open reading frames (ORFs), referred to as ORFs 1–4, X, and Y [10, 25]. The precise function of the 16 kDa badnaviral ORF1 protein is not known. ORF2 encodes a 15 kDa protein with DNA- and RNA-binding activity [26], and ORF3 encodes a ~212 kDa polyprotein containing several domains that are cleaved to release a functional movement protein (MP), coat protein (CP), aspartic protease (AP), and the RT and ribonuclease H (RNase H) proteins. ORFs 4, X, and Y, which overlap ORF3, encode proteins of 95, 13, and 14 kDa, respectively, all of unknown function [10, 27].
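Because the genome is circular, an ORF can run across whichever position is chosen as nucleotide 1. The sketch below shows one simple way to enumerate candidate ORFs under that constraint; it is an illustration, not the ORF prediction method used in this study. It assumes that all ORFs lie on the sense strand (as is typical of badnaviruses), treats ATG as the only start codon (a simplification), keeps only the longest ORF ending at each stop codon, and uses a hypothetical minimum length, `min_codons`; the function name `circular_orfs` is likewise hypothetical.

```python
# Sketch only: simple ORF scan for a circular genome (sense strand only).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(genome: str, min_codons: int = 60) -> list[tuple[int, int, int]]:
    """Return (start, end, length_nt) for candidate ORFs on a circular genome.

    start is the 0-based position of the ATG; end is the 0-based position just
    past the stop codon, taken modulo the genome length, so an ORF that spans
    the origin has end < start. length_nt includes the stop codon. Only the
    longest ORF ending at each stop codon is kept (nested in-frame ATGs are
    dropped). min_codons is a hypothetical cut-off on sense codons.
    """
    seq = genome.upper()
    n = len(seq)
    doubled = seq + seq  # lets a reading frame continue across the origin
    longest_by_stop: dict[int, tuple[int, int]] = {}
    for start in range(n):
        if doubled[start:start + 3] != "ATG":
            continue
        # Walk codon by codon for at most one lap around the circle.
        for pos in range(start, start + n, 3):
            if doubled[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                end = (pos + 3) % n
                if length > longest_by_stop.get(end, (0, 0))[1]:
                    longest_by_stop[end] = (start, length)
                break
    return sorted(
        (start, end, length)
        for end, (start, length) in longest_by_stop.items()
        if length // 3 - 1 >= min_codons
    )
```

For example, `circular_orfs("AATAAGGGATGA", min_codons=1)` finds an ATG at position 8 whose stop codon lies after the origin, so the reported end (5) is smaller than the start.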
Since the early 2000s, characteristic CSSD leaf and shoot swelling symptoms, accompanied by rapid tree decline and death about one year after the first foliar symptoms appear, have been observed in a portion of the trees in commercial plantations in western Ghana and eastern Cote d’Ivoire [25; authors’ personal observations]. Concurrently, previously reliable serological assays and several polymerase chain reaction (PCR) assays failed to detect the virus in 40–60% of samples, despite the presence of CSSD-like symptoms [2, 18, 28, 29]. Likewise, primers designed to amplify a hypervariable region of the viral MP (located in ORF3), referred to as the ‘ORF3A’ primers and reported to distinguish nine CSSD-badnavirus variants, collectively failed to detect virus in 25–50% of samples from Cote d’Ivoire and Ghana [10, 18, 28, 30]. Although this MP locus is not phylogenetically informative at the species level [20], it does reveal extensive intraspecific variability. Similarly, in a recent study of 124 field isolates from Cote d’Ivoire, only half of the samples tested positive for CSSD badnavirus infection by PCR amplification [31]. The primers used were designed from the seven CSSD-associated genome sequences from Cote d’Ivoire, Ghana, and Togo available in GenBank, to amplify fragments of the CP, MP, RNase H, RT, and non-coding regions. Although one or more primer pairs yielded amplicons confirmed as badnaviral by DNA sequencing, overall ~50% of symptomatic leaf and shoot samples yielded no PCR product, indicating greater-than-expected genomic variability among CSSD badnaviruses [28]. Until now, there has been no systematic study of the extent of genomic variability of CSSD badnaviruses in West Africa.
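Why primers designed from only a few reference genomes can miss divergent isolates can be illustrated by counting primer-template mismatches. The sketch below is a hypothetical check, not the primer design procedure used in [28] or [31]. It compares a primer, which may contain IUPAC degenerate codes, with a candidate binding site written 5'→3' in the same sense as the primer, and it flags a site when it has more than `max_total` mismatches or any mismatch within `three_prime_window` bases of the primer 3' end, where mismatches generally interfere most with extension. The function names and both thresholds are illustrative assumptions.

```python
# Sketch only: count primer/template mismatches, allowing IUPAC degenerate bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer: str, site: str) -> list[int]:
    """0-based positions (from the primer 5' end) where the site base is not
    covered by the primer base. The site must be written 5'->3' in the same
    sense as the primer (reverse-complement it first for a reverse primer).
    An ambiguous base in the site itself counts as a mismatch."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return [
        i
        for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if s not in IUPAC[p]
    ]


def likely_to_fail(primer: str, site: str, max_total: int = 3,
                   three_prime_window: int = 5) -> bool:
    """Hypothetical rule of thumb: too many mismatches overall, or any mismatch
    close to the 3' end, is treated as a probable amplification failure."""
    mismatches = primer_mismatches(primer, site)
    near_3prime = [i for i in mismatches if i >= len(primer) - three_prime_window]
    return len(mismatches) > max_total or bool(near_3prime)
```

With these assumed thresholds, a single mismatch at the last base of a 20-mer is enough to flag a site, whereas a single mismatch at the 5' end is not.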
The lack of sufficient representative full-length CSSD-associated genome sequences has prevented the reconciliation of evolutionary and epidemiological information needed to inform disease management practices. The objective of this study was to better understand the extent of genomic variability among the CSSD-badnaviruses associated with swollen shoot symptoms in Cote d’Ivoire and Ghana, including the recently observed ‘rapid decline’ phenotype. Here, a genomic pathology approach was taken, using Illumina HiSeq DNA sequencing for ‘virome discovery’, with validation by PCR amplification using virus-specific primers and Sanger DNA sequencing. The 14 apparently full-length genomes were characterized with respect to pairwise nucleotide (nt) identity, phylogenetic relationships, genome organization, and conserved protein domain architecture, in relation to the seven previously reported CSSD-badnaviral genome sequences from Ghana, Cote d’Ivoire, and Togo.
Discussion
In this study, 14 full-length badnavirus genome sequences were determined by Illumina ‘virome discovery’ and confirmed by Sanger DNA sequencing of cloned PCR amplicons. In all instances, the badnaviral genome sequences were obtained from leaf samples collected from cacao trees exhibiting foliar discoloration and swollen shoot symptoms, and several of the trees also showed atypical symptoms consisting of accelerated decline and rapid death. Although this is speculative, the latter symptoms are reminiscent of the rapid necrosis and death often associated with hypersensitive-like responses to pathogen infection.
Based on genome size and organization, and on nt and amino acid (aa) sequence comparisons, the genome sequences determined here were most closely related to previously reported badnaviruses, the closest relatives being the three known cacao-infecting badnaviruses from West Africa: CSSV, CSSCDV, and CSSTAV [10, 25] (Additional file 2: Table S2). In addition, all of the CSSD genomes contained the badnavirus hallmark tRNAMet-binding site, which resembles tRNA sequences present in plant genomes [49] and has led to speculation that badnavirus-associated tRNA sequences are host-derived.
The CSSD genomes varied with respect to predicted coding regions and genome arrangement, but this variation is consistent with that of other badnaviruses, which are known to have three, four, or five predicted ORFs in various arrangements (Fig. 1). Such variation has been reported previously for other CSSD isolates, which have recently been recognized as three distinct species [10]. The proposed new badnavirus species of cacao, CRVV, identified for the first time in this study, is highly divergent from CSSV, CSSCDV, and CSSTAV, sharing only 70–72% nt identity with them. Moreover, the previously known species encode five or six ORFs, compared with the four ORFs predicted for the CRVV genome, indicating that CRVV is unique among the West African CSSD-badnaviruses identified so far (Additional file 2: Table S2). By comparison with the four CSSD species, the New World cacao-infecting badnaviruses described thus far, CaMMV and CYVBV, each encode four ORFs [48], whereas the genomes of all known non-cacao-infecting badnaviruses (genus-wide) have from three to seven ORFs.
The conserved domains predicted for the 14 genomes resembled those reported in the genus Badnavirus, including the DUF (ORF1) and the Zn, Pep, RT, and RNase H domains, all found in ORF3 (Table 1, Fig. 3). The only exception was the CSSCDV genome [28], which lacked a detectable Zn domain and had an additional, unique DUF3187 domain in ORFY that is annotated as an ‘outer membrane hypothetical protein’ in certain Proteobacteria [42]. The absence of the Zn domain is difficult to reconcile, because it has been reported to be an essential coat protein motif [25]. Interestingly, apart from this CSSCDV ORFY domain, no conserved domains have been detected in ORFs 4, X, or Y of any CSSD-associated genome, including the 14 reported here (Table 1, Fig. 2). The failure to detect conserved domains in these ORFs does not necessarily mean that they are absent; ‘badnavirus-novel’ domains may simply not be discoverable with the available CDD tools [42].
A comparison between the CSSD-associated genomes and the only two other cacao-infecting badnaviruses, CaMMV and CYVBV, revealed interesting differences and some similarities in their conserved protein domains (CPDs). The DUF4200 domain, which is unique to CaMMV ORF3, is annotated as a coiled-coil domain of unknown function (eukaryotic) [42]. In pararetroviruses, coiled-coil domains have been identified in virion-associated proteins and implicated in aphid transmission and, for Cauliflower mosaic virus, in cell-to-cell movement [50, 51]. The PHD, unique to CYVBV ORF3, has been associated with transcriptional regulation and chromatin-associated functions [42], and the Trim domain, also found in CYVBV ORF3, has predicted dUTPase activity, i.e., it catalyzes the hydrolysis of dUTP-Mg complexes to dUMP and pyrophosphate [42, 52]. Although the Trim domain has not been studied in CYVBV, its presence in CYVBV and in two divergent badnaviruses, Piper yellow mottle virus [53] and Dioscorea bacilliform virus [54], suggests a possibly conserved, genus-wide function. The presence of these three unique domains in the two Trinidad viral genomes, and their absence in the four CSSD-associated species, make them important targets for future study.
High genomic variability was found among the 21 genomes from cacao, and four distinct badnavirus species were identified based on the ICTV-established threshold of ≥80% nt identity in the RT-RNase H region (Table 2). The previously unreported CRVV is proposed here as a new badnavirus species and is partially characterized for the first time. Until recently, all CSSD-badnaviruses were referred to as CSSV. In 2015, CSSV was recognized by the ICTV as the type species, and CSSV, CSSCDV, and CSSTAV were formally designated as distinct badnaviral species [19]. Based on these results, CRVV is the fourth cacao-infecting badnaviral species known to be endemic to West Africa. Previously, a single causal badnavirus was associated with cacao plants exhibiting different swollen shoot disease symptoms. However, the new evidence presented here shows that numerous badnaviral species and strains, evident from extensive within- and between-clade genomic variability, are associated with disease symptoms.
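The species assignments above rest on pairwise nt identity in the RT-RNase H region. A minimal sketch of that calculation follows; it is not the software used to produce Table 2. It assumes that the sequences have already been aligned (gap character `-`) and that identity is computed over every alignment column in which at least one sequence has a base. Published tools treat gaps differently, so values computed this way may differ slightly. Isolates are then grouped at the ≥80% threshold by single linkage. The function names are hypothetical.

```python
# Sketch only: pairwise nt identity on pre-aligned sequences and grouping at a threshold.
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two sequences taken from the same alignment ('-' = gap).

    Assumed convention: columns where both sequences have a gap are skipped;
    every other column is counted, so a base opposite a gap is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def group_by_identity(aligned: dict[str, str], threshold: float = 80.0) -> list[list[str]]:
    """Single-linkage groups: two isolates are joined when identity >= threshold."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for x, y in combinations(names, 2):
        if percent_identity(aligned[x], aligned[y]) >= threshold:
            parent[find(x)] = find(y)

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Divergence values such as the 24% separating CSSTAV and CSSCDV correspond to 100 minus percent identity. Single linkage is used here only for brevity: because the criterion is pairwise, a chain of isolates each ≥80% identical to its neighbour could merge two groups whose members are less than 80% identical, so borderline cases would need to be inspected individually.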
The concept of a species group has rarely been applied to plant viruses. However, within the genus Badnavirus, a closely related ‘group of species’, the banana streak virus complex, is recognized that is reminiscent of the CSSD-badnaviruses. Both comprise badnaviral species that are more closely related to one another than to other known badnaviruses, with apparently restricted host associations [55–57]. Similarly, the Sugarcane mosaic virus (Potyviridae) group contains closely related species and strains that are divergent from other known potyviruses [58]. Likewise, within the genus Begomovirus (Geminiviridae), five or more closely related species and multiple strains identified thus far cause a leaf curl disease of malvaceous hosts, including cotton, that is widespread on the Indian subcontinent [59, 60].
Phylogenetic comparisons of all available badnaviral genome sequences, including those determined in this study, indicated that the CSSD-badnaviral genomes are more closely related to each other than to other known badnaviruses, as shown by their tight grouping as a sister clade relative to the other badnaviruses. This suggests that all CSSD-associated badnavirus species known thus far to cause swollen shoot disease share a common ancestor (or ancestors), providing robust support for West African endemism and for the region as the center of CSSD-badnaviral diversification.
The first outbreak of CSSD was reported in Ghana [3], followed by one in Cote d’Ivoire ten years later. Of the available CSSV genome sequences, eleven originate from Cote d’Ivoire and three from Ghana (Fig. 3b). The presence of CSSV in these neighboring countries may suggest a phylogeographical distribution, e.g., regional CSSV endemism; however, additional sequences are required to clarify the centers of endemism of CSSV and of the other CSSD species. Whether the RT-RNase H region or complete genomes are compared, CSSTAV and CSSCDV are highly divergent from each other (by 24%) and from CSSV and CRVV (by 20–25% and 28–30%, respectively) (Table 3). This degree of genomic variability suggests long-standing separation and is reminiscent of phylogeographical and/or host-associated co-evolution. The CSSD-badnaviruses are considered endemic to West Africa and have been documented to infect many endemic tropical tree and shrub genera, including relatives of T. cacao at the family level, making it likely that additional, undiscovered badnaviruses infect wild hosts in West Africa, with the potential to shift hosts under opportune conditions.
Although genome sequences are available for relatively few isolates overall, the hypothesis that CRVV emerged in cacao through a host jump from one or more wild reservoirs is consistent with its low nt identity (~70–72%) with the other CSSD-badnavirus species extant in cacao; such an origin could also explain why it is nonetheless phylogenetically most closely related to them. Evidence for CSSD-badnavirus infection of endemic tropical tree species is based almost entirely on biological studies, e.g., grafting and/or mealybug transmission tests, whose results have not been verified using molecular methods. Based on these studies, a large number of CSSD hosts have been reported, including Adansonia digitata L., Ceiba pentandra L., Cola chlamydantha K. Schum., Cola gigantea A. Chev., and Sterculia tragacantha Lindl. [61–63], which are among the ~90 species in 30 plant families used as shade for cacao and other crops [64]. Molecular confirmation of suspected CSSD hosts, and accurate identification of any badnaviruses found in them, are important first steps toward reconciling knowledge of CSSD-badnavirus evolution and origin with the specific epidemiological factors that lead to outbreaks. This knowledge can in turn inform short-term management approaches and CSSD breeding strategies that enable sustainable production of the crop in the long term.
Historically, CSSD isolates have been characterized as ‘mild’ or ‘severe’. Mild isolates, such as CSSV-N1A (AJ609020), cause mild foliar symptoms that persist for only several days, whereas the characteristically prevalent severe isolates, such as CSSV-New Juaben (AJ608931), cause persistent foliar and shoot symptoms, decline, and tree death in 3–5 years [2, 65]. Although the N1A and New Juaben ‘strains’ belong to the same species, CSSV (Table 2), and to the same subclade (Fig. 3a, b), their reported differences in pathogenicity are puzzling. No definitive link has been established between the strains associated with the recent rapid decline phenotype and those previously recognized as ‘severe’. The genetic basis for differences in pathogenicity, e.g., virulence, among the CSSD-badnaviruses has not been investigated; however, important clues may reside in the diverse genome arrangements and CPD architectures (Fig. 2b), particularly those associated with predicted functions in pathogenicity.
Despite nearly 100 years of CSSD research, there is still no definitive understanding of the origins or pathways of disease spread, or of how these relate to the genomic diversification of cacao-infecting badnaviruses. The ‘unique to West Africa’ genome type of the proposed CRVV species (Table 2; Fig. 3b), together with its only recent association with the atypical rapid decline and death phenotype, observed in cacao trees in western Ghana in 2000 and subsequently in eastern and then western Cote d’Ivoire by 2003, points to a possible CRVV origin near the border of the two countries, with subsequent spread into plantations recently established in western Cote d’Ivoire. Because CRVV-like MP sequences were obtained, albeit unknowingly until this report, from symptomatic samples collected during surveys in Cote d’Ivoire and Ghana in 2000–2003 [28], CRVV and perhaps other unknown variants may already have spread to other cacao-producing countries, including Nigeria and Togo. These observations underscore the urgent need to identify the causal agent of the ‘rapid decline’ phenotype and to prevent its further spread in cacao. This goal can only be achieved through the coordinated development and use of reliable molecular diagnostic tools, well-supported surveillance efforts to track this dynamic badnavirus complex, and regional epidemiological studies in cacao plantations and in nearby suspected endemic plant hosts of the CSSD complex.