Background
Viruses are ubiquitous and affect plant growth and yield. Over the years, crop losses due to viral infection have been devastating and are of great concern in both developed and developing countries [1]. The unrestricted spread of plant diseases and emerging infectious diseases poses a serious threat to food security. A high percentage of this loss is caused by viruses because of their abundance among biological entities in most environments [2]. Early identification of these plant pathogens remains a focal point in the field of virology, aimed at preventing the spread of viruses and at developing ways of combating and reducing their effects on agricultural yield.
RNA viruses exist within a host as a consortium of non-identical but similar sequences, referred to as a viral quasispecies, owing to their inability to propagate without a living host and their high mutation and recombination rates [3]. RNA silencing is one of the defense mechanisms of plants against viruses, in which double-stranded RNA (dsRNA) serves as a substrate for Dicer-like ribonucleases (DCLs) to produce small interfering RNAs (siRNAs) of 21 to 25 nt [4]. Viral infection of plants involves the production of viral small RNAs (vsRNAs), and the plant host responds to invading viruses through various cellular mechanisms. Viruses are both inducers and targets of RNA interference (RNAi) [5]. Double-stranded RNA intermediates produced by viral genomes during replication serve as substrates for Dicer-like ribonucleases, which cleave them into small virus-derived siRNAs (viRNAs); these bind to the Argonaute protein (Ago) to form the RNA-induced silencing complex (RISC) [6].
The advent of sophisticated technologies for parallel sequencing has increased our understanding of viral genome variability and evolution within the host, and of virus defense mechanisms in plants. It is widely accepted that studies of viral abundance and diversity have led, and will continue to lead, to novel insights into the functioning of the microbial biosphere. The relative abundance of a virus (or viral nucleic acid) in a sample, compared to that of other organisms such as bacteria or host cells (or their genomes), is a critical factor for the discovery of viruses when using metagenomics. Traditional virus detection methods, e.g., enzyme-linked immunosorbent assay (ELISA), polymerase chain reaction (PCR), or microarrays, depend on prior knowledge of an antibody or sequence of the potential virus [7]. In contrast, next generation sequencing (NGS) technology provides a powerful method for determining the causative pathogen, as well as the existence of novel viral agents [8, 9], without prior knowledge of the disease pathogen. The genomes of plant viruses can be rapidly determined even when they occur at extremely low titers in the infected host. The detection of both DNA and RNA viruses [10] has been made possible by the reconstruction of partial or complete viral genomes [11] and by sequencing of the accumulated 21–24 nt virus-derived siRNAs generated by Dicer enzymes upon recognition of viral dsRNA. With its continued development and relatively low cost, NGS has widened the potential for diagnosing viral pathogens without a priori knowledge of the invading pathogen. Consequently, accurate and timely detection of these viral pathogens in plants is both feasible and necessary for effective disease management and control.
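The size window mentioned above (21–24 nt virus-derived siRNAs) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names and the toy reads are hypothetical, and the reads are assumed to be adapter-trimmed already.

```python
from collections import Counter


def filter_sirna_reads(reads, min_len=21, max_len=24):
    """Keep only reads whose length lies in the siRNA size window (inclusive)."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def length_distribution(reads):
    """Count reads per length, returned as a dict sorted by length."""
    return dict(sorted(Counter(len(read) for read in reads).items()))


# Toy, made-up reads (assumed adapter-trimmed); lengths 18, 21, 22, 24, 25, 22.
toy_reads = [
    "A" * 18,
    "ACGTACGTACGTACGTACGTA",
    "G" * 22,
    "C" * 24,
    "T" * 25,
    "ACGTACGTACGTACGTACGTAC",
]

kept = filter_sirna_reads(toy_reads)
sizes = length_distribution(kept)
print(sizes)
```

In practice, the size distribution of the retained reads is often inspected before assembly, since the relative abundance of each size class can differ between libraries.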
Tobacco (Nicotiana tabacum) is an economically important crop worldwide. China has half of the world’s tobacco farmers and is the world’s largest producer [12], ahead of countries such as India, Zimbabwe, Indonesia, Turkey, Bangladesh, Egypt, the Philippines and Thailand. The production and yield of tobacco have been seriously affected by the invasion of emerging and recurrent plant viruses, with symptoms such as veinal necrosis, mosaic, mottling, yellowing, ring spots, stunting, shoestring and deformation [13–16]. Anhui province is located in the center of China and is surrounded by six other provinces. The area is enriched with geochemical elements suitable for the cultivation of tobacco [17]. Tobacco plantations are usually surrounded by or mixed with other crops, as mixed farming is commonly adopted in these areas. This, however, has enhanced the transmission of plant viruses from one plant to another and consequently made Anhui province an ideal open ecosystem in which to investigate the viruses infecting the crop. To identify the etiological agents of the different disease symptoms observed in tobacco plantations across Anhui province, we used next generation sequencing of small RNAs to identify viruses from symptomatic tobacco plants in farm fields. We also present the results of a genome comparison between the resulting 22 isolates and genomes retrieved from GenBank. This study provides a census of viral populations and their distribution in different ecosystems or cropping systems through the discovery and characterization of plant viruses and their molecular interactions. We also describe the recombination events that occurred in the isolates, a bioinformatics pipeline that explores the siRNAs generated in response to viral invasion, and the other molecular biology methods employed to discover a consortium of viruses infecting tobacco.
Discussion
In spite of the different plant disease control mechanisms, plant viruses still cause significant economic losses in tobacco production every year in China [14]. The effectiveness of disease control strategies can be affected by genetic exchange and by changes in the composition of the virus population [35]. Therefore, prompt identification of invading plant viruses and elucidation of the molecular determinants and genetic diversity involved in pathogenesis are important for better understanding plant–pathogen interactions. In this study, our goal was to survey tobacco plants in Anhui province of China for the viruses causing devastating infections in the fields, and to capture the genetic diversity and molecular variability of the different isolates identified across the province. We also aimed to determine the effectiveness of next generation sequencing technology, coupled with molecular techniques, in the discovery of plant viruses without prior knowledge of the virus.
We describe a bioinformatics pipeline to efficiently identify viruses in a mixed infection of tobacco and to differentiate the strains infecting the plant across the province. The bioinformatics method is based on deep sequencing and de novo assembly of siRNAs. The assembled contigs generated from nine sRNA libraries were analyzed to identify the viruses associated with tobacco. We determined the genome sequences of 22 isolates of plant viruses infecting tobacco collected from various regions across Anhui Province of China and validated them through RT-PCR and Sanger sequencing. The identified viruses consist of 7 isolates of Cucumber Mosaic Virus (CMV), 5 isolates of Potato Virus Y (PVY), 3 isolates of Tobacco Mosaic Virus, 3 isolates of Tobacco Vein Banding Mosaic Virus, and 1 isolate each of Pepper Mottle Virus, Brassica Yellow Virus, Chilli Veinal Mottle Virus and Broad Bean Wilt Virus 2, all infecting tobacco, a crop plant of paramount economic value. More isolates of CMV and PVY were found infecting tobacco than of the other identified viruses. This can be attributed to the diversity of these isolates [36], which has been reported in other parts of the world to be a serious concern for crop production.
Sequencing of the small RNA libraries isolated from infected leaves shed more light on the consortium of replicating viruses in the plant samples and proved decisive for the identification of the novel isolates [7, 9]. De novo assembly of the siRNAs and BLAST searches of the assembled contigs against the non-redundant nucleotide and protein databases identified virus sequences with more than 90 % similarity. To elucidate the molecular and genetic diversity of the isolates in Anhui province of China, we analyzed the isolates together with sequences of previously reported recombinant and non-recombinant isolates [36–38] retrieved from GenBank. Some of the isolates had undergone recombination events similar to those of strains from other parts of the world, and they clustered in the same subgroups as strains prevalent elsewhere [36, 38, 39]. The CMV isolates identified in this study showed that subgroup I is more prevalent than subgroup II in China. The detection of several subgroup IB isolates among historic CMV isolates, together with phylogenetic analysis, further revealed the presence of this specific subgroup in other parts of the world [31]. PVY is also considered one of the most dangerous plant viruses, with different strains causing about 80 % of plant losses [23], depending on the infecting strain, the time of infection and co-infecting species. Recombination events play a critical role in the virulence of plant viruses by generating genetic variation and producing new viruses [40]. The designation of PVY strain groups is based on biological differences in the ability of PVY strains to overcome resistance genes in tobacco. Strains in the different groups may also allow the invasion of other plant viruses by suppressing the immune response of the plant.
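The contig screening step described in this paragraph (retaining BLAST hits with more than 90 % similarity) can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in this study: it assumes BLAST results in the standard 12-column tabular format (`-outfmt 6`), treats the percent identity column as the similarity measure, and uses hypothetical contig and subject names.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits_above_identity(handle, min_identity=90.0):
    """Return, per query contig, the highest-bitscore hit with identity > min_identity."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        pident = float(hit["pident"])
        bitscore = float(hit["bitscore"])
        if pident <= min_identity:
            continue  # "more than 90 %" is read as a strict threshold
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[2]:
            best[hit["qseqid"]] = (hit["sseqid"], pident, bitscore)
    return best


# Made-up example rows; contig and subject identifiers are hypothetical.
example = (
    "contig_1\tsubject_A\t97.50\t400\t10\t0\t1\t400\t101\t500\t1e-180\t650\n"
    "contig_1\tsubject_B\t92.00\t300\t24\t0\t1\t300\t1\t300\t1e-100\t400\n"
    "contig_2\tsubject_C\t85.00\t250\t37\t1\t1\t250\t1\t250\t1e-50\t200\n"
)

hits = best_hits_above_identity(io.StringIO(example))
print(hits)
```

Keeping only the best-scoring hit per contig is one possible design choice; retaining all hits above the threshold may be preferable when mixed infections with closely related strains are expected.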
Viral evolution and host adaptation are best understood by examining the role of recombination in generating and eliminating variation in viral sequences. RNA viral replicases apparently lack proofreading ability and, as a consequence, the frequency of mutations is much higher than in organisms with a DNA genome [41]. The recombination events in some of the isolates are a result of mutation and genetic reassortment, which have been previously reported in other isolates [36, 42, 43].
The coverage, dispersal and complexity of the virus population detected in this study call for constant surveys of not only symptomatic crops but also other crops used in mixed cropping, together with proper monitoring of disease spread and efficient management. In addition, a fast and efficient detection method such as next generation sequencing, which does not require prior knowledge of the virus, should be employed to identify viruses. Deep sequencing, bioinformatics and phylogenetic analysis, together with comparison of the different virus species identified in tobacco, reveal the molecular variability of viruses causing devastating effects on the crop. Furthermore, the proliferation of new genetic types signals a high risk for crops that must be addressed with efficient viral control and diagnostic methods.
Conclusion
In this study, we describe the discovery of a consortium of plant viruses infecting tobacco that are broadly distributed in Anhui province of China. We further characterized the genomes of the 22 isolates, their variability and the siRNAs induced in tobacco plants in response to virus infection. Our results showed the effectiveness of the custom-made bioinformatics pipeline, coupled with molecular techniques and phylogenetic analysis, in the diagnosis and identification of plant viruses. Surveys of plant viruses and prompt diagnostics should be carried out frequently in areas known for large-scale cultivation of economically important crops.
Acknowledgements
This study was supported by grants from the Chinese National Natural Science Foundation (no. 31272011), the Key Program of Anhui Province Tobacco Monopoly Administration (no. 20150551007) and the Natural Science Foundation of Anhui Province (no. 1608085QC59). We would like to acknowledge the support of the Chinese Academy of Sciences - The World Academy of Sciences (CAS-TWAS) PhD President’s Fellowship to Ibukun A. Akinyemi. The funders played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.