A new approach to determining whole viral genomic sequences including termini using a single deep sequencing run
Introduction
As a result of high throughput, increasing depth, and decreasing costs, the utility of next-generation sequencing (NGS) in the biological sciences continues to expand. Next-generation sequencing is now commonly used for a variety of applications in virology including virus discovery, investigation of quasispecies, viral evolution, metagenomics, and analyses of antiviral resistance (Mardis, 2008, Barzon et al., 2011). However, there are limits to the current methods used for sequencing viral genomes, especially during de novo sequencing (Wang et al., 2009, Alkan et al., 2011). For example, current methods are unable to capture the terminal sequences of viral genomes efficiently, and sequence information is often lacking at the 3′ and 5′ ends (Mortazavi et al., 2008, Nagalakshmi et al., 2008, Wang et al., 2009, Ozsolak and Milos, 2011). Some methods, such as Rapid Amplification of cDNA Ends (RACE), address this issue. However, these methods can be time-consuming, may require some prior knowledge of the viral sequence, and require multiple independent procedures. The current study outlines a simple sample preparation technique that overcomes some of these shortcomings. The technique was tested on two RNA viruses, Ebola virus (EBOV) and hepatitis C virus (HCV).
An increasing number of zoonotic pathogens, particularly RNA viruses, are important targets for research because of their potential to emerge into new hosts or regions (Cleaveland et al., 2001, Woolhouse and Gowtage-Sequeria, 2005, Djikeng et al., 2008). For example, EBOV is a highly lethal RNA virus that can cause hemorrhagic fever with case fatality rates of up to 90%. No approved vaccines or therapies exist for EBOV infections and the virus must be handled in a biosafety level 4 laboratory. Ebola virus is a filamentous, enveloped virus that has a negative-sense single-stranded RNA genome of approximately 19 kb (Kuhn, 2008). The terminal ends of this genome are thought to be fairly unstructured (Sanchez et al., 1993). Hepatitis C virus (HCV) is the major causative agent of non-A, non-B hepatitis, and chronic infections can lead to hepatocellular carcinoma (Plagemann, 1991). Hepatitis C virus has a single-stranded positive-sense RNA genome of approximately 9 kb, with a highly structured 5′ untranslated region (UTR) and a shorter, less structured 3′ UTR (Thurner et al., 2004). Sequencing highly structured regions such as the 5′ UTR of HCV can prove problematic with currently available methods (Devroe and Silver, 2002, Miyagishi et al., 2004).
With the shortfalls of current next-generation sequencing methods in mind, a whole genome sequencing (WGS) sample preparation method was developed. The method provides high depth of coverage over the whole genome, including the termini, and does not require sequence-specific primers. This technique will improve the efficiency of virus discovery projects where the terminal ends are unknown. When the technique was applied to negative- and positive-sense RNA viruses, including a virus with a highly structured region, high depth of coverage and high-quality reads were observed at the terminal ends of the genome.
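To make the notion of depth of coverage at the termini concrete, the following is a minimal Python sketch of how per-base depth could be computed from aligned read coordinates and then summarized for the two ends of a genome. It is an illustration only and not the analysis pipeline used in the study. The function names (`depth_per_base`, `terminal_summary`), the 0-based half-open read coordinates, and the window size are all assumptions introduced here.

```python
from typing import Dict, List, Tuple


def depth_per_base(genome_length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Return per-base depth for reads given as 0-based, half-open (start, end) pairs.

    Hypothetical helper: coordinates are assumed to come from an upstream aligner.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start = max(0, start)           # clip reads that overhang the reference
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    depth = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def terminal_summary(depth: List[int], window: int = 50) -> Dict[str, float]:
    """Mean depth in the first and last `window` bases and across the whole genome.

    "First" and "last" refer to the orientation of the reference sequence.
    """
    if window < 1 or not depth:
        raise ValueError("window must be >= 1 and depth must be non-empty")
    window = min(window, len(depth))

    def mean(values: List[int]) -> float:
        return sum(values) / len(values)

    return {
        "first_window_mean": mean(depth[:window]),
        "last_window_mean": mean(depth[-window:]),
        "genome_mean": mean(depth),
    }


if __name__ == "__main__":
    # Toy data invented for illustration: a 10-base "genome" and three reads.
    toy_depth = depth_per_base(10, [(0, 4), (2, 8), (6, 10)])
    print(toy_depth)
    print(terminal_summary(toy_depth, window=2))
```

Comparing the two window means with the genome-wide mean is one simple way to see whether the ends of a genome are under-represented relative to its interior. The appropriate window size depends on read length and genome size and is left as a parameter.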
Viral RNA isolation
Viruses used in this study included Ebola virus: family Filoviridae, genus Ebolavirus, species ebolavirus, and Hepatitis C virus: family Flaviviridae, genus Hepacivirus, species hepatitis C virus. EBOV nucleic acid was isolated utilizing Trizol LS reagent (Life Technologies, Carlsbad, CA, USA) according to the manufacturer's protocol. HCV RNA, isolated using RNAZol RT (Molecular Research Center, Cincinnati, OH, USA) according to the manufacturer's protocol, was graciously provided by Dr. Robert
Development and optimization of technique
RNA libraries were prepared previously using Illumina's TruSeq Total RNA sample preparation kit, according to the manufacturer's instructions. The initial RNA purification and fragmentation step was performed following an optional manufacturer suggested modification that omits the polyA RNA purification step because the selection for polyA RNA was not desired. Overall quality and average depth of coverage were satisfactory. However, the number of sequence reads from the 5′ and 3′ ends of the
Discussion
The use of NGS during research on emerging pathogens, such as RNA viruses, is becoming increasingly important and beneficial (Djikeng et al., 2008, Mardis, 2008, Barzon et al., 2011). Next-generation sequencing significantly increases the efficiency and quality of nucleic acid sequencing but it is often unable to capture the terminal sequences of viral genomes effectively. When using traditional sample preparation techniques, depth of coverage obtained at the 5′ and 3′ ends is often greatly
Funding
This study was funded internally.
Acknowledgment
The authors would like to acknowledge Dr. Robert Lanford (Texas Biomedical Research Institute) for kindly providing the hepatitis C virus RNA used in the study.
References (16)
- Sanchez et al. Sequence-analysis of the Ebola virus genome – organization, genetic elements, and comparison with the genome of Marburg virus. Virus Res. (1993)
- Alkan et al. Limitations of next-generation genome sequence assembly. Nat. Methods (2011)
- Barzon et al. Applications of next-generation sequencing technologies to diagnostic virology. Int. J. Mol. Sci. (2011)
- Cleaveland et al. Diseases of humans and their domestic mammals: pathogen characteristics, host range and the risk of emergence. Philos. Trans. R. Soc. Lond. B: Biol. Sci. (2001)
- Devroe and Silver. Retrovirus-delivered siRNA. BMC Biotechnol. (2002)
- Djikeng et al. Viral genome sequencing by random priming methods. BMC Genomics (2008)
- Kuhn. Filoviruses. A compendium of 40 years of epidemiological, clinical, and laboratory studies. Arch. Virol. Suppl. (2008)
- Mardis. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. (2008)