A new approach to determining whole viral genomic sequences including termini using a single deep sequencing run

https://doi.org/10.1016/j.jviromet.2014.07.023

Highlights

  • We developed a technique to address limitations of sequencing viral genomic termini.

  • The number of terminal sequences captured relative to sequencing depth was greatly improved compared with traditional techniques.

  • The method increased coverage of terminal nucleotides for both a positive-sense and a negative-sense RNA virus.

  • The protocol was effective when used on a virus with a highly structured 5′ end.
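The highlights compare the number of terminal sequences captured against sequencing depth. One way to express such a comparison, sketched here under stated assumptions, is to count reads whose alignments begin at the first reference base or end at the last, and divide each count by mean depth. The function name `terminal_reads_per_depth`, the `tolerance` parameter, and the 0-based, half-open `(start, end)` alignment format are hypothetical illustrations, not the metric or software used in the study. Position 0 is the first base of the reference as written; whether that is the genomic 5′ or 3′ end depends on the reference orientation.

```python
def terminal_reads_per_depth(
    length: int,
    alignments: list[tuple[int, int]],
    tolerance: int = 0,
) -> tuple[float, float]:
    """Return (start-terminal reads, end-terminal reads), each divided by mean depth.

    Hypothetical metric for illustration only. `alignments` holds 0-based,
    half-open reference intervals [start, end). A read counts as
    start-terminal if it begins within `tolerance` bases of position 0, and
    as end-terminal if it ends within `tolerance` bases of `length`.
    """
    start_terminal = sum(1 for s, _ in alignments if s <= tolerance)
    end_terminal = sum(1 for _, e in alignments if e >= length - tolerance)
    # Mean depth = total aligned bases (clipped to the reference) / reference length.
    aligned_bases = sum(max(0, min(e, length) - max(s, 0)) for s, e in alignments)
    mean_depth = aligned_bases / length
    if mean_depth == 0:
        return 0.0, 0.0
    return start_terminal / mean_depth, end_terminal / mean_depth


# Toy usage: a 10-base reference with three hypothetical read alignments.
reads = [(0, 4), (2, 10), (3, 6)]
print(terminal_reads_per_depth(10, reads))
```

Normalizing by mean depth keeps libraries sequenced to different depths comparable; a raw terminal read count would rise simply because more reads were generated.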

Abstract

Next-generation sequencing is now commonly used for a variety of applications in virology, including virus discovery, investigation of quasispecies, viral evolution, metagenomics, and analysis of antiviral resistance. However, current sample preparation methods for deep sequencing of viral genomes have limitations, especially for de novo sequencing. For example, current methods do not capture the terminal sequences of viral genomes efficiently or effectively; data representing the 3′ and 5′ ends are typically insufficient. Methods such as Rapid Amplification of cDNA Ends (RACE) address this issue, but they can be time consuming, may require some prior knowledge of the viral sequence, and require multiple independent procedures. The current study outlines a sample preparation technique that overcomes some of these shortcomings. The method relied on random fragmentation with divalent cations and subsequent adapter ligation directly to RNA, rather than cDNA, to maximize the quality and quantity of terminal reads. The technique was tested on RNA samples from two different RNA viruses, Ebola virus and hepatitis C virus. This method permits rapid preparation of samples for deep sequencing without sequence-specific primers and captures the entire genome sequence, including the 5′ and 3′ ends. This could improve the efficiency of virus discovery projects in which the terminal sequences are unknown.
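The rationale for ligating adapters directly to fragmented RNA can be illustrated with an idealized simulation. Assuming each internal bond breaks independently with probability 1/`mean_fragment` (a simplified stand-in for divalent-cation fragmentation) and that every fragment is ligated and retained, each intact genome copy yields exactly one fragment that starts at the first nucleotide and one that ends at the last, so both termini are represented once per copy. The function names and parameter values are hypothetical, and the model ignores end repair, ligation bias, size selection, and amplification losses.

```python
import random


def fragment_copy(length: int, mean_fragment: int, rng: random.Random) -> list[tuple[int, int]]:
    """Split one genome copy [0, length) at independent random breakpoints.

    Each of the length - 1 internal bonds breaks with probability
    1 / mean_fragment (idealized random fragmentation).
    """
    p = 1.0 / mean_fragment
    cuts = [i for i in range(1, length) if rng.random() < p]
    bounds = [0, *cuts, length]
    return list(zip(bounds[:-1], bounds[1:]))


def terminal_fragment_counts(
    length: int, copies: int, mean_fragment: int, seed: int = 0
) -> tuple[int, int, int]:
    """Count fragments carrying the first base, the last base, and all fragments.

    Assumes every fragment is ligated directly to adapters and retained.
    """
    rng = random.Random(seed)
    first_base = last_base = total = 0
    for _ in range(copies):
        fragments = fragment_copy(length, mean_fragment, rng)
        total += len(fragments)
        first_base += sum(1 for start, _ in fragments if start == 0)
        last_base += sum(1 for _, end in fragments if end == length)
    return first_base, last_base, total


if __name__ == "__main__":
    # Hypothetical parameters: a ~10 kb genome, 100 copies, ~200 nt fragments.
    first, last, total = terminal_fragment_counts(10_000, 100, 200)
    print(f"fragments with first base: {first}, with last base: {last}, total: {total}")
```

In this model the two terminal counts depend only on the number of intact copies, not on fragment size; in a real library, terminal fragments can still be lost through steps the sketch omits.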

Introduction

As a result of high throughput, increasing depth, and decreasing costs, the utility of next-generation sequencing (NGS) in the biological sciences continues to expand. Next-generation sequencing is now commonly used for a variety of applications in virology, including virus discovery, investigation of quasispecies, viral evolution, metagenomics, and analysis of antiviral resistance (Mardis, 2008, Barzon et al., 2011). However, current methods for sequencing viral genomes have limitations, especially for de novo sequencing (Wang et al., 2009, Alkan et al., 2011). For example, current methods do not capture the terminal sequences of viral genomes efficiently or effectively. Sequence information is often lacking at the 3′ and 5′ ends (Mortazavi et al., 2008, Nagalakshmi et al., 2008, Wang et al., 2009, Ozsolak and Milos, 2011). Some methods, such as Rapid Amplification of cDNA Ends (RACE), address this issue. However, these methods can be time consuming, may require some prior knowledge of the viral sequence, and require multiple independent procedures. The current study outlines a simple sample preparation technique that overcomes some of these shortcomings. The technique was tested on two RNA viruses, Ebola virus (EBOV) and hepatitis C virus (HCV).

An increasing number of zoonotic pathogens, particularly RNA viruses, are important research targets because of their potential to emerge into new hosts or regions (Cleaveland et al., 2001, Woolhouse and Gowtage-Sequeria, 2005, Djikeng et al., 2008). For example, EBOV is a highly lethal RNA virus that can cause hemorrhagic fever with case fatality rates of up to 90%. No approved vaccines or therapies exist for EBOV infections, and the virus must be handled in a biosafety level 4 laboratory. Ebola virus is a filamentous, enveloped virus with a negative-sense, single-stranded RNA genome of approximately 19 kb (Kuhn, 2008). The termini of this genome are thought to be fairly unstructured (Sanchez et al., 1993). HCV is the major causative agent of non-A, non-B hepatitis, and chronic infections can lead to hepatocellular carcinoma (Plagemann, 1991). HCV has a single-stranded, positive-sense RNA genome of approximately 9 kb, with a highly structured 5′ untranslated region (UTR) and a shorter, less structured 3′ UTR (Thurner et al., 2004). Sequencing highly structured regions such as the HCV 5′ UTR can be problematic with currently available methods (Devroe and Silver, 2002, Miyagishi et al., 2004).

With the shortfalls of current next-generation sequencing methods in mind, a whole genome sequencing (WGS) sample preparation method was developed. The method provides high depth of coverage over the whole genome, including the termini, and does not require sequence-specific primers. This technique could improve the efficiency of virus discovery projects in which the terminal sequences are unknown. When the technique was applied to negative- and positive-sense RNA viruses, including a virus with a highly structured region, high depth of coverage and high-quality reads were observed at the genome termini.
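Depth of coverage at the termini can be read off a per-base depth profile built from read alignments. The sketch below, offered as an illustration rather than the authors' analysis, uses a difference array over 0-based, half-open alignment intervals and then compares mean depth in a terminal window with genome-wide mean depth. The function names, the interval format, and the `window` parameter are hypothetical; in practice, depth would usually be computed from aligned reads with an established tool such as `samtools depth`.

```python
from statistics import mean


def depth_profile(length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base depth from 0-based, half-open intervals [start, end).

    A difference array makes this O(length + number of reads): add 1 where
    a read starts, subtract 1 where it ends, then take a running sum.
    """
    diff = [0] * (length + 1)
    for start, end in alignments:
        s, e = max(0, start), min(length, end)  # clip to the reference
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def terminal_depth_summary(depth: list[int], window: int) -> dict[str, float]:
    """Mean depth genome-wide and over the first/last `window` bases.

    Index 0 is the first base of the reference as written; whether it is the
    genomic 5' or 3' end depends on the reference orientation.
    """
    if not 0 < window <= len(depth):
        raise ValueError("window must be between 1 and the reference length")
    return {
        "genome_mean": float(mean(depth)),
        "first_window_mean": float(mean(depth[:window])),
        "last_window_mean": float(mean(depth[-window:])),
    }


reads = [(0, 4), (2, 10), (3, 6)]  # toy alignments on a 10-base reference
profile = depth_profile(10, reads)
print(profile)  # [1, 1, 2, 3, 2, 2, 1, 1, 1, 1]
print(terminal_depth_summary(profile, window=2))
```

A terminal window mean well below the genome-wide mean is the pattern described for traditional preparations; the difference array avoids re-scanning every read for every position.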

Section snippets

Viral RNA isolation

Viruses used in this study included Ebola virus (family Filoviridae, genus Ebolavirus, species Zaire ebolavirus) and hepatitis C virus (family Flaviviridae, genus Hepacivirus, species hepatitis C virus). EBOV nucleic acid was isolated using TRIzol LS reagent (Life Technologies, Carlsbad, CA, USA) according to the manufacturer's protocol. HCV RNA, isolated using RNAzol RT (Molecular Research Center, Cincinnati, OH, USA) according to the manufacturer's protocol, was graciously provided by Dr. Robert

Development and optimization of technique

RNA libraries were previously prepared using Illumina's TruSeq Total RNA sample preparation kit according to the manufacturer's instructions. The initial RNA purification and fragmentation step followed an optional, manufacturer-suggested modification that omits the polyA RNA purification step, because selection for polyA RNA was not desired. Overall quality and average depth of coverage were satisfactory. However, the number of sequence reads from the 5′ and 3′ ends of the

Discussion

The use of NGS in research on emerging pathogens, such as RNA viruses, is becoming increasingly important and beneficial (Djikeng et al., 2008, Mardis, 2008, Barzon et al., 2011). Next-generation sequencing significantly increases the efficiency and quality of nucleic acid sequencing, but it is often unable to capture the terminal sequences of viral genomes effectively. With traditional sample preparation techniques, the depth of coverage obtained at the 5′ and 3′ ends is often greatly

Funding

This study was funded internally.

Acknowledgment

The authors would like to acknowledge Dr. Robert Lanford (Texas Biomedical Research Institute) for kindly providing the hepatitis C virus RNA used in the study.

References (16)

  • A. Sanchez et al. Sequence-analysis of the Ebola virus genome – organization, genetic elements, and comparison with the genome of Marburg virus. Virus Res. (1993)
  • C. Alkan et al. Limitations of next-generation genome sequence assembly. Nat. Methods (2011)
  • L. Barzon et al. Applications of next-generation sequencing technologies to diagnostic virology. Int. J. Mol. Sci. (2011)
  • S. Cleaveland et al. Diseases of humans and their domestic mammals: pathogen characteristics, host range and the risk of emergence. Philos. Trans. R. Soc. Lond. B: Biol. Sci. (2001)
  • E. Devroe et al. Retrovirus-delivered siRNA. BMC Biotechnol. (2002)
  • A. Djikeng et al. Viral genome sequencing by random priming methods. BMC Genomics (2008)
  • J.H. Kuhn. Filoviruses. A compendium of 40 years of epidemiological, clinical, and laboratory studies. Arch. Virol. Suppl. (2008)
  • E.R. Mardis. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. (2008)

There are more references available in the full text version of this article.
