A general computational model for predicting ribosomal frameshifts in genome sequences

https://doi.org/10.1016/j.compbiomed.2007.06.001

Abstract

Programmed ribosomal frameshifts are frequently used by RNA viruses to synthesize a single fusion protein from two or more overlapping open reading frames. We previously developed FSFinder, a Windows program for predicting -1 and +1 frameshifts. With our new web application and web service, FSFinder2, users can predict frameshifts of any type from any web browser and operating system. We tested it on Shewanella oneidensis MR-1, for which exact frameshift sites are not known. FSFinder2 is the first program capable of finding frameshifts of general type, and it is a powerful tool for predicting and analyzing genes that utilize frameshifts for their expression. FSFinder2 is available at http://wilab.inha.ac.kr/FSFinder.

Introduction

During translation, two kinds of error can occur: missense errors and processivity errors. In missense errors, an aminoacyl-tRNA synthetase mischarges a tRNA with a wrong amino acid due to substitution of one of the three bases. Processivity errors, including frameshift errors, occur when a sense codon is recognized incorrectly by a release factor [1]. The frequency of programmed translational frameshifts has been estimated to be more than 10,000-fold lower than that of missense errors [1], [2], [3]. However, in some cases processivity errors are more sequence-dependent than in others [1], mainly because of stimulatory sequences and structures in the mRNA [4], [5], [6].

At frameshift sites the ribosome shifts to an alternative reading frame by one or more nucleotides [7]. Frameshifts are classified into types according to the number of nucleotides shifted and the direction of the shift. The most common type is the -1 frameshift, in which the ribosome slips a single nucleotide in the upstream direction. A -1 frameshift requires a frameshift cassette that consists of a slippery site, a stimulatory RNA structure and a spacer. The slippery site generally consists of a heptameric tandem sequence of the form X XXY YYZ [6], but there are other slippery sequences. For example, Escherichia coli dnaX mRNA uses a Shine–Dalgarno (SD) sequence upstream of its frameshift sequence A AAA AAG, as well as a stimulatory RNA secondary structure such as a pseudoknot or stem-loop. A spacer of 5–9 nucleotides lies between the slippery sequence and the stimulatory RNA secondary structure.
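The -1 frameshift cassette described above can be illustrated with a short scan for heptamers of the form X XXY YYZ. This is a minimal sketch, not the FSFinder2 implementation: the function name `find_slippery_sites`, the regular expression, and the choice to accept any base for X, Y and Z are assumptions made for illustration, and the reading frame of each hit is not checked. For each hit it also reports the window, 5–9 nucleotides past the heptamer, in which a downstream stimulatory structure would be expected to begin.

```python
import re

# X XXY YYZ: three identical bases, three identical bases, then any base.
# Backreferences enforce the repeats; the lookahead lets overlapping hits be found.
# Assumption: X, Y and Z may each be any of A, C, G, U.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([ACGU])\3\3[ACGU]))")


def find_slippery_sites(mrna, spacer_min=5, spacer_max=9):
    """Return (start, heptamer, window_start, window_end) for each candidate site.

    Positions are 0-based. window_start..window_end is the range of positions
    at which a downstream structure could begin, given the spacer length limits.
    The window is not clipped to the sequence length; callers should check it.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or mRNA input
    sites = []
    for match in SLIPPERY.finditer(seq):
        start = match.start()
        end = start + 7  # first position after the heptamer
        sites.append((start, match.group(1), end + spacer_min, end + spacer_max))
    return sites


if __name__ == "__main__":
    # Toy sequence containing the dnaX-like heptamer A AAA AAG.
    for site in find_slippery_sites("GGCAAAAAAGCUGACCGGCGCGC"):
        print(site)  # (3, 'AAAAAAG', 15, 19)
```

A fuller model would also restrict hits to the correct codon phase and test the reported window for a pseudoknot or stem-loop.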

The +1 frameshifts are much less common than -1 frameshifts, but have been observed in diverse organisms [5]. The prfB gene encoding release factor 2 (RF2) of E. coli is a well-known example of a gene expressed through +1 frameshifting [8], [9]. In this gene an SD sequence is observed upstream of the slippery sequence CUU URA C, where R is either adenine or guanine. The ornithine decarboxylase antizyme (oaz) gene encoding antizyme 1 is also expressed through +1 frameshifting, and its frameshift signal consists of a slippery sequence and a downstream RNA secondary structure [10].
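A slippery sequence written with an ambiguity code, such as CUU URA C, can be matched by expanding each symbol into the set of bases it stands for. The following is a minimal sketch under stated assumptions: the helper names `compile_pattern` and `find_sites` and the small symbol table are illustrative only and are not taken from FSFinder2.

```python
import re

# Subset of the IUPAC nucleotide codes, written for RNA.
# R (purine) is the only ambiguity code the text uses; Y and N are included
# as an illustrative assumption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "N": "[ACGU]",
}


def compile_pattern(pattern):
    """Turn a spaced pattern such as 'CUU URA C' into a compiled regex."""
    body = "".join(IUPAC[symbol] for symbol in pattern.replace(" ", "").upper())
    # Lookahead so that overlapping occurrences are all reported.
    return re.compile("(?=(" + body + "))")


def find_sites(mrna, pattern):
    """Return (0-based start, matched text) for every occurrence of pattern."""
    seq = mrna.upper().replace("T", "U")  # accept DNA or mRNA input
    return [(m.start(), m.group(1)) for m in compile_pattern(pattern).finditer(seq)]


if __name__ == "__main__":
    print(find_sites("AGGGGGUAUCUUUGACUAC", "CUU URA C"))  # [(9, 'CUUUGAC')]
```

The same expansion would work for any user-supplied pattern built from the symbols in the table.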

Most frameshift elements are located near the end of an open reading frame. Since these elements lie between two overlapping genes, restricting the search to the overlapping region can be very efficient. However, Barry et al. [11] report that frameshifting on Barley yellow dwarf virus RNA requires a viral sequence located four kilobases downstream.
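The idea of restricting the search to the overlap of two open reading frames can be sketched as a simple interval computation. This is an illustrative sketch only: the `ORF` type, the half-open coordinate convention, and the function names are assumptions, not part of FSFinder or FSFinder2.

```python
from typing import NamedTuple, Optional, Tuple


class ORF(NamedTuple):
    """An open reading frame on the forward strand (hypothetical type)."""
    start: int  # 0-based position of the first base, inclusive
    end: int    # 0-based position just past the last base, exclusive


def overlap_region(upstream: ORF, downstream: ORF) -> Optional[Tuple[int, int]]:
    """Return the half-open interval shared by both ORFs, or None if disjoint."""
    lo = max(upstream.start, downstream.start)
    hi = min(upstream.end, downstream.end)
    return (lo, hi) if lo < hi else None


def relative_frame(upstream: ORF, downstream: ORF) -> int:
    """Return the frame of downstream relative to upstream: 0, +1 or -1.

    Codons of the downstream ORF begin at positions congruent to its start
    modulo 3. An offset of 1 corresponds to the +1 frame and an offset of 2
    to the -1 frame.
    """
    offset = (downstream.start - upstream.start) % 3
    return {0: 0, 1: 1, 2: -1}[offset]


if __name__ == "__main__":
    gene_a = ORF(0, 300)
    gene_b = ORF(272, 902)
    print(overlap_region(gene_a, gene_b))  # (272, 300)
    print(relative_frame(gene_a, gene_b))  # -1
```

A candidate search would then scan only the returned interval, which here is 28 nucleotides rather than the full 902.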

No program exists to predict general types of frameshift. In addition, existing computational models predict too many false positives. In previous work we developed a program called FSFinder (frameshift signal finder) for predicting -1 and +1 frameshift sites [12]. Trials of FSFinder on 190 genomic and partial DNA sequences showed that it predicted frameshift sites efficiently and with greater sensitivity and specificity than other programs because it focused on the overlapping regions of open reading frames and prioritized candidate signals (for -1 frameshifts, sensitivity was 0.88 and specificity 0.97; for +1 frameshifts, sensitivity was 0.91 and specificity 0.94) [13], [14], [15].
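The excerpt does not define sensitivity and specificity. Assuming the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), they can be computed as in the sketch below. The counts used are hypothetical and are not taken from the FSFinder trials. Some gene-prediction studies instead report TP / (TP + FP) under the name specificity, so the definition used in [12] should be checked before comparing figures.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real frameshift sites that were predicted."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of non-sites that were correctly rejected (standard definition)."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(round(sensitivity(22, 3), 2))   # 0.88
    print(round(specificity(97, 3), 2))   # 0.97
```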

FSFinder is written in Microsoft C# and is executable on Windows systems only. To remove this limitation and to handle frameshifts of general type, we developed a new web-based application program called FSFinder2. Users can predict frameshifts of any type online from any web browser and operating system. We tested FSFinder2 on the facultative gram-negative bacterium Shewanella oneidensis MR-1, searching for six patterns of frameshifts defined by our collaborators. We believe this is the first program capable of predicting frameshift signals of general type.

Section snippets

Basic models of frameshifts

Three types of frameshifts are considered basic frameshifts, and their models are predefined: the most common -1 frameshifts, +1 frameshifts of the RF2 type, and +1 frameshifts of the type found in the ODC antizyme. The models for these frameshifts have four components: an SD sequence, a frameshift site, a spacer and a downstream secondary structure (Fig. 1). FSFinder2 extends the models used in FSFinder to incorporate user-defined models. For the upstream SD sequence, FSFinder2 considers AGGA,

Implementation

FSFinder2 was implemented using XML, XSLT, ASP and JavaScript. When the user defines a new model and sends a query to the server, the computation is performed on the server side. FSFinder2 can read a DNA or mRNA sequence file in either FASTA or GenBank format. After loading the input sequence file, the user can choose appropriate parameters to optimize the results. The parameter settings are divided into three parts according to the type of target gene, the length of the sequence and the direction. Fig. 2 shows

Comparison of the prediction results with the experimental data in GenBank

FSFinder2 was tested with the complete genome sequence of S. oneidensis MR-1, downloaded as a GenBank format file from the NCBI database (accession number NC_004347). S. oneidensis MR-1 is a gram-negative bacterium with a genome sequence of 4,969,803 bp. It is an important model organism for bioremediation studies because it has diverse respiratory capabilities and electron transport systems [16], [17].

In the GenBank descriptions of S. oneidensis MR-1, there are a total of 103 annotated

Conclusions

Understanding programmed ribosomal frameshifts is important because they are related to biological phenomena such as fidelity of mRNA–tRNA binding, and some genetic controls and enzyme activities. They are also involved in the expression of certain genes in a wide range of organisms. However, identifying programmed frameshifts is very difficult due to the diverse nature of the frameshift events. Existing computational approaches focus on a certain type of frameshift only and cannot handle

Acknowledgment

This work was supported by the Korea Science and Engineering Foundation (KOSEF) under Grant R01-2003-000-10461-0.

References (17)

  • P.V. Baranov et al., Recoding: translational bifurcations in gene expression, Gene (2002)
  • O.L. Gurvich et al., Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli, EMBO J. (2003)
  • C.G. Kurland et al., Limitations of translational accuracy
  • N. Mejlhede et al., Ribosomal -1 frameshifting during decoding of Bacillus subtilis cdd occurs at the sequence CGA AAG, J. Bacteriol. (1999)
  • S.L. Alam et al., Programmed ribosomal frameshifting: much ado about knotting!, Proc. Natl. Acad. Sci. (1999)
  • P.J. Farabaugh, Programmed translational frameshifting, Annu. Rev. Genet. (1996)
  • T. Jacks et al., Expression of the Rous sarcoma virus pol gene by ribosomal frameshifting, Science (1985)
  • P.V. Baranov et al., Release factor 2 frameshifting sites in different bacteria, EMBO Rep. (2002)
There are more references available in the full text version of this article.

Yanga Byun received the B.S. degree in automation engineering from Inha University in 2003. She is currently pursuing a Ph.D. degree in computer science and engineering at Inha University. For the past several years, she has developed several algorithms and programs for visualizing and analyzing RNA pseudoknot structures and ribosomal frameshifting. Her research interests include data visualization and data mining.

Sanghoon Moon received the B.S. degree in genetic engineering from Chungju University in 1998 and the M.S. degree in computer engineering from Hallym University in 2003. He is currently pursuing a Ph.D. degree in computer science and engineering at Inha University. For the past several years, he has developed several algorithms for analyzing and predicting ribosomal frameshifting. His research interests include systems biology and molecular sequence analysis.

Kyungsook Han received the B.S. degree cum laude from Seoul National University in 1983, the M.S. degree cum laude in computer science from KAIST in 1985, another M.S. degree in computer science from the University of Minnesota at Minneapolis, USA in 1989, and the Ph.D. degree in computer science from Rutgers—The State University of New Jersey at New Brunswick, USA in 1994. She is a professor of computer science and engineering at Inha University, where she teaches and conducts research in the area of bioinformatics and visualization. Her current research interests include protein interactions and visualization of biological data, particularly visualization of protein interaction networks and RNA structures.
