Background
The Orbivirus genus is one of ≥12 genera within the family Reoviridae. The Reoviridae have segmented linear dsRNA genomes. There are 9–12 segments [1] and these are usually, but not always, monocistronic. Subgenomic RNAs are unknown. Orbivirus genomes have 10 segments. Many species infect ruminants while some infect humans. Transmission is via arthropods – including midges, ticks and mosquitoes. The type species is Bluetongue virus (BTV), which causes severe and sometimes fatal disease, particularly in sheep. BTV is endemic in many tropical countries, but there have also been recent outbreaks in Europe [2,3]. Another species is African horse sickness virus (AHSV), which causes a fatal disease of horses. AHSV is endemic in many parts of sub-Saharan Africa, but has made incursions into Europe [4]. Recent outbreaks of BTV in Europe may be a consequence of climate change – allowing the midge vectors to expand their range [5].
The Orbivirus proteins, structure, assembly and replication have been reviewed in [6–8]. The BTV core is composed of two major proteins (VP3 and VP7). Transcription complexes – composed of three minor proteins (VP1 – polymerase, VP4 – capping enzyme, and VP6 – helicase) – are located inside the core. Transcription occurs within the intact core and full-length capped mRNAs from each of the genome segments are fed out into the cytoplasm for translation. An outer capsid (VP2 and VP5) surrounds the core, but is removed during cell entry. There are four non-structural proteins – NS1, NS2 and NS3/3A. VP6 is a hydrophilic, basic protein that binds dsRNA and other nucleic acids and functions as the viral helicase [9–13]. In some, but not all, BTV serotypes, VP6 migrates as a closely-spaced doublet [14]. This is apparently because, in these serotypes, the first VP6 AUG codon has a weak Kozak context while a second in-frame AUG codon has a medium context.
The genomes of RNA viruses are under strong selective pressure to compress maximal coding and regulatory information into minimal sequence space. Thus overlapping CDSs are particularly common in such viruses. Such CDSs can be difficult to detect using conventional gene-finding software [15], especially when short. The software package MLOGD, however, was designed specifically for locating short overlapping CDSs in sequence alignments and overcomes many of the difficulties with alternative methods [15,16]. MLOGD includes explicit models for sequence evolution in double-coding regions as well as models for single-coding and non-coding regions. It can be used to predict whether query ORFs are likely to be coding, via a likelihood ratio test, where the null model comprises any known CDSs and the alternative model comprises the known CDSs plus the query ORF. MLOGD has been tested extensively using thousands of known virus CDSs as a test set, and it has been shown that, for overlapping CDSs, a total of just 20 independent base variations are sufficient to detect a new CDS with ~90% confidence.
Using MLOGD, we recently identified – and subsequently experimentally verified – a new short CDS in the Potyviridae that overlaps the polyprotein cistron but is translated in the +2 reading frame [17]. When we applied MLOGD to the Orbivirus genome we also found evidence for a short CDS overlapping the VP6 cistron. Here we describe the bioinformatic analysis.
Discussion
Due to the segmented nature of their genomes, the Reoviridae may escape a fundamental problem that many other eukaryotic viruses face – how to circumvent the host cell's general rule of 'one functional protein per mRNA'. Nonetheless, of the 352 Reoviridae RefSeqs in GenBank (10 Mar 2008; 33 species × 9–12 segments per species), ~5% are multicistronic. Among these are a few examples of fully overlapping genes apparently translated via leaky scanning, for example in Phytoreovirus segment S12 or S9 [21] and mammalian Orthoreovirus segment S1 [22,23].
For optimal leaky scanning [24], one would expect the VP6 CDS to initiate at AUG1 with weak context and ORFX to initiate at AUG2 with strong context. This indeed is the situation in the AHSV and PALV RefSeqs. Although there are two upstream VP6-frame AUG codons in many BTV serotypes, leaky scanning still appears fairly straightforward in this virus as a translational mechanism for ORFX (though potentially at a much lower abundance than VP6). In the YUOV and PHSV RefSeqs, leaky scanning may be possible, but requires either scanning through, or translating and reinitiating after, two upstream short ORFs. It is interesting, and possibly relevant, that in another Reoviridae species – Avian reovirus – a novel, as yet not fully understood, scanning-independent ribosome migration mechanism is used to bypass two upstream CDSs in order to translate the 3'-proximal CDS on the tricistronic S1 mRNA [25,26].
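The Kozak-context reasoning above can be illustrated with a short sketch. It assumes the commonly used three-level scheme (strong: purine at −3 and G at +4; medium: exactly one of the two; weak: neither), which is an assumption here and not a definition taken from this analysis. The function names and the toy sequence are hypothetical and are not Orbivirus data.

```python
def kozak_context(seq, i):
    """Classify the context of the AUG whose A is at 0-based index i.

    Assumed rule (not taken from the paper): 'strong' = purine at -3 and
    G at +4; 'medium' = exactly one of the two; 'weak' = neither.
    """
    seq = seq.upper().replace("T", "U")
    if seq[i:i + 3] != "AUG":
        raise ValueError("no AUG at index %d" % i)
    # Positions are relative to the A of the AUG (+1); missing flanks score 0.
    minus3 = seq[i - 3] if i >= 3 else None
    plus4 = seq[i + 3] if i + 3 < len(seq) else None
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "medium", "strong")[score]


def list_augs(seq):
    """Return (index, context) for every AUG in seq, in 5'-to-3' order."""
    seq = seq.upper().replace("T", "U")
    return [(i, kozak_context(seq, i))
            for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


# Hypothetical toy mRNA: a weak-context AUG1 followed by a strong-context
# AUG2 in the +1 frame, i.e. the arrangement favourable to leaky scanning.
toy = "CCUUUUAUGCGCGACCAUGGAA"
augs = list_augs(toy)
(first, ctx1), (second, ctx2) = augs
frame_offset = (second - first) % 3
print(augs, frame_offset)
```

Applied to a segment 9 sequence, such a listing would make it easy to count the upstream AUG codons and their contexts for each species; the thresholds for what counts as "leaky" remain a matter of judgement.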
IRESs have not been reported in the Reoviridae and, at this genomic location, use of an IRES would seem unlikely. However, it has been shown that a variety of poly-purine A-rich sequences – such as (GAAA)16 – can serve as efficient IRESs without the requirement for a complex RNA secondary structure such as in the Picornaviridae IRESs [27], so it is interesting to note that there is an A-rich poly-purine tract just upstream of ORFX in all species except SCRV (Figure 4). In the BTV RefSeq, for example, the 68 nt immediately preceding ORFX comprise 32 A, 7 C, 25 G and 4 U nucleotides. In fact the entire sequences (except SCRV) are A- or AG-rich (Table 3). Nonetheless the region just upstream of ORFX is a peak in A-richness (Figure 4). Admittedly, this could be due to many other reasons (e.g. just amino acid coding constraints in VP6) and there is no strong reason to suspect an IRES here.
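As a worked check of the composition figures quoted above, the following sketch converts the stated counts for the 68 nt upstream of ORFX in the BTV RefSeq into fractions, and shows one way a sliding-window A-content profile could be computed. The helper names and the window approach are assumptions for illustration; they are not necessarily how Figure 4 was produced.

```python
from collections import Counter


def composition(seq):
    """Count A, C, G and U in a sequence (T is treated as U)."""
    counts = Counter(seq.upper().replace("T", "U"))
    return {base: counts.get(base, 0) for base in "ACGU"}


def window_a_fraction(seq, window):
    """Fraction of A in each window of the given length, stepping by 1 nt."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + window].count("A") / window
            for i in range(len(seq) - window + 1)]


# Counts quoted in the text for the 68 nt immediately preceding ORFX (BTV).
upstream = {"A": 32, "C": 7, "G": 25, "U": 4}
total = sum(upstream.values())                           # 68 nt
a_frac = upstream["A"] / total                           # about 0.47
purine_frac = (upstream["A"] + upstream["G"]) / total    # about 0.84
print(total, round(a_frac, 2), round(purine_frac, 2))
```

On these numbers the tract is roughly 47% A and 84% purine, compared with 32% A for BTV segment 9 as a whole (Table 3).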
Table 3
Nucleotide frequencies for segment 9. Mean nucleotide frequencies (%) for the six Orbivirus segment 9 RefSeqs in GenBank.
Accession | Species | A | C | G | U |
NC_006008 | BTV | 32 | 16 | 33 | 19 |
NC_006019 | AHSV | 32 | 16 | 32 | 20 |
NC_005992 | PALV | 36 | 16 | 26 | 23 |
NC_007753 | PHSV | 41 | 13 | 24 | 22 |
NC_007664 | YUOV | 36 | 18 | 25 | 20 |
NC_006005 | SCRV | 25 | 27 | 24 | 25 |
SCRV lacks a long ORF in the correct reading frame and location for an ORFX homologue. The number (six) and contexts (3 are strong) of upstream AUG codons make conventional leaky scanning to 'ORFXa' (38 codons; Figure 5) extremely unlikely. It is quite possible, therefore, that no ORFX homologue is present in SCRV. This is not too surprising – SCRV segment 9 is the most divergent, and the shortest, of the six RefSeqs (Figure 5) [28]. SCRV is also the only species of the six which is tick-borne instead of insect-borne (BTV, AHSV and PALV are transmitted by midges; YUOV by mosquitoes).
At ~9.5 kDa, the putative ORFX product in BTV is too small to appear on most published protein gels. Nonetheless there are unidentified low molecular mass bands in a number of reported gels [29–32], often running near the dye front, that may represent ORFX product. Furthermore, ref. [33] (in vitro translation of the individual segments) noted, with reference to excluded data, that segment 9 may encode a low molecular weight protein in addition to VP6.
The ORFX product is largest in AHSV (~17 kDa in [GenBank:NC_006019] and ~20 kDa in [GenBank:U19881]). Ref. [34] (in vitro translation of the individual AHSV segments, and comparison with proteins extracted from infected cell lysate) clearly identified an additional non-structural protein translated from segment 9 – termed 'NS3' – migrating ~1.5 kDa behind the 'NS4/4A' proteins (equivalent to NS3/3A in our notation) translated from segment 10. 'NS3' is a good candidate for ORFX product migrating a little slower than expected, possibly as a result of post-translational modification. The protein labelled 'VP6' in ref. [34] appears to be a truncated version of VP5 (translated from the same segment as VP5, and both were shown to have similar partial protease digestion products). Interestingly the VP6 protein (our notation) is not visible as a product of segment 9 translation in Fig. 6 of ref. [34], but may be visible in Fig. 7 of ref. [34] (migrating next to NS2), unless this is cross-contamination. An additional segment 9 product (~20 kDa), migrating ahead of 'NS4/4A', is also visible (albeit fainter) in Fig. 7 of ref. [34]. If the 'NS3' band is post-translationally modified ORFX product, then this band could be unmodified ORFX product.
Ref. [35] also identified a number of low molecular mass proteins in AHSV-infected cells – in particular P23, P20 and P21. Ref. [35] equated two of these (P20 and P21) to the segment 10 products NS3/3A (~24/~22 kDa in AHSV). The third protein may be ORFX product.
That the ORFX product has not been widely reported may reflect not only its small size but also low abundance and/or expression restricted to certain stages (e.g. only in the insect vector) or cellular locations.
Conclusion
We have identified a conserved ORF (ORFX) overlapping the Orbivirus VP6 CDS in the +1 reading frame. ORFX ranges from 77 to 169 codons in length, depending on species, and is present in all Orbivirus segment 9 sequences analysed except for the highly divergent species SCRV. The software package MLOGD – designed specifically for identifying and analysing overlapping CDSs – finds a strong coding signature for ORFX when applied to BTV, AHSV, PALV and PHSV/YUOV sequence alignments. The location and Kozak context of the VP6 and ORFX initiation codons are generally consistent with a leaky scanning model for ORFX translation. ORFX product bears no homology to known proteins.
We hope that presentation of this bioinformatic analysis will stimulate an attempt to experimentally verify the expression and functional role of ORFX product. Initial verification could be by means of immunoblotting with ORFX-specific antibodies or gel purification of ORFX product from virus-infected cell protein extracts, followed by mass spectrometry.
Methods
In GenBank, there are whole-genome RefSeqs for six Orbivirus species: Bluetongue virus (BTV), African horse sickness virus (AHSV), Peruvian horse sickness virus (PHSV), Yunnan orbivirus (YUOV), Palyam virus (PALV) and Saint Croix river virus (SCRV). All six genomes comprise 10 segments. The segments homologous to BTV segment 9 (encoding VP6) were identified by finding the best blastp-match, among the 10 BTV translated segments, for the longest ORF in each of the 50 non-BTV segments. The identifications were verified, where possible, by information in the GenBank-file headers and in the literature (AHSV [36]; YUOV [37]; PALV [38]; SCRV [28]).
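The first step of the homologue search, extracting the longest ORF from each segment, can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used for the analysis: an ORF is taken to run from an AUG to the first in-frame stop codon, only the given (mRNA-sense) strand is scanned, and ORFs lacking a stop codon are ignored. The function name and toy sequence are hypothetical. The subsequent blastp comparison against the translated BTV segments is not shown.

```python
STOPS = {"UAA", "UAG", "UGA"}


def longest_orf(seq):
    """Return (start, end, frame) of the longest AUG-to-stop ORF, or None.

    Coordinates are 0-based; 'end' is exclusive and includes the stop codon.
    Only the three forward frames are scanned (an assumption of this sketch).
    """
    seq = seq.upper().replace("T", "U")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i                      # first AUG opens an ORF
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3, frame)
                start = None                   # look for the next ORF
    return best


# Hypothetical toy sequence with two ORFs in frame 2 (12 nt and 6 nt).
toy = "GGAUGAAACCCUAAAUGUAG"
print(longest_orf(toy))
```

The nucleotide span returned for each segment would then be translated and used as the blastp query.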
As of 11 May 2007, there were 1273 Orbivirus sequences in GenBank (i.e. including partial sequences); however, most of these are not segment 9. Incidentally, none of these sequences has more than one CDS annotated. Segment 9 sequences were extracted (a) using the GenBank-file DEFINITION headers, and (b) by finding the best blastp-match for the longest ORF in each sequence among the 10 BTV translated segments. These were supplemented with all GenBank (16 Mar 2008) tblastn matches to the ORFX peptide sequences from the six RefSeqs (providing one additional recent sequence). After removing duplicate sequences, the following segment 9 sequences were found: (1) the 6 RefSeqs for BTV, AHSV, PHSV, YUOV, PALV and SCRV (all complete); (2) 47 other BTV sequences (mostly complete VP6 CDS; all cover ORFX completely; ~34 contain the full 5' UTR); (3) 2 other AHSV sequences (full genome); and (4) 10 PALV partial sequences (183 nt, completely contained in the ORFX region).
The GenBank accession numbers are as follows: BTV – NC_006008, A22393, AF403418, AF403419, AF403420, AF403421, AF403423, AY124373, AY493691, D10905, DQ289041, DQ289042, DQ289043, DQ289044, DQ289045, DQ289046, DQ289047, DQ289048, DQ289050, DQ825668, DQ825669, DQ825671, DQ832170, L08668, L08669, L08670, L08671, L08672, U55778, U55779, U55780, U55781, U55782, U55784, U55785, U55786, U55787, U55788, U55790, U55792, U55793, U55794, U55795, U55796, U55797, U55799, U55800, U55801; AHSV – NC_006019, U19881, AM883170; PHSV – NC_007753; YUOV – NC_007664; PALV – NC_005992, AB034675, AB034676, AB034677, AB034678, AB034679, AB034680, AB034681, AB034682, AB034683, AB034684; SCRV – NC_006005.
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
AEF carried out the bioinformatics analyses and wrote the manuscript.