A screen for conserved sequences with biased base composition identifies noncoding RNAs in the A–T rich genome of Plasmodium falciparum

https://doi.org/10.1016/j.molbiopara.2005.08.012

Abstract

Noncoding RNAs (ncRNAs) such as snRNAs, snoRNAs and microRNAs play important roles in transcription and translation control. These ncRNAs have yet to be discovered in the malarial parasite Plasmodium falciparum, an organism in which these basic biological processes are poorly understood. Inspired by a report by Klein et al., we initiated a bioinformatics screen to uncover several candidate ncRNAs from the parasite genome using two simple criteria: first, elevated GC content in the highly A–T rich intergenic regions of the P. falciparum genome and second, conservation of sequence homology between malaria parasite species. We show that all the annotated tRNAs can be successfully identified in our screen as well as several new candidates that show homology to snRNAs and snoRNAs, and ten candidate ncRNAs of unknown function. Three of the candidate snRNAs, a predicted selenocysteine tRNA and two candidates of unknown function are expressed in asexual stage parasites, further validating the screen. With these results, the biological processes underlying RNA-mediated regulation of transcription, translation and splicing can be studied in an important human pathogen.

Introduction

In the past decade, it has become increasingly apparent that RNA comes in many flavors. Familiar roles for RNA include its function as the intermediate for protein synthesis, and as an adapter molecule for translation and splicing. Depending on their ability to encode proteins, various classes of RNA have been classified as protein coding RNAs, i.e. messenger RNA or noncoding RNAs (ncRNAs) that include tRNA, small nuclear RNA (snRNA) and small nucleolar RNA (snoRNA). Most recently, with the identification of small molecular weight RNAs like microRNAs and silencing RNAs, RNA has come into its own as a regulator of gene expression [1]. Indeed, this new role of RNA as a modulator of gene expression has led to hypotheses suggesting that emergent regulatory networks modulated by both proteins and ncRNAs might explain the complexity of eukaryotic genomes that have relatively few protein coding regions [2].

The malaria parasite Plasmodium falciparum is one such eukaryote that exhibits a complex life cycle in two hosts, with several distinct life cycle stages each expressing a unique repertoire of proteins. Interestingly, despite this complex life cycle, the genome of P. falciparum has approximately the same number of predicted coding sequences as the baker’s yeast Saccharomyces cerevisiae [3], [4], [5]. Mechanisms of regulation of the ∼6000 predicted coding sequences of P. falciparum are unclear and there is evidence suggesting that gene regulation in the malaria parasite may be dependent on mechanisms other than modulation of the transcription initiation complex. First, promoter analyses of upstream regions of several genes have not yielded consensus promoter/enhancer sequences [6]. Second, almost 15% of annotated genes show transcription of antisense RNAs in the asexual stages [7]. Finally, results from genome-wide transcription analyses and bioinformatics studies show that, compared to the yeast genome, the Plasmodium genome has an under-representation of DNA-binding proteins and transcription factors and an over-representation of RNA-binding proteins [8]. Recent data comparing transcriptome and proteome profiles in different stages of the parasite life cycle have revealed that post-transcriptional regulation via RNA-binding proteins may be a major mechanism of regulating gene expression in gametocytes and sexual stages [9]. These accumulating data suggest that there may be novel mechanisms of gene regulation in the malaria parasite, including mRNA stabilization, antisense RNA and ncRNA-mediated regulation. Hence, identification of ncRNAs in the P. falciparum genome becomes an important exercise.

Several strategies have been used to identify ncRNAs in other organisms, including bioinformatics screens [10], direct cloning of small molecular weight RNAs [11] and genetic approaches; each has its own strengths and weaknesses and combinations of multiple approaches will yield maximum success. Strategies for bioinformatics screens include those that search for consensus promoter and terminator sequences in intergenic regions [12], conserved primary or secondary structure [13] or bias in base composition [14]. We describe a computational screen for ncRNAs in the P. falciparum genome that was inspired by an observation in A–T rich hyperthermophiles, viz. that increased GC content coupled with comparative genomics could identify ncRNAs successfully [14]. The rationale for this elegant screen from Sean Eddy's laboratory lies in the fact that small ncRNAs carry out their functions through intermolecular or intramolecular base pairing; hyperthermophiles can stabilize these interactions at high growth temperatures simply by introducing a bias in favor of GC bases in genes encoding small ncRNAs in their highly A–T rich genomes. Similarly, P. falciparum also has a highly A–T rich genome (80–90%) and although the parasite is a mesophile, we show that a similar strategy can be used to identify putative ncRNAs.
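
The base-composition signal behind this rationale can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the study; the sketch only shows how a GC percentage, and the differential between a candidate region and its A–T rich background, might be computed.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc_differential(candidate: str, background: str) -> float:
    """GC percentage of the candidate minus that of the background."""
    return gc_percent(candidate) - gc_percent(background)


# Toy sequences (illustrative only, not real P. falciparum data).
background = "ATATTTAATATAAATTTATA"  # A-T only stretch
candidate = "GCGGATTTAGCTCAGTTGGG"   # GC-richer, tRNA-like stretch

if __name__ == "__main__":
    print(f"background GC: {gc_percent(background):.1f}%")
    print(f"candidate GC:  {gc_percent(candidate):.1f}%")
    print(f"differential:  {gc_differential(candidate, background):.1f}%")
```

A large positive differential is the kind of signal the screen relies on; on its own it is only suggestive, which is why the approach couples it with comparative genomics.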

In this report, we have identified over a dozen candidate noncoding RNAs from the genome sequence of P. falciparum (strain 3D7). We have used the observation that there is a differential of 21% in GC percentage between rRNAs and intergenic regions and a differential of 45% between tRNAs and intergenic regions of the P. falciparum genome. In the second part of the screen, candidate ncRNAs are assessed for sequence homology between species since functionally important sequences are known to be conserved. We describe a bioinformatics strategy that identified GC-rich stretches in intergenic regions of the P. falciparum genome, followed by BLAST analysis of these stretches for conservation in the Plasmodium yoelii genome. Stringent parameters of greater than 35% GC combined with a BLAST e value of e  10 generated a list of eighteen candidate ncRNAs. Five of these candidates show sequence homology to U1, U2, U4, U5 and U6 snRNAs and two show sequence homology to CD-box and H/ACA-box snoRNAs from other organisms. One candidate is identical to a predicted selenocysteine tRNA that was identified recently from the P. falciparum genome [15]. The remaining candidates have no homology to known genes, suggesting that if they exist, these ncRNAs may have novel functions. Northern analysis reveals that the selenocysteine tRNA and the candidate ncRNAs with homology to U4, U5 and U6 snRNAs are indeed transcribed in the asexual stages of P. falciparum. Additionally, two candidate ncRNAs of unknown function are also expressed in asexual stages.
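
The first part of the screen, locating GC-rich stretches in intergenic sequence, could be sketched as a sliding-window scan. This is a minimal illustration and not the authors' implementation: the function name, the default window length of 70 bp and the merging of overlapping windows are all assumptions made here; only the threshold of greater than 35% GC comes from the text.

```python
from typing import List, Tuple


def gc_rich_stretches(
    seq: str,
    window: int = 70,      # assumed window length in bp (hypothetical default)
    min_gc: float = 35.0,  # GC threshold in percent, as stated in the text
) -> List[Tuple[int, int]]:
    """Return merged half-open [start, end) intervals covered by windows
    whose GC percentage is strictly greater than min_gc."""
    seq = seq.upper()
    n = len(seq)
    if window <= 0 or n < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])  # GC count of the first window
    stretches: List[Tuple[int, int]] = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one base: add the new rightmost base, drop the old leftmost.
            count += is_gc[start + window - 1] - is_gc[start - 1]
        if 100.0 * count / window > min_gc:
            end = start + window
            if stretches and start <= stretches[-1][1]:
                # Overlaps or touches the previous stretch: extend it.
                stretches[-1] = (stretches[-1][0], end)
            else:
                stretches.append((start, end))
    return stretches


if __name__ == "__main__":
    # Toy intergenic sequence: a GC-rich island inside an A-T only background.
    toy = "AT" * 50 + "GC" * 20 + "AT" * 50
    print(gc_rich_stretches(toy, window=40))
```

Each reported interval would then be extracted and passed to the conservation step; the running count keeps the scan linear in sequence length.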

In a recent study, elevated GC content and sequence conservation were used successfully to identify a selenocysteine tRNA from the P. falciparum genome [15]. This study uses a similar approach for the computational identification and experimental validation of putative snRNAs, snoRNAs and novel ncRNAs from the P. falciparum genome. With these results, the stage is set for further understanding of RNA-mediated processes in the malaria parasite.
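
Combining the two criteria, elevated GC content and cross-species conservation, might look like the following Python sketch. It assumes that the BLAST comparison against the second Plasmodium genome has been saved as 12-column tab-separated output (query ID in column 1, e value in column 11); this file layout, the function names, the sequence identifiers and the e-value cutoff passed in the example are all assumptions for illustration and are not the parameters used in the study.

```python
import csv
from typing import Dict, Iterable, List, Set


def conserved_queries(blast_tab_lines: Iterable[str], max_evalue: float) -> Set[str]:
    """Return query IDs with at least one hit whose e value is <= max_evalue.

    Assumes 12-column tab-separated BLAST output:
    column 1 = query ID, column 11 = e value.
    """
    kept: Set[str] = set()
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) <= max_evalue:
            kept.add(row[0])
    return kept


def select_candidates(
    gc_by_id: Dict[str, float],
    conserved: Set[str],
    min_gc: float = 35.0,  # GC threshold in percent, as stated in the text
) -> List[str]:
    """Keep stretches that are both GC-rich and conserved, sorted by ID."""
    return sorted(
        seq_id for seq_id, gc in gc_by_id.items()
        if gc > min_gc and seq_id in conserved
    )


if __name__ == "__main__":
    # Hypothetical BLAST rows and GC values, for illustration only.
    rows = [
        "stretch_1\tPy_contig_1\t92.0\t80\t6\t0\t1\t80\t100\t179\t1e-20\t150\n",
        "stretch_2\tPy_contig_7\t85.0\t30\t4\t0\t1\t30\t5\t34\t0.5\t22\n",
    ]
    gc_values = {"stretch_1": 48.0, "stretch_2": 52.0, "stretch_3": 20.0}
    # 1e-5 is a placeholder cutoff, not the value used in the paper.
    hits = conserved_queries(rows, max_evalue=1e-5)
    print(select_candidates(gc_values, hits))
```

Keeping the two filters as separate functions makes it easy to vary either threshold independently when exploring how stringent the screen should be.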

Parasite culture

P. falciparum (strain 3D7) was cultivated according to standard protocols [16] with minor modifications: 10% human plasma was used in place of 0.5% Albumax, and the medium was supplemented with 0.2% glucose. Parasites were cultured in a candle jar to provide low oxygen tension.

Bioinformatics

The entire sequence of P. falciparum (strain 3D7) was downloaded chromosome-wise from PlasmoDB (http://www.plasmodb.org/) in October 2003. Chromosome 13 was downloaded in two sections (chr13_1 and chr13_2) as

P. falciparum tRNAs are on average ∼35% more GC-rich than the genome

Klein et al. have shown that in A–T rich hyperthermophiles, there is a difference of 25–40% in the GC content of tRNAs relative to the genome and this observation formed the basis of their computational screen for uncovering novel ncRNAs. In order to assess whether tRNAs are also similarly higher in GC content than the P. falciparum genome, all annotated tRNA genes were downloaded from PlasmoDB [17] and their average GC percentage calculated. According to PlasmoDB, there are 35 ‘mapped’ and 8

Acknowledgements

We thank K.K. Rao, P.J. Bhat, G. Subrahmanyam, and their laboratory members for kindly sharing their equipment and reagents during the initial stages of this project. We also thank Steve Johnson and Frank Slack for sharing Northern blot protocols, and Pawan Malhotra for his generous help with parasite cultures. Funding for this work was provided by the World Health Organization (Tropical Disease Research) in the form of a Re-entry grant (A30193) to S.P. and from intramural funding from the

References (30)

  • R.W. Hyman et al. Sequence of Plasmodium falciparum chromosome 12. Nature (2002)
  • S. Patankar et al. Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite. Mol Biol Cell (2001)
  • R.M. Coulson et al. Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum. Genome Res (2004)
  • N. Hall et al. A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic and proteomic analyses. Science (2005)
  • A. Huttenhofer et al. Experimental RNomics: a global approach to identifying small nuclear RNAs and their targets in different model organisms. Methods Mol Biol (2004)

Cited by (27)

    • Noncoding RNAs as emerging regulators of Plasmodium falciparum virulence gene expression

      2014, Current Opinion in Microbiology
      Citation Excerpt :

      In eukaryotes, small RNA-induced post-transcriptional gene silencing via the RNA-interference (RNAi) pathway is the best-characterized regulatory role of ncRNAs; however, the P. falciparum genome lacks components of the canonical dicer-dependent RNAi pathway that generates this class of ncRNAs [10,21]. In contrast, NATs and lncRNAs, which have been shown to regulate gene expression in organisms lacking functional RNAi machinery, have been identified in P. falciparum using serial analysis of gene expression (SAGE), northern blots, microarrays and most recently, strand-specific RNA-seq [22–27,28•,29•,30•,31•,32•, see Box 1]. In this review, we describe the recent advances in P. falciparum ncRNA biology, with an emphasis on the putative contributions of ncRNAs to maintaining the epigenome and modulating mutually exclusive var gene expression.

    • Plasmodium falciparum spliceosomal RNAs: 3′ and 5′ end processing

      2011, Acta Tropica
      Citation Excerpt :

      Upadhyay et al. identified 18 new ncRNAs towards the end of 2005 by scanning short intergenic regions (70 bp) having more than 35% GC content and by assessing their conservation among species. Two of these 18 new ncRNAs showed features characteristic of snoRNAs and five showed sequence homology with U1, U2, U4, U5 and U6 snRNAs (Upadhyay et al., 2005). Chakrabarti et al. (2007) confirmed that P. falciparum predicted snRNA sequences are capable of folding into the same overall conformations as those of other snRNAs.

    • Epigenetics of eukaryotic microbes

      2011, Handbook of Epigenetics
    • Non-coding RNA in apicomplexan parasites

      2010, Molecular and Biochemical Parasitology
    • Epigenetics of Eukaryotic Microbes

      2010, Handbook of Epigenetics: The New Molecular and Medical Genetics
    • Control of gene expression in Plasmodium falciparum - Ten years on

      2009, Molecular and Biochemical Parasitology