Background
Malaria remains a major human infectious disease, killing hundreds of thousands of people each year [1]. The molecular basis of various disease symptoms or pathogenesis remains largely unknown due to difficulties in studying malaria infections in human hosts, including ethical concerns, genetic variations in human hosts and parasite strains, and co-infections with other pathogens. Because they offer better-controlled experimental environments, several rodent malaria parasites (RMPs, i.e., Plasmodium yoelii, Plasmodium berghei, and Plasmodium chabaudi) have been established as animal models for studying molecular mechanisms of malaria pathogenesis, drug resistance, parasite development, and host-parasite interaction [2–5]. Although there are differences in disease mechanism between human and mouse malaria infections, they share many characteristics in disease symptoms and in host responses such as pro-inflammatory immune responses [6]. Among the RMPs, P. yoelii has been used as an animal model for vaccine development and for studying the genetic basis of parasite invasion and virulence [7–9]. Recently, various genetic markers [10–12] and genetic crosses have been reported for mapping important traits, such as red blood cell (RBC) invasion, parasite growth, and host cytokine/chemokine levels in P. yoelii infection [8, 13, 14] using clonal P. yoelii lines that exhibit wide variation in disease phenotypes [9, 15, 16]. However, genetic mapping can only link chromosome segments containing various candidate genes to a phenotype. Identification of a causative gene generally requires further structural and functional confirmation of candidate genes, including experimental confirmation of individual gene structure (intron/exon boundary), expression, and function, which is usually time-consuming and labour-intensive. Systematic verification of predicted gene models and gene expression using cDNA sequences is an efficient alternative that will greatly facilitate genetic studies of the RMPs.
For P. yoelii, a draft genome of the P. y. yoelii 17XNL strain, sequenced using the Sanger dye-termination method, was published more than 10 years ago [17]. Additionally, RNA sequencing (RNA-seq) was performed using RNA extracted from wild-type and pypuf2− sporozoites 14 days post-mosquito blood meal for comparison of gene expression level, but not for gene model verification [18]. Recently, RNA-seq data from three rodent malaria parasites were also reported, which greatly improved gene models and the characterization of gene expression and subtelomeric multigene families [19]. However, errors and uncharacterized introns are still likely present in the RMP genomes. For example, errors in gene structure prediction such as missing introns, incorrect intron/exon boundaries, genes with alternatively spliced transcripts, and new transcripts missed by computer prediction were reported for the human malaria parasite Plasmodium falciparum even after several RNA-seq studies [20–25]. In this regard, additional RNA-seq data from different strains or subspecies of RMPs may reveal valuable information for further refining gene models and for detecting uniquely transcribed genes in specific parasite strains/subspecies. In addition, genome-wide sequencing of cDNA libraries from parasites with or without drug pressure may allow quantification and comparison of gene expression that could be linked to parasite response to a drug.
In this study, two cDNA libraries from a subspecies of P. yoelii, P. y. nigeriensis NSM (NSM), were sequenced, one from parasites treated with mefloquine (MQ) and one from untreated parasites. mRNA samples from mixed blood stages of the parasites were extracted, directional cDNA libraries were prepared, and RNA-seq was performed to obtain large numbers of sequence reads. Comparison of the RNA-seq reads to the assembled YM genome and recently published RMP cDNA sequences detected introns in 5′ and 3′ untranslated regions (UTRs), alternatively spliced transcripts, overlapping reads in opposite directions, and putative new gene transcripts. Differential/alternative splicing and SNPs between YM and NSM parasites, and variations in gene expression between NSM parasites with or without MQ pressure, were also observed. This study improves gene models in the P. yoelii genome and identifies various UTR introns that may play an important role in regulation of gene expression.
Discussion
This study reports genome-wide sequencing and characterization of directional cDNA libraries from the P. yoelii nigeriensis parasite, generating ~56 million high-quality paired-end sequence reads. Approximately 66 % of the reads were mapped to the assembled YM genome sequence, providing good coverage across the 14 parasite chromosomes. This study also shows that more than 50 % of the genes were transcribed at relatively high levels (RPKM ≥ 15) in the NSM parasite, and ~11–13 % of the genes were either not expressed or expressed at low levels in blood stages. The genes with no RNA detected could be expressed in liver or mosquito stages that were not examined in this study. The relatively high proportion of genes being expressed is not surprising because the cDNA libraries were prepared from mRNAs of mixed stages.
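The expression thresholds quoted above are in RPKM, i.e. reads per kilobase of transcript per million mapped reads. The following minimal Python sketch illustrates how such values and the fraction of genes at or above a threshold could be computed. It is an illustration only and not the pipeline used in this study; the function names, gene names, read counts, and gene lengths are all hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fraction_at_or_above(rpkm_values, threshold):
    """Fraction of genes whose RPKM is at or above the given threshold."""
    values = list(rpkm_values)
    if not values:
        return 0.0
    return sum(v >= threshold for v in values) / len(values)


# Hypothetical toy data: gene name -> (mapped read count, gene length in bp).
toy_genes = {
    "geneA": (3000, 1500),
    "geneB": (40, 2000),
    "geneC": (0, 900),
}
total_mapped = 1_000_000  # hypothetical library size (mapped reads)

toy_rpkm = {
    name: rpkm(count, length, total_mapped)
    for name, (count, length) in toy_genes.items()
}
highly_expressed = fraction_at_or_above(toy_rpkm.values(), threshold=15)
```

With these toy numbers, geneA and geneB fall at or above the RPKM ≥ 15 cut-off and geneC does not, so two of the three genes would count as expressed at relatively high levels.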
The relatively large numbers of cDNA sequences obtained in this study allowed comparison of gene structures and detection of gene model prediction errors. The results show that gene structures/models for this P. yoelii subspecies (NSM) are very similar to those of the YM parasite. Consequently, the majority of new introns identified were in UTRs, particularly 5′ UTRs. The observations of UTR introns raise an interesting question of whether the alternatively spliced non-coding introns play a role in regulating gene expression in malaria parasites. Indeed, introns in UTRs have been found in many organisms; approximately 35 % of 5′ UTRs and ~16 % of 3′ UTRs of human genes contain introns [45]. Recent studies have shown that genes with regulatory functions are more likely to have 5′ UTR introns, and human 5′ UTR introns can enhance the expression of some genes in a length-dependent manner [46]. In Arabidopsis thaliana, the density of introns in 5′ UTRs was found to be much higher than in 3′ UTRs (~threefold), and the presence of a long intron in the 5′ UTR (i.e., intron not spliced out) in the EF1α-A3 gene could enhance the expression of the gene [47]. 5′ UTR introns can also influence how the mRNAs are exported from the nucleus; mRNAs with 5′ UTR introns are generally exported by the canonical transcription export (TREX) pathway, whereas those without 5′ UTR introns are exported through an alternative mRNA export (ALREX) pathway [45]. Similar to the pattern in Arabidopsis thaliana, approximately threefold more 5′ UTR introns than 3′ UTR introns were observed in P. yoelii, which could partly reflect generally longer 5′ UTRs than 3′ UTRs in genes of malaria parasites. Genes with 5′ UTR introns include those encoding ribosomal proteins, DNA/RNA binding proteins, transcription factors, and heat shock proteins (Additional file 6). Therefore, in addition to the well-known mechanisms such as transcription factors, promoters, silencers, enhancers, epigenetic regulators, and antisense transcripts, introns in non-protein coding regions may also play an important role in regulation of gene expression in malaria parasites. Further functional investigations are necessary to test this hypothesis.
Another interesting observation of this study was that introns in some genes were differentially spliced (i.e., spliced in different proportions) between YM and NSM parasites. These differences in intron splicing between the two parasite subspecies may contribute to differences in parasite biology and/or disease phenotypes, suggesting that genetic investigations such as linkage or association studies may need to consider the differences in intron splicing. The causes of these splicing differences are unknown, but the majority of them are not due to mutations at intron splicing sites between the two parasites, suggesting the presence of unknown mechanisms of intron splicing and gene expression regulation in malaria parasites.
Approximately half of the genes in the genome were not included in the analysis of alternatively spliced (AS) events because only genes with RPKM ≥ 40 were analysed, which excluded genes with low or no expression. The actual numbers of genes with various AS events could therefore be much higher than those observed. Nonetheless, this study still detected many AS events that improved gene models and genome annotation of the P. yoelii parasite. The main goals of this study were to detect expressed genes in blood stages and to verify intron/exon boundaries in the predicted gene models. Many transcripts that were not detected in this study, including those only expressed in liver and mosquito stages, can be evaluated if RNA samples are collected from different developmental stages.
Sequencing cDNAs from a parasite evolutionarily different from the YM parasite also allowed detection of a large number of SNPs (~84,000), which is consistent with the previous report of a high level of diversity between YM and P. y. nigeriensis N67 [12]. Both N67 and NSM were derived from NS, so the genomes of NS, NSM, and N67 are essentially the same [12, 27]. To confirm the accuracy of the SNPs detected in this study, 26 putative SNPs with relatively low read coverage were randomly selected; DNA segments containing the SNPs were amplified; and the PCR products were sequenced directly. Except for one sequencing failure, all 25 remaining putative SNPs from RNA-seq were confirmed after PCR amplification and direct product sequencing using the Sanger method. The results provide strong confidence in the SNPs detected in this study, and these SNPs will be useful for genetic studies of the P. yoelii parasites.
Examples of alternatively and differentially spliced introns within the same and between different parasites were also PCR amplified to support the observations of differentially spliced introns between the two parasite subspecies, although a variety of factors may contribute to the differences observed, including sequencing depth, variations in library construction, and differences in the parasite stages when RNA samples were prepared. The mechanism and functional significance of alternatively and differentially spliced introns between parasite strains or subspecies will require further investigation.
PCR amplification of putative introns with or without the GT-AG intron splicing boundaries suggests that the P. y. nigeriensis parasite also follows the eukaryotic intron splicing mechanism, which requires the GT-AG intron boundary. All six putative introns (or regions with split reads) without GT-AG sites turned out to be false, showing no spliced PCR band, whereas 13 of the 17 genes with introns having GT-AG sites had bands smaller than those from genomic DNA, suggesting the presence of introns. For the four genes with putative introns that had GT-AG boundaries but did not produce a smaller PCR band, the result could be due to low-level expression of the alternatively spliced form, or the introns may be spliced only in YM (not tested here) but not in NSM and 17XNL.
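The GT-AG criterion and the PCR size comparison described above can be expressed as a short Python sketch. It is only an illustration under stated assumptions and not the procedure used in this study: the function names, the toy sequence, and the coordinates are hypothetical, coordinates are assumed to be 0-based and end-exclusive, and the sequence is assumed to be given on the sense strand of the gene (on the opposite strand the boundaries would appear as CT...AC).

```python
def has_gt_ag_boundaries(sense_seq, intron_start, intron_end):
    """Return True if the putative intron starts with GT and ends with AG.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch),
    and sense_seq must be the sense-strand genomic sequence of the gene.
    """
    intron = sense_seq[intron_start:intron_end].upper()
    if len(intron) < 4:
        # Too short to carry both a GT donor and an AG acceptor site.
        return False
    return intron.startswith("GT") and intron.endswith("AG")


def expected_spliced_product_length(genomic_product_length, intron_length):
    """Expected cDNA PCR product size when the intron is spliced out."""
    if intron_length < 0 or intron_length > genomic_product_length:
        raise ValueError("intron must fit inside the genomic PCR product")
    return genomic_product_length - intron_length


# Hypothetical toy locus: exon 1 + intron + exon 2.
exon1 = "ATGGCA"
intron = "GTAAGTTTTTAG"
exon2 = "GCTTAA"
locus = exon1 + intron + exon2

intron_start = len(exon1)
intron_end = intron_start + len(intron)

gt_ag_ok = has_gt_ag_boundaries(locus, intron_start, intron_end)
shifted_ok = has_gt_ag_boundaries(locus, intron_start - 1, intron_end - 1)
spliced_len = expected_spliced_product_length(len(locus), len(intron))
```

In this toy case the correctly placed intron passes the GT-AG check, the candidate shifted by one base does not, and the expected spliced product is shorter than the genomic product by the intron length, which corresponds to the smaller PCR band expected when an intron is truly spliced.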
Authors’ contributions
JLI, BC, YQ, WZ, JWL, RX, ZT, LH, QP and SL carried out experiments and analysis. ML and MQ conducted data analysis and writing. JL and X-zS did project design, data analysis, and writing. All authors read and approved the final manuscript.