Introduction

Glycogen-storage disease type II (GSDII; Pompe disease, acid maltase deficiency, MIM no. 232300) is an autosomal recessive disorder caused by the deficiency of acid α-glucosidase (GAA; EC 3.2.1.20), which results in impaired degradation of glycogen and its accumulation within the lysosomes. The GAA gene (MIM no. 606800) has been localized to human chromosome 17q25.2–25.3. The enzyme is synthesized as an inactive precursor of 110 kD, which is transported to the prelysosomal and lysosomal compartment via the mannose-6-phosphate receptor where it is processed into a 95 kD intermediate and the fully active forms of 76 and 70 kD.1, 2, 3, 4

Clinically, GSDII encompasses a continuous spectrum of phenotypes from a rapidly progressive infantile form to a slowly progressive late-onset (LO) form. Classic infantile GSDII manifests soon after birth and is characterized by absent or nearly absent enzyme activity, severe muscle weakness, cardiomyopathy and respiratory insufficiency, which typically lead to death within the first year of life.3, 5, 6, 7

LO GSDII comprises all milder subtypes: partial enzyme deficiency manifests in children and adults as slowly progressive skeletal muscle weakness without cardiac involvement. Respiratory muscle weakness, particularly of the diaphragm, is the leading cause of death in the LO cases.3, 5, 7, 8, 9

A large number of sequence variations in the GAA gene have been described to date (http://www-fgg.eur.nl/ch1/pompe/en/?Molecular_aspects:Mutations). Among them, 15% are variations that may affect the pre-messenger RNA (pre-mRNA) splicing process.

Pre-mRNA defects seem to have a role in almost all known genetic disorders.10, 11 However, unless the mutation affects the highly conserved nucleotides at the exon 3′ss and 5′ss boundaries, it has often been very difficult to show a clear correlation between a suspected mutation and the disease. Recently, several methods have been developed to evaluate the clinical effect of mutations that may cause splicing defects.12 Direct analysis of the mature mRNA from the patient remains the most reliable method to determine whether or not a genetic variation affects splicing. However, cells/RNA from the patient might not be available or the transcript may be expressed only in highly selected tissues, so this approach is not always possible. To overcome this problem, alternative systems such as minigene-based assays have been used.13 All these approaches, however, require a substantial amount of time and skill if they have to be applied to a large number of putative splicing mutations.

For this reason, several in silico approaches to assess the effects of sequence variants on splicing have been developed. In general, these splice-prediction programs (SPPs) evaluate the effect of putative splicing mutations on the strength of 5′ and 3′ splice-site sequences, or search for potential changes within the vast array of splicing regulatory elements (SREs) known to this date. Although the predictions obtained are usually not enough to establish with sufficiently high accuracy the clinical impact of genetic variations on splicing, it has been proposed that SPPs could be used to perform a first selection of those variants that may have an effect on the pre-mRNA splicing before starting with time-consuming and labor-intensive mRNA analysis.14

In this study, we have comprehensively reviewed the available information on splicing mutations of the GAA gene and we have evaluated the possible impact of these genetic variations on the pre-mRNA splicing process using different in silico approaches. In addition, using a minigene system assay, we have performed the functional characterization of three sequence variants previously found in Italian patients affected with LO GSDII.

Materials and methods

Mutation nomenclature

All mutations are described according to the mutation nomenclature, considering nucleotide +1 as the A of the first ATG translation initiation codon (http://www.hgvs.org/mutnomen).15, 16 Nucleotide numbers are derived from cDNA GAA sequence (RefSeq cDNA Y00839.1).
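The numbering convention above can be illustrated with a minimal sketch that splits a simple cDNA-level substitution name into its parts. The names `Substitution` and `parse_substitution` are hypothetical and introduced only for illustration; the pattern covers single-nucleotide substitutions such as c.1626C>G, c.1194+2T>A and c.-32-13T>G, and is not a full implementation of the nomenclature (deletions such as c.2646_2646+1delTG are deliberately not handled).

```python
import re
from typing import NamedTuple, Optional

# Illustrative pattern only: simple cDNA-level substitutions, with an optional
# intronic offset (+n or -n) after the nearest exonic position.
_SUBSTITUTION = re.compile(
    r"^c\.(?P<pos>-?\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


class Substitution(NamedTuple):
    position: int  # nearest exonic nucleotide; +1 is the A of the ATG codon
    offset: int    # 0 if exonic; positive or negative distance into the intron
    ref: str       # reference nucleotide
    alt: str       # substituted nucleotide


def parse_substitution(name: str) -> Optional[Substitution]:
    """Return the parsed substitution, or None if the name is not supported."""
    match = _SUBSTITUTION.match(name)
    if match is None:
        return None
    offset = int(match.group("offset")) if match.group("offset") else 0
    return Substitution(
        int(match.group("pos")), offset, match.group("ref"), match.group("alt")
    )
```

For example, `parse_substitution("c.1194+2T>A")` describes a T-to-A change two nucleotides into the intron that follows cDNA position 1194.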

Splice-site prediction of intronic variants previously described in the GAA gene

The sequence environment of all acceptor and donor sites was analyzed using Splice Site Prediction by Neural Network, NNSPLICE (http://www.fruitfly.org/seq_tools/splice.html/).17 Maximum entropy scores were obtained using the software based on the maximum entropy principle, MaxEntscan (http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html/).18 The H-bond scores were calculated at http://www.uni-duesseldorf.de/rna/html/hbond_score.php.19 Finally, the Sroogle scores were obtained using the software available at http://sroogle.tau.ac.il/.20

In addition, the potential effect of the nucleotide variants on SREs was also analyzed with ESEfinder (http://rulai.cshl.edu/cgi-bin/tools/ESE3/esefinder.cgi),21, 22 RESCUE-ESE (http://genes.mit.edu/burgelab/rescue-ese/)23 and PESX (http://cubweb.biology.columbia.edu/pesx/).24, 25

Minigene constructs

To evaluate the expression of exons 7, 11 and 18 of the GAA gene, wild-type minigenes GAA1194wt, GAA1626wt and GAA2646wt were obtained by insertion in the pcDNA3 plasmid of PCR fragments containing the genomic GAA sequence from exons 6–8, 10–12 or 17–19, respectively. PCR amplification was performed using primers 6F–8R, 10F–12R and 17F–19R (listed in Table 1). The forward and reverse primers carried a HindIII and an EcoRI restriction site, respectively. Mutated minigenes GAA1194m, GAA1626m and GAA2646m carrying mutations c.1194+2T>A, c.1626C>G and c.2646_2646+1delTG, respectively, were prepared by site-directed mutagenesis (SDM) using the QuikChange Site-Directed Mutagenesis Kit (Stratagene, Cedar Creek, TX, USA) according to the manufacturer's instructions. Primers used for the SDM are listed in Table 1. Each clone was entirely sequenced to confirm that no other mutations were introduced.

Table 1 Sequences of oligonucleotides used for PCR amplification and SDM of GAA gene

Cell culture and transient transfection

COS-1, CHO and Hep3B cells were grown as monolayers in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum, 2 mM L-glutamine and 50 mg/ml penicillin/streptomycin (Gibco, Paisley, UK). HeLa cells were cultured in RPMI 1640 supplemented with 10% fetal calf serum, 2 mM L-glutamine and 50 mg/ml penicillin/streptomycin (Gibco). Cells were transfected with Lipofectamine 2000 (Invitrogen, Carlsbad, CA, USA) using 4 μg of total Endofree-purified plasmid DNA (Sigma, St Louis, MO, USA) following the manufacturer's instructions.

Minigene splicing assay

COS-1, CHO, Hep3B and HeLa cells were transfected with the wild-type and mutant minigene constructs. Total RNA was extracted after 48 h using TRIzol reagent (Gibco) and analyzed by RT-PCR. Reverse transcription was performed using the oligo (dT) primer; the PCR reaction was carried out with a forward vector-specific primer (5′-AGGGAGACCCAAGCTTGATG-3′) and the reverse primers 8R, 12R or 19R (Table 1). PCR products were resolved in a 1% agarose gel and sequenced.

Results

The complete spectrum of GAA mutations that may affect pre-mRNA splicing is shown in Figure 1. So far, 39 different sequence variations that may alter the splicing process have been described (http://www-fgg.eur.nl/ch1/pompe/en/?Molecular_aspects:Mutations). All of them were analyzed in silico using three splice-prediction programs (SPPs) to evaluate the potential effect on 5′ss and 3′ss strengths (MaxEntScan, NNSplice and HBond). These programs were chosen because they are widely used in the scientific literature on splicing mutations and use different approaches to measure splice-site strengths. For example, the MaxEntScan server scores 5′ splice-site sequences taking into account known dependencies between adjacent and non-adjacent positions of the splice-site consensus; the NNSplice program is a machine-learning approach that recognizes sequence patterns after being trained with DNA sequences encompassing authentic splice sites; and HBond analyzes individual hydrogen-bonding patterns to the U1 snRNA 5′ end irrespective of nucleotide frequencies.

Figure 1

Schematic representation of GAA gene. Exons are represented by black squares and introns by lines. The position and the sequence variation of the GAA mutations that may affect pre-mRNA splicing are indicated by arrows.

In addition, all mutations were analyzed with respect to the possible presence of SR-binding sites or general enhancer/silencer SREs using the ESEfinder, RescueESE and PESX programs. In this case too, the programs use different approaches to identify and score putative splicing regulatory sequences. The ESEfinder program is based on functional SELEX-binding analyses and uses a matrix-scoring approach to identify putative SR protein-binding sites and their possible disruption following the introduction of nucleotide changes. On the other hand, both RescueESE and PESX servers have used statistical analyses to identify putative regulatory motifs (hexamers or octamers) whose frequency differs between exons/introns and between exons with weak or strong splice sites (RescueESE) or within internal non-coding exons vs adjacent pseudoexons or the 5′ UTR of intronless genes (PESX).
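The way a motif-based lookup responds to a nucleotide change can be sketched as follows. This is only an illustration of the general idea (comparing the short motifs that overlap the changed position before and after the substitution); it does not reproduce the scoring of ESEfinder, RescueESE or PESX. The function names and the motif set passed in are hypothetical placeholders, not real server output.

```python
from typing import Set, Tuple


def kmers_covering(seq: str, index: int, k: int = 6) -> Set[str]:
    """Return every k-mer of seq that includes the 0-based position index."""
    first_start = max(0, index - k + 1)
    last_start = min(index, len(seq) - k)
    return {seq[s:s + k] for s in range(first_start, last_start + 1)}


def motif_changes(
    wild_type: str, mutant: str, index: int, motifs: Set[str], k: int = 6
) -> Tuple[Set[str], Set[str]]:
    """Return (lost, gained) motifs among the k-mers overlapping the change.

    motifs is a user-supplied set of candidate regulatory k-mers; any set
    used here is an assumption of the caller, not a curated list.
    """
    wt_hits = kmers_covering(wild_type, index, k) & motifs
    mut_hits = kmers_covering(mutant, index, k) & motifs
    return wt_hits - mut_hits, mut_hits - wt_hits
```

A substitution is flagged when either returned set is non-empty, that is, when it removes a listed motif or creates one.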

In general, for all these programs, the user decides whether a prediction is to be considered positive or negative. In our analyses, we have therefore decided to look predominantly at score variations rather than at absolute score values. In general, we have considered that a mutant score should differ by at least 10% from the wild-type one to be classified as a deleterious change. For all these programs, of course, the key question is the degree of reliance that one can place in their predictions. In general, because the donor and acceptor elements tend to be reasonably conserved in human genes, the programs that evaluate these elements seem to be rather more successful than those that aim to target the much more loosely conserved SREs. However, all bioinformatics approaches tend to suffer from individual drawbacks that have been summarized recently by Spurdle et al.26 In addition, for obvious reasons, none of these programs can easily take into account all the factors that influence splice-site selection, which include transcriptional effects, the influence of genomic context, the relative abundance of splicing factors and, of course, the presence of still unidentified regulatory elements.27 Therefore, there is a general consensus that, to evaluate a potential splicing mutation, it is advisable to use as many programs as possible.28 For this reason, in addition to all these programs, we have also decided to test these mutations using a more recent in silico prediction program (Sroogle)20 that employs an integrated approach to answer the question of whether a nucleotide variation has a good chance of representing a splicing mutation as opposed to a benign polymorphism. The results of this analysis are reported in Table 2.
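The 10% criterion described above can be written as a short sketch. The function names are hypothetical, and dividing by the absolute wild-type score (so that negative scores are handled consistently) is an assumption made here for illustration; the text does not specify how such cases were treated.

```python
def relative_change(wild_type: float, mutant: float) -> float:
    """Fractional change of the mutant score relative to the wild-type score."""
    if wild_type == 0:
        raise ValueError("wild-type score is zero; relative change is undefined")
    # Assumption: normalize by the magnitude of the wild-type score.
    return (mutant - wild_type) / abs(wild_type)


def exceeds_cutoff(wild_type: float, mutant: float, cutoff: float = 0.10) -> bool:
    """True if the score differs from wild type by at least the cutoff fraction."""
    return abs(relative_change(wild_type, mutant)) >= cutoff
```

With illustrative (not measured) scores, a drop from 10.0 to 8.5 is a 15% change and passes the cutoff, whereas a drop from 10.0 to 9.5 is a 5% change and does not. Scores that fall almost exactly on the 10% boundary are sensitive to floating-point rounding and would need an explicit tolerance.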

Table 2 In silico evaluation of 39 GAA sequence variants that may affect the pre-mRNA splicing

Out of the 39 mutations taken into consideration, 38 are clearly predicted to affect the splicing process by at least one of the programs used (but very often more), whereas only the c.-32-13T>G scores as negative with all programs except for MES and Sroogle where it barely clears the 10% cutoff score.

Among the programs tested, the most successful ones are those that aim to predict the possible disruption of the natural 5′ and 3′ splice sites. In fact, 37 mutations are clearly predicted to affect the 3′ or 5′ splice-site signal strengths. More interestingly, it is worth pointing out that with possibly just one exception (c.-32-3C>A), where MaxEntScan apparently performs better, in all other cases the integrated Sroogle program performs in the same way as any of the individual programs. In some cases, such as c.1636+5G>C and c.1636+5G>T, it even performs better than the MaxEntScan and NNSplice programs.

We also looked at whether the programs were able to predict the degree of changes correctly. However, no correlation between the degree of score change and the consequence on the mRNA splicing observed in vivo was found.

Next, to analyze the possible correlation between the effect of GAA mutations on the splicing process and the clinical phenotype, we reviewed the clinical presentation of patients carrying these mutations. As shown in Table 3, 15 splicing mutations were reported in infantile patients and the impact on pre-mRNA processing has been reported for 11 of them.

Table 3 Correlation between the effect of GAA mutations on the splicing process and the clinical phenotype

Interestingly, 10 of these 11 mutations associated with infantile onset caused a severe defect on mRNA splicing, leading to either exon skipping, shift in the reading frame or the lack of the corresponding transcript in vivo.29, 33, 34, 36, 37, 38, 39, 40, 41, 42 In the light of these findings, therefore, it is not unexpected that patients carrying these mutations presented a severe form of GSDII. In fact, only in cells from a patient carrying the c.1437G>A mutation, a very low amount of normal mRNA was detected (1.2%),34, 35 suggesting that the amount of enzyme translated from this mRNA is probably not enough to prevent the development of a severe form of the disease.

In contrast to this picture, in patients affected with LO GSDII, eight mutations were reported in a homozygous state or in association with severe mutations. In this case, the effect on pre-mRNA processing has been analyzed in vivo for seven of them. It is interesting to note that in the cells of patients presenting six of these mutations, variable amounts of normal spliced GAA mRNA have been detected (from 3.6 to 13.7% of GAA mRNA expressed in normal cells).29, 31, 34, 38, 43 As a result, these data may explain the less severe phenotype reported in these patients.

Five of these LO GSDII-associated splicing mutations were found in patients with the common c.-32-13T>G mutation.30, 32, 33, 35 It is worth mentioning that the mutation c.-32-13T>G, the most common one among the Caucasian LO GSDII patients,3 has been previously studied in vitro using a minigene system assay.51 The results have shown that this mutation does not completely prevent normal splicing, as low levels of the correctly spliced mRNA were generated with the mutant construct. In fact, three splice variants (SV1, SV2 and SV3; Table 2) were observed with both the wild-type and the mutant constructs, indicating that these forms represent normal alternative spliced products. Thus, the mutation seems to act mostly on the overall splicing efficiency of the pre-mRNA transcript rather than in a qualitative way. As the presence of the c.-32-13T>G mutation in one allele is enough to determine the LO phenotype, independently of the mutation present in the second allele, it was not possible to establish a correlation between mutations present in association with the c.-32-13T>G allele and the clinical presentation. The only exception is represented by mutation c.1076-1G>C, which had been analyzed in vivo. In this case, RT-PCR analysis performed in cultured fibroblasts from a patient carrying the c.1076-1G>C mutation showed the inclusion of introns 6 (79 bp) and 7 (89 bp) into the transcribed mRNA. If we consider the fact that the c.1076-1G>C mutation has also been found in the homozygous state in infantile GSDII patients,30 these data indicate that c.1076-1G>C could be classified as a severe mutation.

To complete this analysis of GAA-splicing mutations, it should also be noted that in a previous study, we have characterized the mutation profile of the GAA gene in 40 Italian patients with LO GSDII.33 Overall, five mutations that might have affected the splicing process were found. However, as RNA and/or cells from patients carrying three of them (c.1626C>G, c.1194+2T>A and c.2646_2646+1delTG) were unavailable, their deleterious effect could not be confirmed. The sequence variation c.1626C>G was found in the homozygous state and did not disrupt the reading frame and codon usage, whereas mutations c.1194+2T>A and c.2646_2646+1delTG affected the consensus 5′ splice donor sites of exons 7 and 18, respectively. Several SPPs (Table 2) clearly predicted that mutation c.1626C>G would create a novel donor site, which would have caused the exclusion of 11 bp of exon 11, whereas the mutated sequences c.1194+2T>A and c.2646_2646+1delTG would no longer be recognized as donor sites. Therefore, to test the predicted effects of these mutations on GAA pre-mRNA splicing, we have now performed a functional splicing assay.

As shown in Figure 2, cells transfected with the mutant constructs bearing these mutations produced aberrant transcripts in all cases. However, it is worth noting that in cells transfected with the GAA1626m construct, a low amount of a transcript similar in size to the normal one was also present. As the wild-type acceptor and donor sites of intron 11 seem not to be abolished, these data suggest that even in the presence of this mutation, a low amount of wild-type transcript would still be produced.

Figure 2

Schematic representation of the GAA regions affected by splicing mutations. Mutations c.1194+2T>A, c.1626C>G and c.2646_2646+1delTG (panels a1, b1 and c1, respectively) are highlighted in bold. RT-PCR analysis of the GAA mRNA in cells transfected with wt and minigenes containing mutations c.1194+2T>A, c.1626C>G and c.2646_2646+1delTG (panels a2, b2 and c2, respectively) using a forward vector-specific primer that amplified only the minigene product.

Sequencing analysis of the aberrant PCR product showed that as predicted by the SPPs, the c.1194+2T>A and c.2646_2646+1delTG mutations led to the skipping of exons 7 and 18, respectively (Figure 3a and c). In the light of these results, both mutations should be considered to have a severe impact on pre-mRNA splicing. However, in both patients, these mutations were found in association with the common c.-32-13T>G mutation (Table 3) that, as mentioned above, would probably have been enough to determine the observed LO phenotype.

Figure 3

Sequencing analysis of the RT-PCR products obtained from amplification of GAA mRNA in transfected cells. (a) Cells transfected with c.1194+2T>Am construct, (b) cells transfected with c.1626C>Gm construct and (c) cells transfected with c.2646_2646+1delTGm construct.

Finally, the analysis of the RT-PCR products found in cells transfected with the GAA1626m construct showed that, unexpectedly, the shorter transcript lacked not only 11 nt of exon 11, as predicted by the SPPs, but also 46 nt of exon 12 (Figure 3b). In any case, the presence of low amounts of the wild-type transcript was confirmed (data not shown). If we consider that the c.1626C>G mutation was found in the homozygous state, this finding may also explain the LO form of the disease observed in this patient (Table 2).
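As a side calculation on the lengths reported above, the following sketch checks whether the total number of missing nucleotides is a multiple of three. The function name is hypothetical, and the check is plain arithmetic under the assumption that the reported lengths are exact; it says nothing about premature stop codons or about the function of any resulting protein.

```python
def preserves_reading_frame(*deleted_lengths: int) -> bool:
    """True if the combined length of the deleted segments is a multiple of 3."""
    if any(length < 0 for length in deleted_lengths):
        raise ValueError("deleted lengths must be non-negative")
    return sum(deleted_lengths) % 3 == 0
```

Under this assumption, the loss of 11 nt alone would shift the reading frame, whereas the combined loss of 11 nt and 46 nt (57 nt in total) would not.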

Discussion

The complexity of the splicing process has revealed that almost all types of nucleotide changes may potentially affect the pre-mRNA-splicing process. However, distinguishing between benign and disease-causing sequence substitutions may be challenging. In fact, it is not realistic to test all nucleotide modifications at RNA level. Therefore, several in silico approaches to assess the effects of sequence variants on splicing have been developed and it has been proposed that they could be used to perform a first selection of those variants that may have an effect on the pre-mRNA splicing.28, 52

In this work, we have reviewed the genetic variations reported in the literature that might affect the GAA pre-mRNA-splicing process. The in silico analysis of 39 sequence variants showed that this approach may be useful to select those mutations that affect the splicing process. As previously mentioned, among these programs, the most successful ones are those that aim to predict the possible disruption of the natural 5′ and 3′ splice sites. This observation is consistent with previous analyses showing that even a moderate consensus, such as the one present in the 5′ and 3′ splice sites of higher eukaryotes, greatly helps this kind of computational study. It is common knowledge, in fact, that the much more loosely conserved enhancer or silencer elements are much more difficult to define, and in this respect, it is worth pointing out the comment by Rogan53 that highlights how only one in four programs that aim to predict ESE motifs was actually capable of correctly recognizing the creation of a novel ESE element in a pathological pseudoexon activation event.54

Considering that splicing signals are highly context specific, it is not surprising that they are hard to predict and that programs that use an integrated approach perform better. In keeping with this hypothesis, the analysis performed here has indeed shown that the integrated Sroogle program performed at least as well as any of the individual programs. This conclusion is supported by the observation that another integrated program (Automated Splice Site Analysis, ASSA, available at https://splice.uwo.ca)55 used to analyze splicing mutations in the RB1 gene provided several advantages with respect to individual programs.28 Therefore, it is possible to conclude that integrated approaches, such as the one represented by Sroogle, may successfully substitute for the use of separate analysis programs to shorten the time needed to complete a preliminary screening of a large number of mutations.

There are also some cases, however, where in silico analysis did not perform so well. Not surprisingly, these cases include mutations that occur farther from the splice site (ie, c.-32-13T>G and c.1195-8G>T). However, in the case of the c.1195-8G>T mutation, the use of the ESEfinder and PESX programs might have compensated for the relative inability of the other programs to detect the splicing-modifying potential of this mutation. In the case of the c.-32-13T>G substitution, even using a very favorable degree of score change (10%), it was barely predicted to affect the splicing process. There are several reasons that may account for this failure, but the fact that this mutation still allows a certain amount of normal splicing (as discussed above) most probably places it at the limit of the detection threshold of these bioinformatics programs. Taken together, therefore, 97% of the GAA-splicing mutations taken into consideration in this analysis could have been clearly predicted to have an impact on the splicing process by in silico analysis alone.

The remarkable heterogeneity of mutations found in GSDII patients makes it difficult to correlate the genotype with the phenotype. However, the results reviewed here show a good correlation between the impact of the mutation on the splicing process and the clinical phenotype (ie, the more severe the impact on the splicing process, the more likely the presence of an infantile form of the disease and vice versa).

Finally, using a minigene assay, we confirmed the pathogenic effect of the three sequence variants found in Italian patients affected with GSDII. All these three mutations were predicted to affect the splicing process by SPPs. However, in the case of mutation c.1626C>G, although several SPPs predicted that the mutation would create a novel donor site, which would cause the exclusion of 11 bp of exon 11, the functional analysis showed that this mutation not only creates a new donor site as predicted but also causes the activation of a cryptic splice site localized in exon 12, downstream of the normal acceptor site of exon 11.

In conclusion, this study has shown that in silico analysis represents a useful tool to select mutations that affect the splicing process of the acid α-glucosidase. This type of analysis is quite straightforward and reliable. However, it is worth highlighting the importance of functional studies for the correct evaluation of sequence variations. In fact, a comprehensive analysis of the mechanism by which a sequence variant affects mRNA splicing is crucial to analyze possible correlations between the mutation and the clinical phenotype, and to evaluate the feasibility of using emerging splicing-based therapeutic approaches.