Introduction

Mutations in the Fragile X Mental Retardation 1 (FMR1) gene, encoding the FMR1 protein (FMRP), are the leading heritable cause of intellectual disability and the main monogenic cause of autism. FMR1 contains a 5′ noncoding CGG repeat region that may expand beyond the normal range and cause fragile X syndrome (FXS) if fully mutated (>200 CGGs) or other fraxopathies (fragile X tremor/ataxia syndrome, FXTAS, and fragile X premature ovarian failure, FXPOF) if in the premutation range (55<CGGs<200). Full mutations are generally methylated in the promoter region, with consequent transcriptional silencing and absence of FMRP translation.1 On the basis of repeat size and instability, FMR1 alleles can be designated as normal (stable, 5–44 CGGs), intermediate (slightly unstable, 45–54 CGGs) and premutated (very unstable, 55–200 CGGs, with the risk of expansion to full mutation dependent on the sex of the transmitting parent and allele size).2, 3, 4
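The size classes above can be written as a small lookup. The following is a minimal sketch rather than a diagnostic tool: the function name is hypothetical, the cut-offs are the ones quoted in the text, and treating alleles shorter than 5 CGGs as normal is an assumption, since the text only gives 5–44 for that class.

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Map a CGG repeat count to the allele class used in the text.

    Assumption: counts below 5 are reported as "normal", although the
    text only defines that class for 5-44 CGGs.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats > 200:
        return "full mutation"   # >200 CGGs
    if cgg_repeats >= 55:
        return "premutation"     # 55-200 CGGs
    if cgg_repeats >= 45:
        return "intermediate"    # 45-54 CGGs
    return "normal"              # up to 44 CGGs


if __name__ == "__main__":
    for size in (30, 50, 125, 250):
        print(size, classify_fmr1_allele(size))
```

Repeat size alone does not capture stability, which also depends on AGG interruptions and on the transmitting parent, so the returned label is only the size-based designation.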

De novo FMR1 expansions seem to originate through a complex, multistep mechanism, which has been under debate for decades.5, 6, 7, 8 Cis-acting factors such as AGG interspersions in the CGG repeat region are, however, well-known stability elements.9, 10, 11, 12 Normal alleles usually have AGG interruptions after every 9 or 10 CGG triplets,13 whereas in other allele classes, the number of AGGs tends to decrease as the repeat size increases. Approximately 50% of premutated alleles carry an AGG interruption, which reduces the risk of expansion upon maternal transmission, particularly in alleles with fewer than 100 CGGs.11, 12, 13, 14, 15, 16 The loss of AGGs occurs in a polarized way at the 3′-end, creating a long pure (CGG)n with higher mutability.17, 18 On the other hand, analysis of the interspersion pattern of the repeat suggests that increased instability is correlated with the presence of the first interruption at the tenth triplet, as the configuration (CGG)9AGG(CGG)n is frequently observed in large normal alleles.19 In addition, normal-sized alleles with more than 24 pure CGGs at the 3′-end frequently share the typically studied DXS548-FRAXAC1 FXS haplotype, suggesting that these alleles could be more prone to expand.15, 17, 20, 21
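The tract features discussed above (total size, number of AGGs, position of the first interruption and length of the pure 3′ run) can be read directly from a configuration string. The sketch below is illustrative only: the function names and the returned keys are hypothetical, and it assumes configurations are written exactly in the (CGG)nAGG(CGG)m notation used in this article, with each AGG counted as one triplet of the total.

```python
import re

# One token is either a pure run "(CGG)n" or a single "AGG" interruption.
_TOKEN = re.compile(r"\(CGG\)(\d+)|AGG")


def pure_runs(pattern: str) -> list[int]:
    """Return the lengths of the pure CGG runs, 5' to 3', split at each AGG."""
    runs = [0]
    pos = 0
    for match in _TOKEN.finditer(pattern):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos}: {pattern!r}")
        pos = match.end()
        if match.group(1) is not None:
            runs[-1] += int(match.group(1))  # extend the current pure run
        else:
            runs.append(0)                   # an AGG starts a new run
    if pos != len(pattern):
        raise ValueError(f"unexpected text at position {pos}: {pattern!r}")
    return runs


def summarize_tract(pattern: str) -> dict:
    """Summarize size and interruption structure of one allele."""
    runs = pure_runs(pattern)
    agg_count = len(runs) - 1
    return {
        "total_repeats": sum(runs) + agg_count,  # AGGs count as triplets
        "agg_count": agg_count,
        "first_agg_after": runs[0] if agg_count else None,
        "pure_3prime_cggs": runs[-1],
    }


if __name__ == "__main__":
    print(summarize_tract("(CGG)10AGG(CGG)9AGG(CGG)9"))
    print(summarize_tract("(CGG)9AGG(CGG)25"))
```

For an uninterrupted allele the whole tract is reported as the 3′ pure run and `first_agg_after` is `None`.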

FMR1 mosaicism has been reported, with multiple CGG repeat sizes (normal and full mutation alleles) present in different tissues (inter-tissue mosaic) or in different cells from the same tissue (intra-tissue mosaic) of a single patient. These post-zygotic events, ultimately responsible for the intergenerational changes in repeat size when unstable alleles are further transmitted, may help to elucidate mitotic instability processes. Although rare, repeat size mosaicism has been previously described in FXS patients, some of them males who harbor two or more differently sized expansions and, in a few rare cases, expanded and normal alleles.22, 23 During routine FXS diagnostics, we identified four mosaic males with a typical fragile X phenotype, who carried both a full mutation and a normal pure allele at FMR1. After observing that an extremely rare contraction event had occurred independently in all four patients, we hypothesized the presence in our population of a haplotype predisposing to large contractions. We started by identifying stable single-nucleotide polymorphism (SNP) lineages of both normal pure and expanded alleles. If a common ancestry underlies these two groups of alleles, one should expect SNP lineages to be shared between them. A further comparison of fast-evolving markers (short tandem repeats, STRs) within each lineage allowed us to reconstruct phylogenetic networks, crucial to visualize genetic distances among FMR1 alleles and, thus, to clarify the origin of normal pure alleles.

Materials and methods

Subjects

In the course of routine FMR1 analysis using previously published methods,24 four mosaic cases (M1–M4) were identified in blood samples from male patients with FXS. In addition, we analyzed 212 males with an FMR1 allele within the CGG normal–intermediate range (cohort 1) and 123 unrelated male individuals with a clinical diagnosis of FXS and an expansion over 200 CGGs (cohort 2). Peripheral blood samples were collected after obtaining informed consent from all the subjects enrolled in this study or their legal representatives. Genomic DNA was extracted by standard salting-out procedures. The scientific and ethical committees of Centro Hospitalar do Porto (CHP, E.P.E.) approved this project (154-DEFI/193-CES).

(CGG)n genotyping and tract configuration

The AGG interspersion pattern of FMR1 alleles was assessed by triplet-primed PCR using 150 ng of genomic DNA, 1X PCR Master Mix (Promega, Madison, WI, USA), 5.1 mM of MgCl2, 6.4% of dimethyl sulfoxide, 2.6 mM of betaine, 10 μM of primers 1 and 2 and 7.5 μM of primer 3. The PCR reactions were performed using the following cycle conditions: an initial denaturation of 10 min, 45 cycles of 96 °C for 45 s, 58 °C for 45 s and 68 °C for 2 min, and a final extension of 10 min at 68 °C. After amplification, fragments were separated by capillary electrophoresis in a 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) and the AGG pattern was analyzed using GeneMapper v4.0 software (Applied Biosystems).
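For bench planning, the cycling conditions above can be recorded as a small data structure. This is only a sketch: the class and field names are hypothetical, the temperature of the initial denaturation is left as `None` because the text does not state it, and the computed time counts programmed holds only, ignoring instrument ramping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    seconds: int
    celsius: Optional[float]  # None where the text gives no temperature


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: tuple          # Steps repeated n_cycles times
    n_cycles: int
    final: Step

    def hold_minutes(self) -> float:
        """Total programmed hold time in minutes (ramping not included)."""
        per_cycle = sum(step.seconds for step in self.cycle)
        total = self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds
        return total / 60


# Triplet-primed PCR conditions as listed in the text.
TP_PCR = Program(
    initial=Step(600, None),
    cycle=(Step(45, 96.0), Step(45, 58.0), Step(120, 68.0)),
    n_cycles=45,
    final=Step(600, 68.0),
)

if __name__ == "__main__":
    print(f"TP-PCR programmed time: {TP_PCR.hold_minutes():.1f} min")
```

The same structure could hold the other programs described in this section, with `None` used wherever a value is only given in Supplementary Table S1.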

Genotyping of flanking polymorphic markers

Three SNPs from intronic FMR1 regions were selected: rs971000 (intron 1), rs29282 (intron 7) and rs25715 (intron 9). The SNPs were genotyped by high-resolution melting curve analysis in Rotorgene 6000 (Qiagen, Hilden, Germany). The PCR reactions were carried out using 1 × Master Mix, 1 × Evagreen (Biotium, Hayward, CA, USA), 5 μM of each primer and 7.5 ng of gDNA and a PCR program containing an initial denaturation temperature of 95 °C for 5 min, followed by 35 cycles of 95 °C for 45 s, 30 s of annealing temperature, 72 °C for 30 s and a final extension of 10 min at 72 °C (Supplementary Table S1). The high-resolution melting curve analysis was performed using standard parameters and according to the manufacturer’s instructions.

The selected STR markers DXS998, DXS548, FRAXAC1 and FRAXAC2 are embedded in three distinct haplotypic blocks, with the first three located upstream and FRAXAC2 downstream of the FMR1 repeat region. Allele nomenclature follows the one previously proposed.25 STRs were amplified in a multiplex PCR with primers labeled with different fluorochromes. The reactions contained 1X Multiplex PCR Master Mix (Qiagen, Hilden, Germany), 150 ng of genomic DNA and a primer mix containing 4.5 μM of each DXS998 primer, 8 μM of each DXS548 primer and 10 μM of each FRAXAC1 and FRAXAC2 primer. The PCRs were performed using an initial denaturation for 15 min at 95 °C, 45 cycles of 30 s at 95 °C, 1 min and 30 s at 58 °C and 1 min and 30 s at 72 °C, followed by a final extension at 72 °C for 10 min (Supplementary Table S1). The PCR products were analyzed by capillary electrophoresis in ABI3130xl and fragment sizes determined using GeneMapper v2.6.0 software. To assess the repeat configuration of the complex microsatellite FRAXAC2, (GT)xC(TA)yTz, PCR reactions contained 100 ng of genomic DNA, 1X PCR Master Mix (Promega, Madison, WI, USA) and 10 pmol of each primer. Amplification started with an initial denaturing step at 95 °C for 10 min, followed by 40 cycles (95 °C for 1 min; annealing temperature for 1 min; and 72 °C for 2 min) and a final extension step at 72 °C for 10 min. The PCR products were purified using the Illustra ExoStar 1-Step (GE Healthcare Life Sciences, Little Chalfont, Buckinghamshire, UK) according to the manufacturer’s instructions, and sequenced using the BigDye Terminator v3.1 cycle sequencing kit (Applied Biosystems). The sequencing reaction consisted of an initial denaturing step at 96 °C for 1 min, followed by 27 cycles (96 °C for 10 s; 50 °C for 5 s; and 60 °C for 1 min and 15 s), ending with extension at 60 °C for 5 min. The asymmetric PCR fragments were electrophoresed on an ABI PRISM 3130xl Genetic Analyzer and analyzed using the SeqScape v2.5 software (Applied Biosystems).

Haplotype analysis

Taking advantage of the location on chromosome X of our locus of interest, FMR1, haplotypes were directly assessed on male samples. Phylogenetic relationships among normal (cohort 1) and FXS (cohort 2) alleles were investigated through reduced median followed by median-joining networks, calculated to reduce some of the reticulation of analyzed microsatellite markers (Network 4.613 software; www.fluxus-engineering.com).

Results

We identified four very rare cases showing intra-tissue mosaicism: FXS male patients harboring, in addition to the full mutation, a normal FMR1 allele (patient M4 carried an additional premutation of 125 CGGs). To gain insight into the mechanism of instability underlying the origin of these alleles, we genotyped the maternal FMR1 repeats transmitted to these mosaic patients (Table 1). The normal alleles carried by the mothers differed in CGG number from those found in the patients. This suggested that large post-zygotic contractions of expanded alleles had occurred in our patients. Given that such a rare event occurred independently in several patients, we hypothesized the existence of an FMR1 haplotype prone to large contractions. Because these contractions could occur either recurrently in the same genetic background or in different haplotypes, we next investigated the origin of all normal-sized alleles in our control population. If some fully expanded CGGs are able to contract to the normal range, the pool of normal alleles may include not only those evolving at a regular rate over the course of many thousands of years, but also others that resulted, more recently, from large contractions. To follow the evolutionary history of highly mutable loci, such as the CGG repeat at FMR1, one should start by looking at a narrower time frame. As the events at the origin of SNPs are unique during the evolution of a species, we defined very stable FMR1 backgrounds by assessing SNP haplotypes (lineages) of this region. Next, the analysis of fast-evolving loci, such as STRs, allowed us to follow the more recent mutational history of the FMR1 alleles.

Table 1 Summary of clinical and genotyping data of mosaic male FXS patients and respective mothers

SNP–STR haplotypes of mosaic FXS patients

Through the analysis of rs971000, rs29282 and rs25715 (Supplementary Table S2), we assessed SNP backgrounds of our mosaic patients and found two different haplotypes: A-T-T (lineage A) in M1 and M2 and G-C-C (lineage C) in M3 and M4. Next, the genotyping of flanking STRs (DXS998-DXS548-FRAXAC1-FRAXAC2) has shown an extended haplotype shared by mosaics from lineage A, A21: 34-44-38-336 (Table 2). The fact that two independent large contractions (observed in mosaics M1 and M2) have occurred on the same SNP–STR background led us to question the existence of a lineage-specific predisposing haplotype. Interestingly, mosaics M3 and M4 from lineage C also shared a FRAXAC1-FRAXAC2 haplotype: 36-333. Taking into account that upstream markers DXS998 and DXS548 are located in more distant haplotypic blocks, a recombination event might have occurred during the meiosis of maternal FMR1 alleles, being responsible for breaking up a longer haplotype common to mosaics from lineage C.

Table 2 SNP–STR haplotypes of normal pure FMR1 alleles observed in mosaic patients and in our cohort 1 (212 males with an FMR1 allele within the CGG normal–intermediate range)

Regarding the complex polymorphic structure of FRAXAC2 (GT)xC(TA)yTz, one could argue that same-sized FRAXAC2 alleles had different configuration patterns, thus, not sharing identity-by-state. To discard this possibility, we sequenced all four FRAXAC2 alleles from mosaic cases: alleles 336 from M1 and M2 shared the (GT)16C(TA)7T13 configuration, similarly to alleles 333 from M3 and M4, which shared the (GT)14C(TA)7T14 pattern.
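The relation between a FRAXAC2 allele label and its sequenced configuration can be checked with simple arithmetic. The sketch below assumes, as an illustration only, that the allele label (336, 333) is the fragment size in base pairs; the function name is hypothetical. It computes the length of the (GT)xC(TA)yTz core and the remainder attributable to flanking sequence.

```python
import re

_CONFIG = re.compile(r"^\(GT\)(\d+)C\(TA\)(\d+)T(\d+)$")


def core_length_bp(config: str) -> int:
    """Length in bp of a (GT)xC(TA)yTz core: 2x + 1 + 2y + z."""
    match = _CONFIG.match(config)
    if match is None:
        raise ValueError(f"not a (GT)xC(TA)yTz configuration: {config!r}")
    gt, ta, t = (int(value) for value in match.groups())
    return 2 * gt + 1 + 2 * ta + t


# Allele label -> configuration, as reported for the mosaic cases.
MOSAIC_ALLELES = {
    336: "(GT)16C(TA)7T13",  # M1 and M2
    333: "(GT)14C(TA)7T14",  # M3 and M4
}

if __name__ == "__main__":
    for label, config in MOSAIC_ALLELES.items():
        core = core_length_bp(config)
        # Assumption: the label is a size in bp, so label - core = flanking.
        print(label, config, "core:", core, "remainder:", label - core)
```

Under that assumption, the two cores are 60 and 57 bp long and leave the same remainder, so the 3-unit difference between the labels is fully accounted for by the sequenced repeats. The same check would flag two same-sized alleles whose internal configurations differ.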

Repeat size and configuration of FMR1 alleles on the identified SNP lineages

The FMR1 tract configurations of our mosaic patients showed that all normal alleles carried no AGG interruptions; these are henceforth called pure alleles. We then assessed the repeat size and configuration of all 212 healthy controls (cohort 1) to analyze the origin of the pure CGGs found in the pool of normal FMR1 alleles. Having evidence of four normal pure CGGs originating from four independent large contractions of expanded FMR1 alleles, we aimed to test whether other uninterrupted repeats from our control population (1) resulted from the same process or (2) evolved from other normal alleles by losing AGG interruption(s). By analyzing the same three SNPs as in the mosaic cases, we were able to place all FMR1 alleles on stable lineages. In cohort 1, normal and intermediate alleles ranged between 8 and 53 repeat units. The most common pattern was (CGG)10AGG(CGG)9AGG(CGG)9 (rf=0.344), followed by (CGG)9AGG(CGG)9AGG(CGG)9 (rf=0.127) and (CGG)10AGG(CGG)9 (rf=0.090; Supplementary Table S3). The first AGG occurs most frequently after 10 (rf=0.594), 9 (rf=0.283) or 13 (rf=0.047) CGGs, similar to other worldwide populations.13 Interestingly, we found eight normal pure alleles distributed in three lineages: A-T-T (n=5), G-T-C (n=2) and G-C-C (n=1; Table 3). Under mere stochasticity, the observed proportions of each lineage (62.5, 25 and 12.5%, respectively) would favor the origin of these normal pure alleles from the pool of normal interspersed repeats, as A-T-T (60%), G-T-C (25.5%) and G-C-C (7.3%) are the most frequent lineages in a total of 55 control alleles with one AGG interruption (Supplementary Table S4). In contrast, among all analyzed expanded alleles, A-T-T accounts for only 10.6% (G-T-C: 59.3%; G-C-C: 22.8%; Table 3). Taking into account, however, that risk factors may predispose any haplotypic background to instability (independent of its frequency), both hypotheses needed to be tested.
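The stochastic argument above can be made concrete with a binomial tail probability. This is a back-of-the-envelope illustration, not an analysis reported in this study: it asks how likely it would be to draw at least 5 A-T-T alleles among 8 if the pure alleles were sampled at random from a pool with the A-T-T frequency of interrupted alleles (60%) or of expanded alleles (10.6%). The function name is hypothetical and independence between alleles is assumed.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_PURE = 8   # normal pure alleles found in cohort 1
N_ATT = 5    # of which on the A-T-T lineage

# A-T-T frequency in each candidate source pool, as reported in the text.
SOURCE_POOLS = {
    "normal alleles with one AGG": 0.60,
    "fully expanded alleles": 0.106,
}

if __name__ == "__main__":
    for pool, freq in SOURCE_POOLS.items():
        prob = binomial_tail(N_ATT, N_PURE, freq)
        print(f"{pool}: P(at least {N_ATT} of {N_PURE} A-T-T) = {prob:.4f}")
```

The first probability is above one half and the second is well below 1%, which is the sense in which random sampling favors the interrupted pool. With only eight alleles, and with possible haplotype-specific instability, this calculation cannot by itself exclude either hypothesis.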

Table 3 Relative frequencies of FMR1 lineages genotyped in the four mosaic FXS patients in comparison with those observed in normal pure, normal interrupted and expanded (CGG)n alleles

Phylogenetic distances between normal and fully expanded alleles

We further analyzed the STR backgrounds of all normal pure FMR1 alleles genotyped in cohort 1, as mosaic patients from the same SNP lineage shared entirely or partially a DXS998-DXS548-FRAXAC1-FRAXAC2 haplotype: 34-44-38-336 in M1 and M2 (lineage A) and a 3′-end 36-333 in M3 and M4 (lineage C; Table 2). All five pure CGGs from lineage A shared the FRAXAC1 allele, with DXS998 and FRAXAC2 alleles also shared by all except one individual. A similar scenario was observed for lineage C even if a single pure CGG has been genotyped: the 32-40-36-333 haplotype (segregated with (CGG)17) is only a stepwise mutation apart from the haplotype identified in case M3 (32-42-36-333). Again, by sequencing, we confirmed that FRAXAC2 alleles from pure FMR1 repeats of lineage A (336) were identical by state to the ones genotyped in M1 and M2: (GT)16C(TA)7T13. In addition, allele 338 segregating with (CGG)31 is only a step mutation apart from all the others: (GT)17C(TA)7T13. Likewise, in lineage C, allele 333 shares the same configuration as same-sized FRAXAC2 alleles from M3 and M4.
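The notion of haplotypes being "a stepwise mutation apart" can be sketched as a per-marker comparison. The code below is an illustration under a stated assumption: a difference of 2 in an allele label corresponds to one repeat step, as in the 40 versus 42 DXS548 and 336 versus 338 FRAXAC2 comparisons described above. The function names are hypothetical, and this simple count is not the network algorithm used in the study.

```python
MARKERS = ("DXS998", "DXS548", "FRAXAC1", "FRAXAC2")


def parse_haplotype(text: str) -> tuple[int, ...]:
    """Turn '32-40-36-333' into (32, 40, 36, 333)."""
    alleles = tuple(int(part) for part in text.split("-"))
    if len(alleles) != len(MARKERS):
        raise ValueError(f"expected {len(MARKERS)} alleles, got {len(alleles)}")
    return alleles


def step_differences(hap_a: str, hap_b: str, unit: int = 2) -> dict:
    """Per-marker difference in repeat steps (assumed 2 label units per step)."""
    a, b = parse_haplotype(hap_a), parse_haplotype(hap_b)
    return {m: abs(x - y) / unit for m, x, y in zip(MARKERS, a, b)}


if __name__ == "__main__":
    control = "32-40-36-333"   # pure (CGG)17 allele, lineage C
    mosaic_m3 = "32-42-36-333"  # haplotype identified in case M3
    print(step_differences(control, mosaic_m3))
```

A non-integer value, such as the 1.5 obtained when comparing FRAXAC2 alleles 333 and 336, signals that the difference is not a whole number of dinucleotide steps, which is why the complex FRAXAC2 alleles were also compared by sequencing.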

The fact that normal pure FMR1 alleles from our control population are phylogenetically close to the normal alleles from mosaic patients (which resulted from large contractions) supported the hypothesis that they originated by similar large contraction events from fully expanded FMR1 alleles, on a predisposing haplotype. On the other hand, if this STR haplotype happened to be frequent among interrupted FMR1 alleles, the alternative hypothesis, that pure CGG alleles originated by the loss of AGGs, would be equally likely. Therefore, we studied 123 additional patients with a methylated full expansion over 200 CGGs (cohort 2) to compare phylogenetic relationships between normal pure alleles from the control population and (1) fully expanded alleles (Figure 1a) or (2) normal alleles carrying one AGG interruption (Figure 1b). Given the scarcity of G-T-C and G-C-C normal pure alleles, we proceeded with the next analyses exclusively in lineage A (A-T-T). All normal pure haplotypes were phylogenetically close to each other, at most three mutations apart. In the network testing the first hypothesis, they clustered in one branch, with a single haplotype (A2) shared between normal pure and FXS alleles. This suggests that recurrent large contractions from fully expanded alleles do not explain the origin of the five normal pure A-T-T alleles genotyped in cohort 1. Instead, if a single contraction occurred, a (CGG)24 allele would have arisen in haplotype A2, later evolving to A1 (after a stepwise mutation in DXS998) and to A4 (after stepwise mutations in DXS548 and in the CGG repeat). Further multistep mutations in the (CGG)n tract explain pure alleles as short as (CGG)8 and as long as (CGG)31 (Figure 2a). Interestingly, we observed a fully expanded haplotype (A3) deriving by a single DXS998 mutation from A1, which was found in normal pure alleles. Both hypotheses of (1) a de novo expansion from (CGG)24 and (2) recombination or two mutations on DXS998 can be raised, although the second is more likely.

Figure 1

Phylogenetic networks showing the most parsimonious relationships among FMR1 haplotypes (DXS998-DXS548-FRAXAC1-FRAXAC2) from lineage A (A-T-T) to test the hypothesis of normal pure CGGs originated by (a) the contraction of fully expanded alleles and (b) the loss of AGG interruptions. Circle and line sizes are proportional to the number of individuals and stepwise mutations, respectively. Squares indicate recombinations.

Figure 2

Schematic representation of the two hypotheses for the origin of pure normal FMR1 alleles in the control population by (a) large contractions from the pool of fully expanded alleles and (b) the loss of AGGs from the pool of normal interrupted alleles. Dashed lines denote multistep mutations at FMR1; gray lines indicate alternative hypotheses.

On the other hand, to test the hypothesis of loss of AGG interruptions, we looked at genetic distances between pure and interrupted normal alleles, and observed three shared haplotypes (A1, A2 and A4) in the main torso of the network (Figure 2b). The only haplotype found exclusively in a pure normal allele (A13-(CGG)31) derives from A4 by a single FRAXAC2 mutation. Taking into account that an allele with 31 repeats, (CGG)10AGG(CGG)20, has also been found to segregate on the A4 haplotype, the loss of the AGG interruption (through a recurrent mutation in the first base: AGG>CGG) may also be at the origin of this (CGG)31 pure allele. To explain the ancestry of the (CGG)8 allele among the 123 normal A-T-T alleles analyzed, a multistep mutation involving the gain of several CGG units must have occurred, as no intermediate-sized alleles from lineage A have been found between 8 and 20 CGGs.

While analyzing our four STRs in cohorts 1 and 2, we again faced the problem of FRAXAC2 heterogeneity. To discard the hypothesis of biased results due to an uncontrolled source of variation within this marker, we reconstructed phylogenetic networks based only on the DXS998, DXS548 and FRAXAC1 markers; the results led to the same conclusions (Supplementary Figure S1). Similarly, to gain insight into the noise introduced in our networks by recombination, we excluded the most distant marker (DXS998, located several blocks apart from the repeat); again, we obtained similar results, suggesting that recombination was not a major source of additional variation in the analyzed STR haplotypes (Supplementary Figure S2).

Discussion

In fragile X syndrome, the occurrence of mosaic full mutations, that is, multiple alleles within the expanded range, has been reported fairly often.26, 27, 28 This is owing to the high intergenerational and somatic instability of fully expanded alleles. Rarer is the presence of both a premutation and a full mutation in the same individual, which usually underlies the origin of de novo expansions.10, 29 In this study, we presented four cases of an even rarer type of intra-tissue size mosaicism, with both normal and fully mutated alleles in different blood cells from a single male patient. The presence of normal/intermediate FMR1 alleles in addition to the full mutation did not modify the typical FXS clinical presentation observed in all four patients. This may be due to the low levels of the short alleles when compared with the expanded ones and/or to their methylation status, as all mosaic patients displayed methylated full expansions and unmethylated normal and intermediate alleles. This differed from the mosaic case presented by Basuta et al., a male patient with FXTAS who showed a completely unmethylated FMR1 full mutation in peripheral blood mononuclear cells (as well as intra- and inter-tissue size mosaicism, with premutation alleles of different sizes in fibroblasts and sperm).30 In that case, the complex phenotype (features of FXS and late-onset neurological deterioration with probable FXTAS) was explained by a combined molecular mechanism, derived from both elevated FMR1 mRNA levels and low FMRP expression.

In our mosaic patients, after genotyping the respective mothers for FMR1, we have shown that all four large contractions, from a fully expanded to a normal-sized (CGG)n allele, occurred independently during early embryonic development. Therefore, we benefited from having both parental and derived alleles to unravel the mechanisms of FMR1 CGG repeat instability, more specifically of large contractions. It would be interesting to follow the repeat dynamics of our mosaic cases, namely by analyzing the sperm cells of these patients. Interestingly, a bias for repeat contraction has been suggested in males, since a man carrying an unmethylated FMR1 full mutation in peripheral blood mononuclear cells presented a premutation allele with 86 CGG repeats in sperm cells.30 Taking into account that maternal transmissions result almost invariably in expansions of the CGG allele size, one could hypothesize that highly unstable pure CGGs are maintained owing to the existence of expansion-prone alleles in female meioses, which would contract again to the normal range in the paternal lineage during father–offspring transmissions.

Although several studies have included both SNP and STR markers flanking the repetitive region, here, to follow the recent evolution of the FMR1 repeat more accurately, we applied a population genetics approach, first analyzing stable SNP backgrounds and then placing fast-evolving STRs within the pre-identified SNP lineages. This is, to our knowledge, the first study combining such a structured haplotype analysis with seven flanking polymorphic markers phased with FMR1 alleles (with known repeat size and configuration). A single SNP background was not shown to segregate with (CGG)n alleles from our analyzed mosaics; however, the shared extended DXS998-DXS548-FRAXAC1-FRAXAC2 haplotype within each lineage (34-44-38-336 in lineage A and the 3′-end 36-333 in lineage C) raised the question of a lineage-specific risk haplotype more prone to large contractions. After observing that all contracted alleles displayed no AGG interruptions, we searched for the answer by analyzing STR haplotypes of normal pure FMR1 alleles. The close phylogenetic relationships observed among normal pure repeats from the control population suggested their recent evolution from a common ancestor instead of recurrent large contractions from the pool of expanded alleles. Nevertheless, even if drastic reductions in FMR1 repeat size are not at the origin of all pure normal alleles, the occurrence of these rare events on recurrent extended STR-based haplotypes may indicate that a predisposing factor to large contractions is tagged in both lineages A and C. The mechanisms behind this cis-effect may involve (1) changes in chromatin organization, as SNP variants may impede the binding of CTCF, an insulator DNA-binding protein previously linked to repeat instability;31 (2) gene regulation, if these or associated SNPs alter the binding of transcription factors, as transcription through the repeat tract has been shown to increase repeat instability;32 and/or (3) differential interaction of repair proteins with the SNP variants at FMR1, resulting in different instability patterns, as suggested for other trinucleotide repeat disorders.33

In the literature, a few other multistep contractions have been described in the maternal lineage (although in a smaller range), namely the change of interrupted premutated alleles either to smaller premutations or to intermediate pure alleles.10 It would be interesting to assess the haplotype background where these events took place and to gain insight into whether small and large contractions share the same risk factors. While analyzing the evolution of normal pure alleles, other, less drastic multistep mutations seemed to occur. Indeed, a multistep mechanism involving the gain or loss of several repetitive units during the intergenerational transmission of expanded alleles has been suggested to explain the evolution of other disease-expansion repeats.34, 35, 36

Further studies of repeat instability underlying the different groups of normal pure (CGG)n alleles (the unstable ones that resulted from large contractions and those, probably more stable, that evolved through interrupted repeats) would be of utmost importance for research on both the mechanistic aspects of repeat instability and clinical risk assessment for fragile X syndrome.