Background
Facioscapulohumeral muscular dystrophy (FSHD) is an autosomal disease characterized by muscle weakness that initially manifests in the face, shoulder, and upper arms, followed by asymmetric involvement of other muscles [1].
DUX4 is a causative gene for FSHD and is located within an approximately 3.3 kb repeat sequence, referred to as D4Z4, which comprises 1–100 repeat units (RUs) on the subtelomeric regions of chromosomes 4 and 10. Chromosome 4 has two haplotypes distal to the D4Z4 repeat, 4qA and 4qB, of which only the 4qA allele contributes to FSHD development, owing to the presence of a polyadenylation signal in the most distal D4Z4 RU [2, 3].
FSHD has two types, FSHD1 and FSHD2, both caused by genetic defects leading to aberrant DUX4 expression in skeletal muscle [4]. FSHD1 is mediated by contraction of the D4Z4 4qA allele to 1–10 RUs [5], while FSHD2 is caused by a combination of milder D4Z4 contraction (8–20 RUs) and genetic variants in SMCHD1, DNMT3B, or LRIF1, each of which encodes an epigenetic modifier [6–8]. Epigenetic modifiers affect histone modification, DNA methylation, and RNA-based mechanisms; they may be involved in the mechanisms of various diseases and have important diagnostic potential [9]. DNA methylation and histone modification at D4Z4 RUs are altered in FSHD [10–12]. CpG methylation is specifically decreased at the contracted D4Z4 repeat on chromosome 4 in FSHD1, while the D4Z4 repeats on both chromosomes 4 and 10 are hypomethylated in FSHD2 [10, 13, 14]; however, the distribution of methylation throughout the full D4Z4 repeat sequence has not been analyzed.
Southern blotting, bisulfite sequencing, molecular combing, and next-generation sequencing are currently used for genetic diagnosis of FSHD [15], but these diagnostic procedures and interpretation of their results present several difficulties. First, interpretation of hybridization patterns generated by Southern blotting is complicated by the fact that the detecting probe also recognizes an additional locus on chromosome 10q that is almost completely homologous to the target 4q35 locus. Second, two subtelomeric variations distal to D4Z4 have been identified on chromosome 4, referred to as the 4qA and 4qB alleles, and selective identification of contracted 4qA repeats is necessary, as only 4qA is associated with FSHD. Third, analysis of CpG methylation by bisulfite sequencing has been performed across the entire D4Z4 units at both the 4q and 10q loci; however, a focal region of extreme demethylation has been reported [16]. Additionally, several patients with milder D4Z4 contraction and CpG hypomethylation have been identified, making diagnosis difficult.
Here, we applied Nanopore CRISPR/Cas9-targeted resequencing (nCATS) to measure the number of D4Z4 RUs and their methylation status in patients with FSHD. We specifically analyzed D4Z4 RUs derived from 4qA and measured the CpG methylation rate in each RU. D4Z4 RUs from 10q were also analyzed.
Methods
Genomic DNA preparation
Peripheral blood lymphocytes (10 ml) were combined with 30 ml EL buffer (155 mM NH4Cl, 10 mM KHCO3, 1 mM EDTA, pH 7.4) on ice for 15 min, followed by centrifugation (KUBOTA 5930, RS-3012M) (840×g, 10 min, room temperature). After a repeat EL buffer wash, pellets were suspended in 3 ml NL buffer (10 mM Tris–HCl, 2 mM EDTA, 400 mM NaCl, pH 8.2), followed by addition of 1% SDS and proteinase K and incubation at 37 °C overnight. Then, 1 ml of 5 M NaCl was added to the DNA lysis solution, followed by phenol/chloroform extraction and ethanol precipitation. DNA pellets were suspended in TE buffer.
Fibroblasts grown in culture dishes were lysed in 10 mM Tris–HCl, 10 mM EDTA, 150 mM NaCl, pH 8.0 containing 0.5% SDS and proteinase K at 55 °C overnight, followed by phenol/chloroform extraction and ethanol precipitation. DNA pellets were suspended in TE buffer.
DNA library preparation
DNA libraries were prepared using a ligation sequencing kit (Oxford Nanopore Technologies, SQK-LSK109). To generate Cas9 ribonucleoprotein complexes (RNPs), annealed 1 μM tracrRNA-crRNA pool (CR1/CR2/CR3/CR4) and 0.5 μM HiFi Cas9 were incubated at room temperature (around 23 °C) for 30 min. Genomic DNA (2 μg) was dephosphorylated with Quick Calf Intestinal Phosphatase (NEB, #M0525S) at 37 °C for 10 min, followed by 80 °C for 2 min. For Cas9 RNP cleavage and dA-tailing, dephosphorylated genomic DNA samples were treated with Cas9 RNPs, Taq polymerase (NEB, #M0273S), and dATP (NEB, #N0440S) at 37 °C for 30 min, followed by 72 °C for 5 min. For native barcode ligation, barcodes from the Native Barcoding Expansion 1–12 (Oxford Nanopore Technologies, EXP-NBD104) were ligated to cleaved and dA-tailed genomic DNA using Blunt/TA Ligase Master Mix (NEB, #M0367L) at room temperature for 10 min, followed by purification with Agencourt AMPure XP Beads (Beckman Coulter, #A63880) on a magnet. AMII adapters were ligated to barcoded genomic DNA using Quick T4 DNA ligase (NEB, #E7185A) at room temperature for 10 min, followed by purification with AMPure XP Beads on a magnet. The DNA library from Cas9-targeted native barcoding was loaded onto a primed MinION Flow Cell (FLO-MIN106D) on a MinION Mk1C, and sequencing was performed for 20–21 h.
The crRNA design tool, CHOPCHOP [17], was used to design crRNAs, which were synthesized by Integrated DNA Technologies as follows: CR1, 5′-gataccgacagcaatagtcc-3′; CR2, 5′-gtccttcagcactccacatc-3′; CR3, 5′-ctataggatccacagggagg-3′; and CR4, 5′-tgtcaaggtttggcttatag-3′.
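As an illustration only, the following minimal Python sketch shows how the four crRNA sequences listed above could be checked for length and GC content and located on either strand of a reference sequence. The helper names (reverse_complement, gc_fraction, find_guide) are hypothetical and are not part of CHOPCHOP or of the pipeline used in this study; exact string matching is an assumption made for simplicity.

```python
# Guide sequences exactly as listed in the text (5'->3').
CRRNAS = {
    "CR1": "gataccgacagcaatagtcc",
    "CR2": "gtccttcagcactccacatc",
    "CR3": "ctataggatccacagggagg",
    "CR4": "tgtcaaggtttggcttatag",
}

_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence."""
    s = seq.lower()
    return (s.count("g") + s.count("c")) / len(s)


def find_guide(reference: str, guide: str) -> list:
    """Return sorted (0-based start, strand) pairs for exact guide matches.

    Hypothetical helper: searches the forward strand ('+') and the
    reverse-complement strand ('-') by plain substring matching.
    """
    ref = reference.lower()
    hits = []
    queries = (("+", guide.lower()), ("-", reverse_complement(guide.lower())))
    for strand, query in queries:
        start = ref.find(query)
        while start != -1:
            hits.append((start, strand))
            start = ref.find(query, start + 1)
    return sorted(hits)
```

In such a sketch, find_guide would be called once per guide against the reference sequence of interest to confirm that each cut site flanks the D4Z4 repeat as intended.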
Data analysis
Bases were called from Fast5 files using Guppy to generate Fastq files. Reads were aligned using Minimap2 to a reference sequence containing 10 D4Z4 RUs and flanking sequences from 3950 bp upstream of CR1 to 251 bp downstream of CR4. Reference sequences were constructed using SnapGene software (from Insightful Science; available at snapgene.com). For DNA methylation analysis, sense- and antisense-strand reads from the 4qA and 10q loci were re-aligned to the corresponding reference sequences, and methylation was then called using Nanopolish [18]. These reference sequences contained the D4Z4 repeat at its detected size and flanking sequences from 327 bp downstream of CR2 to 1 bp upstream of CR3. Unipro UGENE free software and Integrative Genomics Viewer were used for sequence alignment [19, 20]. For analysis of the correlation between the distal D4Z4 CpG methylation rate and clinical symptoms, we calculated mean CpG methylation rates of the most distal D4Z4 RUs (RU3, RU2, and the promoter region of RU1) for all 4qA reads obtained from each FSHD1 sample. Mean methylation rate or D4Z4 length was plotted against age at disease onset or age at hospital examination using GraphPad Prism, and correlation coefficients were calculated by linear regression.
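The per-RU averaging and the regression step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this study: it assumes that per-site binary methylation calls are already available as (offset, is_methylated) pairs measured from the proximal start of the repeat, that each RU has a nominal length of 3300 bp (taken from the approximately 3.3 kb unit size given in the Background), and that RU1 denotes the most distal unit. The function names are hypothetical.

```python
from math import sqrt

RU_LENGTH_BP = 3300  # assumption: nominal ~3.3 kb D4Z4 unit length


def methylation_rate_per_ru(calls, n_units, ru_length=RU_LENGTH_BP):
    """Return {RU number: fraction of methylated CpG calls}.

    calls: iterable of (offset_bp, is_methylated), with offset_bp measured
    from the proximal start of the repeat (an assumed input format).
    RU1 is the most distal unit; RUs without any calls are omitted.
    """
    counts = {}
    for offset, is_methylated in calls:
        index_from_proximal = offset // ru_length
        if not 0 <= index_from_proximal < n_units:
            continue  # call lies outside the repeat
        ru = n_units - index_from_proximal
        methylated, total = counts.get(ru, (0, 0))
        counts[ru] = (methylated + bool(is_methylated), total + 1)
    return {ru: m / t for ru, (m, t) in sorted(counts.items())}


def linear_regression(xs, ys):
    """Return (slope, intercept, Pearson r) from a least-squares fit."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)
```

In use, the rates returned for the distal RUs of each sample would be averaged and passed as one series to linear_regression, with age at onset as the other series.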
Discussion
In principle, nCATS is applicable to other genetic disorders. It is particularly advantageous for the diagnosis of repeat-associated disorders, such as Huntington disease, spinocerebellar ataxia, neuronal intranuclear inclusion disease, oculopharyngeal distal myopathy, and others, in which the causative genetic variation cannot be identified by short-read sequencing. Indeed, it has been applied to the analysis of some tandem repeat disorders, including fragile X syndrome and myotonic dystrophy [21–23]. Nanopore sequencing was previously applied to the analysis of FSHD using a bacterial artificial chromosome clone containing 13 D4Z4 repeat units [24].
In this study, we developed a direct sequencing system using nCATS to analyze clinical samples from patients with FSHD. Our method is more efficient and yields more detailed information than conventional methods. Conventional diagnosis of FSHD relies on multiple Southern blots to determine the size of the 4q-derived D4Z4 repeat and the 4q haplotype, and on bisulfite sequencing to measure the CpG methylation rate. In contrast, our method simultaneously identifies the number and methylation rate of D4Z4 repeat units and the haplotype of 4qA-derived reads. Our system has several advantages. First, long-read sequencing can be applied to a DNA fragment size range similar to that detected by Southern blotting. Second, CRISPR/Cas9 enrichment allows barcoded sequencing of five samples simultaneously, saving time and cost. Third, single-molecule sequencing technology provides genetic information at the base level and can determine the number of RUs, even in samples that have mutated restriction enzyme sites, which prevent determination of RU number by the standard Southern blotting method. Finally, the nCATS system allows simultaneous detection of CpG methylation and D4Z4 RU numbers, providing information about local epigenetic modification of D4Z4 repeats, because it sequences single, unamplified genomic DNA molecules derived from individual nuclei and thereby avoids amplification bias.
Along with successful determination of D4Z4 RU numbers in patients, we also detected atypical rearrangement of D4Z4 repeats. As shown in Figs. 1D and 2, two reads of intermediate size had a 1.3 kb deletion in the most proximal D4Z4 RU, while p13E-11 was not deleted. This deletion is unlikely to be associated with the contraction of D4Z4 repeats in FSHD1, as the pathogenic alleles in FSHD1 usually maintain an intact RU structure, even when contracted. Common atypical rearrangements found in individuals with FSHD1 have been reported, including D4Z4 proximally extended deletion (DPED1–7) alleles, which span 5.9–45.7 kb proximal to and within D4Z4, including p13E-11. In some DPED alleles, genetic elements, such as DUX4C, FRG2, DBE-T, and myogenic enhancers, are deleted, suggesting that their role in FSHD pathogenesis requires reevaluation [25].
The most important finding in our study was the detection of DNA methylation rates across entire contracted and normal expanded D4Z4 repeat sequences from the 4qA and 10q loci. As shown in Fig. 3, 4qA-derived contracted reads were uniformly hypomethylated in patients with FSHD1, while both 4qA- and 10q-derived reads were uniformly hypomethylated in FSHD2, with the exception of a few reads. These results are similar to those generated in previous studies by Southern blot and bisulfite sequencing analyses [10, 13, 14], but our approach allows assessment of the focal methylation rate at the nucleotide level. We further analyzed 10q-derived reads in FSHD1 and found that the methylation level was lower at proximal D4Z4 RUs (positions 8–13), while it gradually increased (up to ≥ 60%) at distal RUs (positions 1–7). Given that 10q-derived reads mimic normal expanded 4qA D4Z4 repeats, these results suggest that only DNA hypermethylation at distal D4Z4 RUs contributes to suppression of the DUX4 gene in the normal 4qA allele, while contraction of D4Z4 repeats causes hypomethylation of distal D4Z4, similar to that of proximal D4Z4 in the 10q locus, leading to DUX4 expression and consequent development of FSHD1. Indeed, the mean CpG methylation rate of the most distal RUs and the age at disease onset in patients were well correlated. A larger study of the relationships among methylation rate, D4Z4 contraction, and clinical phenotypes is needed. To this end, we aim to overcome the limitation of decreased acquisition of sequencing reads from alleles with more than 10 RUs.
Limitations
The nCATS method has limitations. First, the number of sequencing reads containing mildly contracted D4Z4 repeats (11–13 RUs) was quite low; in particular, only a few reads were obtained from the normal 10q locus, and no such reads were obtained from some samples. Possible reasons why reads from chromosome 10 could not be obtained in all samples, and why read numbers differed among samples, are: (1) the difficulty of purifying intact high molecular weight DNA, because longer DNA may be more prone to degradation; (2) the difficulty of obtaining DNA fragments longer than 13 RUs, because we used only reads harboring the full-length D4Z4 repeat in our analysis; and (3) the efficiency of Cas9 cleavage of hypermethylated DNA, because distal D4Z4 RUs had extremely high methylation rates. Technical improvements in the preparation of genomic DNA are required to overcome this shortcoming. Second, our method does not isolate reads derived from 4qB. Although the lack of analysis of 4qB is unlikely to affect our conclusions, the epigenetic status of 4qB could provide meaningful reference data for the methylation rate of 4qA-derived D4Z4.
Conclusions
In this study, we successfully determined the hypomethylation of D4Z4 RUs in individual 4qA fragments in FSHD. The hypomethylation of contracted D4Z4 repeats in FSHD1 offers a plausible explanation for why shortening of D4Z4 repeats is associated with severe phenotypes in patients: it induces abnormal DUX4 expression, which leads to the development of FSHD. Further studies with a large cohort of patients and controls are needed and may provide clues toward a complete understanding of the pathomechanism of FSHD.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.