The online version of this article (https://doi.org/10.1186/s13024-018-0274-4) contains supplementary material, which is available to authorized users.
Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5–7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9.
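In the No-Amp targeted approach, on-target reads were identified using sequences adjacent to the C9orf72 repeat (Data S11). The Python sketch below shows one way to do that filtering. It assumes exact matches to two flanking sequences. The flank strings `LEFT_FLANK` and `RIGHT_FLANK`, and the functions `revcomp` and `extract_repeat_region`, are hypothetical placeholders, not the sequences or code used in the study. Raw long reads contain errors, so a practical pipeline would more likely use approximate matching or alignment.

```python
from typing import Optional

# HYPOTHETICAL placeholder flanks: the real flanking sequences are listed in
# Data S11 and are not reproduced here.
LEFT_FLANK = "CATCATCATA"
RIGHT_FLANK = "TCTCTCTCAA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_repeat_region(read: str, left: str = LEFT_FLANK,
                          right: str = RIGHT_FLANK) -> Optional[str]:
    """Return the sequence between the two flanks, or None if off-target.

    Both strands are tried, so a read from the opposite strand is returned
    reverse-complemented (oriented so that `left` precedes `right`).
    Exact string matching is an illustrative simplification.
    """
    read = read.upper()
    for seq in (read, revcomp(read)):
        start = seq.find(left)
        if start == -1:
            continue
        end = seq.find(right, start + len(left))
        if end != -1:
            return seq[start + len(left):end]
    return None
```

Reads that return `None` either come from elsewhere in the genome or do not span the whole repeat. In this sketch, both cases are excluded from repeat sizing.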
Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained > 800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was > 99% G4C2 content, though we cannot rule out small interruptions.
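Repeat size and G4C2 content can be summarized from the sequence between the flanks of a read or consensus. The sketch below is an illustrative Python example, not the authors' pipeline. The function name `repeat_stats` is hypothetical. It assumes the tract is already oriented on the G4C2 strand. It estimates repeat number as tract length divided by six, and it measures purity as the fraction of bases inside perfect, uninterrupted GGGGCC runs.

```python
import re

G4C2 = "GGGGCC"
# One or more back-to-back perfect GGGGCC units.
_PERFECT_RUN = re.compile(r"(?:GGGGCC)+")


def repeat_stats(region: str) -> dict:
    """Summarize a repeat tract assumed to be oriented on the G4C2 strand."""
    region = region.upper()
    runs = [m.group(0) for m in _PERFECT_RUN.finditer(region)]
    # Count whole perfect units across all runs.
    perfect_units = sum(len(run) // len(G4C2) for run in runs)
    bases_in_units = perfect_units * len(G4C2)
    return {
        "length_nt": len(region),
        # Length-based estimate; partial units make this approximate.
        "approx_repeats": round(len(region) / len(G4C2)),
        "perfect_units": perfect_units,
        # Fraction of the tract made of perfect G4C2 units.
        "g4c2_fraction": bases_in_units / len(region) if region else 0.0,
    }
```

The length-based estimate and the count of perfect units diverge whenever the tract contains interruptions, partial units or sequencing errors. Reporting both makes that difference visible.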
Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions and to discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may prove useful in clinical and genetic counseling settings. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content. Such interruptions could mitigate or modify disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications for all diseases whose genetic etiology remains unclear.
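One way to screen a repeat tract for non-G4C2 interruptions is to locate the stretches that fall between perfect G4C2 runs. This Python sketch is hypothetical. The function `find_interruptions` and its `min_run_units` threshold are illustrative choices. On raw long reads, many reported stretches would be sequencing errors rather than true interruptions, so any hit would need confirmation from consensus or high-accuracy reads.

```python
import re
from typing import List, Tuple

# One or more back-to-back perfect GGGGCC units.
_PERFECT_RUN = re.compile(r"(?:GGGGCC)+")


def find_interruptions(region: str, min_run_units: int = 2) -> List[Tuple[int, str]]:
    """Return (offset, sequence) for each stretch between perfect G4C2 runs.

    Runs shorter than `min_run_units` units are ignored, so an isolated
    single unit is absorbed into the surrounding interruption. Sequence
    before the first run or after the last run is not reported.
    """
    region = region.upper()
    runs = [m for m in _PERFECT_RUN.finditer(region)
            if len(m.group(0)) >= 6 * min_run_units]
    gaps = []
    for prev, nxt in zip(runs, runs[1:]):
        if nxt.start() > prev.end():
            gaps.append((prev.end(), region[prev.end():nxt.start()]))
    return gaps
```

For example, a tract of five GGGGCC units, one GGGACC and four more GGGGCC units yields a single interruption, GGGACC, at offset 30.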
Additional file 1:
Figure S1. Repeat-containing plasmids have a large repeat size distribution.
Figure S2. Sanger sequencing confirms the SCA36 repeat plasmid contains at least 37 repeats.
Figure S3. Individual Sequel and MinION read(s) across the C9orf72 repeat region, aligned to the reference genome and hand-curated.
Data S1. RS II consensus sequence in the C9-774 repeat region is 99.77% accurate when compared to the plasmid reference sequence.
Data S2. MinION consensus sequence in the C9-774 repeat region is 26.55% accurate when compared to the plasmid reference sequence.
Data S3. MinION read covering the non-pathogenic allele.
Data S4. Sequel read covering the non-pathogenic allele.
Data S5. Sequel read that did not extend through the repeat but contains approximately 30 repeats.
Data S6. Sequel read covering approximately 69 repeats.
Data S7. Sequel read that did not extend through the repeat but contains approximately 912 repeats.
Data S8. Sequel read covering the 1324-repeat allele.
Data S9. PacBio Sequel consensus sequence by Long Amplicon Analysis (LAA2).
Data S10. Plasmid backbones used to separate reads.
Data S11. Sequences adjacent to the C9orf72 repeat region used to identify on-target reads from the PacBio No-Amp Targeted Sequencing approach.
(DOCX 3982 kb)
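Data S1 and S2 report consensus accuracy relative to the plasmid reference. One possible way to express such a figure is 100 × (1 − edit distance / reference length), using Levenshtein distance. The sketch below assumes that definition, which the text here does not confirm. The function names `levenshtein` and `percent_accuracy` are hypothetical. Under this definition the value can fall below zero for highly divergent sequences. Other definitions normalize by alignment length instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the row
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def percent_accuracy(consensus: str, reference: str) -> float:
    """ASSUMED definition: 100 * (1 - edit distance / reference length)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * (1 - levenshtein(consensus, reference) / len(reference))
```

The two-row dynamic program uses memory proportional to the shorter sequence, but time proportional to the product of both lengths. For repeat regions several kilobases long, that cost is tolerable, but it would not scale to whole genomes.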
- Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease
Mark T. W. Ebbert
Stefan L. Farrugia
Jonathon P. Sens
Tania F. Gendron
Ian J. McLaughlin
Patricia H. Brown
Dennis W. Dickson
Marka van Blitterswijk
John D. Fryer
- BioMed Central