The online version of this article (https://doi.org/10.1186/s12883-017-1010-3) contains supplementary material, which is available to authorized users.
Alzheimer’s disease (AD) is a major progressive neurodegenerative disease with a complex genetic architecture. A key goal of biomedical research is to identify disease risk genes and to elucidate their function in the development of disease. For this purpose, expanding the AD-associated gene set is necessary. In past research, prediction methods for AD-related genes have been limited in the genome regions they explored. Here we present a genome-wide method for predicting AD candidate genes.
We present a machine learning approach based on a support vector machine (SVM) that integrates gene expression data with human brain-specific gene network data to discover the full spectrum of AD genes across the whole genome.
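As a rough illustration of the idea of training an SVM on integrated features, the following minimal sketch concatenates one expression-derived value and one network-derived value per gene and fits a linear SVM by subgradient descent on the hinge loss. Everything specific here is an assumption made for illustration: the two-feature integration scheme, the linear kernel, the hyperparameters, the function names, and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def integrate_features(expression: Sequence[float],
                       network: Sequence[float]) -> List[List[float]]:
    """Concatenate one expression value and one network value per gene.

    This simple concatenation is an assumed integration scheme, not the
    study's actual procedure.
    """
    return [[e, n] for e, n in zip(expression, network)]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score of gene x under the linear model w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X: List[List[float]], y: Sequence[int],
                     lam: float = 0.01, lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (AD-associated) or -1 (non-associated).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only genes inside the margin (or misclassified) contribute.
            if yi * decision_value(w, b, xi) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 for a predicted AD-associated gene, -1 otherwise."""
    return 1 if decision_value(w, b, x) > 0 else -1


# Hypothetical toy data: six genes, one value per data source.
expression = [1.5, 2.0, 1.0, -1.0, -0.5, -1.5]
network = [0.9, 0.7, 0.8, 0.2, 0.1, 0.3]
labels = [1, 1, 1, -1, -1, -1]

X = integrate_features(expression, network)
w, b = train_linear_svm(X, labels)
train_predictions = [predict(w, b, x) for x in X]
```

In practice an established SVM library with kernel selection and cross-validation would likely be preferable; the hand-written trainer is used here only to keep the sketch self-contained.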
We classified AD candidate genes with an accuracy of 84.56% and an area under the receiver operating characteristic (ROC) curve of 94%. Our approach supplements the spectrum of AD-associated genes, with candidates extracted from more than 20,000 genes on a genome-wide scale.
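For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and the area under the ROC curve can be computed from classifier scores. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive gene scores higher than a randomly chosen negative one, with ties counted as half. The function names, the zero decision threshold, and the toy labels and scores are assumptions for illustration only and are unrelated to the study's data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.0) -> float:
    """Fraction of genes whose thresholded score matches the label (1 = AD-associated)."""
    predictions = [1 if s > threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: three associated and three non-associated genes.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [1.2, 0.4, -0.3, 0.1, -0.8, -1.5]

toy_accuracy = accuracy(toy_labels, toy_scores)  # 4 of 6 genes correct
toy_auc = roc_auc(toy_labels, toy_scores)        # 8 of 9 pairs correctly ordered
```

The pairwise formulation is quadratic in the number of genes, which is fine for a small illustration; a sort-based computation would be the usual choice at genome scale.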
In this study, we have elucidated the whole-genome spectrum of AD using a machine learning approach. We expect the resulting candidate gene catalogue to provide researchers with a more comprehensive annotation of AD.
Additional file 1: Figure S1. The correct rates of different non-associated gene sets in SVM training. We randomly selected the dataset of 335 non-associated genes (n = 100) for SVM training. (TIFF 12473 kb; 12883_2017_1010_MOESM1_ESM.tif)
Additional file 2: 335 AD-associated genes. The dataset was collected from the public Alzheimer’s disease database AlzGene and from publications on AD. (XLSX 14 kb; 12883_2017_1010_MOESM2_ESM.xlsx)
Additional file 3: The classification of AD-associated and non-AD-associated genes. C1-AD: probable pathogenic genes; C2-AD: high-confidence genes; C3-AD: related genes; C4-AD: possibly associated genes; C5-AD: non-AD-associated genes. (XLSX 19 kb; 12883_2017_1010_MOESM3_ESM.xlsx)
Additional file 4: 832 AD-predicted genes. The full set of AD-predicted genes across the whole genome in our study. (XLSX 68 kb; 12883_2017_1010_MOESM4_ESM.xlsx)
- Revealing Alzheimer’s disease genes spectrum in the whole-genome by machine learning
Laurent Christian Asker M. Tellier
- BioMed Central