Published in: BMC Proceedings, Special Issue 7/2016

Open Access | 1 October 2016 | Proceedings

Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data

Authors: Elizabeth Held, Joshua Cape, Nathan Tintle


Abstract

Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the large number of variables relative to the number of observations. However, few established best practices exist for applying these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk from genotypes so that it can also incorporate gene expression data and rare variants. We then apply 2 versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare their performance with that of logistic regression. Performance did not differ substantially across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability over the radial support vector machine and logistic regression. Importantly, as the number of genes in the models increased, predictive ability showed a statistically significant decrease for both the radial support vector machine and logistic regression, even when the added genes contained causal rare variants. The linear support vector machine was more robust to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches in larger samples and to quantify the improvement in prediction gained by incorporating gene expression data.
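The abstract does not give the simulation design, tuning grid, or performance measure, so the following is only an illustrative sketch under stated assumptions, not the authors' pipeline. It simulates a hypothetical binary phenotype in which a few features carry signal and the rest are noise (a rough stand-in for adding genes to a model), fits ridge-penalized logistic regression and a linear support vector machine, and scores each on an independent test sample by the area under the ROC curve (AUC). AUC as the metric, the Pegasos stochastic subgradient solver for the SVM, every hyperparameter, and the helper names (`make_data`, `fit_logistic`, `fit_linear_svm`, `compare`) are assumptions chosen to keep the example self-contained in the standard library; the radial-kernel SVM is omitted for brevity.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def make_data(n, n_causal, n_noise, seed):
    # Hypothetical simulation (assumption): the first n_causal features each
    # raise the log-odds by 1.0; the remaining n_noise features are pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_causal + n_noise)]
        p = sigmoid(sum(x[:n_causal]))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


def fit_logistic(X, y, lr=0.5, epochs=100, l2=0.01):
    # Ridge-penalized logistic regression fitted by full-batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fit_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    # Linear SVM (hinge loss + L2 penalty) fitted with the Pegasos stochastic
    # subgradient method; no intercept, since the simulated features are centered.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            yi = 1.0 if y[i] == 1 else -1.0
            violated = yi * dot(w, X[i]) < 1.0  # margin check uses the current w
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, X[i])]
    return w


def auc(scores, y):
    # Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2).
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def compare(n_noise, n_causal=5, n=300):
    # Train on one simulated sample and score an independent second sample.
    Xtr, ytr = make_data(n, n_causal, n_noise, seed=1)
    Xte, yte = make_data(n, n_causal, n_noise, seed=2)
    lw, lb = fit_logistic(Xtr, ytr)
    sw = fit_linear_svm(Xtr, ytr)
    return {
        "logistic": auc([dot(lw, x) + lb for x in Xte], yte),
        "linear_svm": auc([dot(sw, x) for x in Xte], yte),
    }


if __name__ == "__main__":
    for n_noise in (0, 50):
        print(n_noise, compare(n_noise))
```

Calling `compare(n_noise)` for increasing `n_noise` gives a rough feel for how extra uninformative features can erode held-out AUC. With a single small simulated sample, any difference between the two methods here should not be read as reproducing the results summarized in the abstract.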
Metadata
Publisher
BioMed Central
Published in
BMC Proceedings / Special Issue 7/2016
Electronic ISSN: 1753-6561
DOI
https://doi.org/10.1186/s12919-016-0020-2
