Published in: Health Services and Outcomes Research Methodology, Issue 4/2022

6 June 2022

A two-stage super learner for healthcare expenditures

Authors: Ziyue Wu, Seth A. Berkowitz, Patrick J. Heagerty, David Benkeser



Abstract

We aim to improve the estimation of healthcare expenditures by introducing a novel method that is well suited to data exhibiting strong skewness and zero-inflation. We evaluate it in simulations and in two real-world datasets: the 2016–2017 Medical Expenditure Panel Survey (MEPS) and the Back Pain Outcomes using Longitudinal Data (BOLD) registry.

Super learner is an ensemble machine learning approach that combines several candidate algorithms, using cross-validation to decide how to weight them, in order to improve estimation. We propose a two-stage super learner tailored to healthcare expenditure data: it separately estimates the probability of any healthcare expenditure and the mean amount of expenditure conditional on having any expenditure. The two estimates are then combined, by multiplying the probability by the conditional mean, to yield a single expected-expenditure estimate for each observation. Each stage can flexibly incorporate a range of individual estimation approaches, including both regression-based approaches and machine learning algorithms such as random forests.

We compare the two-stage super learner with a one-stage super learner and with multiple individual algorithms for estimating healthcare costs under a broad range of data settings, in both simulated and real data. Predictive performance was assessed using mean squared error (MSE) and R².

Our results indicate that the two-stage super learner outperforms both the one-stage super learner and the individual algorithms for healthcare cost estimation across a wide variety of settings, in simulations and in empirical analyses. Its advantage over the one-stage super learner was particularly evident when zero-inflation was high.
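The combination step in the abstract rests on the decomposition E[Y | X] = P(Y > 0 | X) × E[Y | Y > 0, X]. Below is a minimal pure-Python sketch of that idea under stated assumptions; it is not the authors' implementation and does not reproduce their library of learners. Everything in it is hypothetical: two toy candidate learners (an intercept-only mean and a median-split decision stump), 5-fold cross-validation, squared-error loss in both stages, a grid search over a single convex weight in place of a full super learner meta-learner, and simulated data in which roughly 55% of observations have zero expenditure by design. The sketch also scores predictions with MSE and R².

```python
import random

# Hypothetical sketch: every class and function name here is illustrative,
# not taken from the paper or from any super learner package.

def avg(values):
    return sum(values) / len(values)

def mse(ys, preds):
    """Mean squared error; assumed as the CV loss in both stages."""
    return avg([(y - p) ** 2 for y, p in zip(ys, preds)])

def r2(ys, preds):
    """R^2 = 1 - MSE / (MSE of predicting the sample mean)."""
    return 1 - mse(ys, preds) / mse(ys, [avg(ys)] * len(ys))

class MeanLearner:
    """Intercept-only candidate: predicts the training mean of the outcome."""
    def fit(self, xs, ys):
        self.m = avg(ys)
        return self

    def predict(self, xs):
        return [self.m for _ in xs]

class StumpLearner:
    """Decision-stump candidate: separate outcome means below/above the median of x."""
    def fit(self, xs, ys):
        self.cut = sorted(xs)[len(xs) // 2]
        lo = [y for x, y in zip(xs, ys) if x < self.cut]
        hi = [y for x, y in zip(xs, ys) if x >= self.cut]
        self.lo = avg(lo) if lo else avg(ys)
        self.hi = avg(hi) if hi else avg(ys)
        return self

    def predict(self, xs):
        return [self.lo if x < self.cut else self.hi for x in xs]

class SuperLearner:
    """Convex combination of two candidates; the weight minimizes K-fold CV MSE."""
    def __init__(self, make_learners, k=5, grid=101):
        self.make_learners = make_learners  # returns a fresh pair of unfitted learners
        self.k, self.grid = k, grid

    def fit(self, xs, ys):
        n = len(xs)
        folds = [list(range(i, n, self.k)) for i in range(self.k)]
        oof = [[0.0] * n, [0.0] * n]  # out-of-fold predictions, one row per candidate
        for val in folds:
            held_out = set(val)
            train = [i for i in range(n) if i not in held_out]
            for j, learner in enumerate(self.make_learners()):
                learner.fit([xs[i] for i in train], [ys[i] for i in train])
                for i, p in zip(val, learner.predict([xs[i] for i in val])):
                    oof[j][i] = p
        # Pick the convex weight whose blended out-of-fold predictions have lowest MSE.
        alphas = [g / (self.grid - 1) for g in range(self.grid)]
        self.alpha = min(alphas, key=lambda a: mse(
            ys, [a * p0 + (1 - a) * p1 for p0, p1 in zip(*oof)]))
        # Refit every candidate on all the data for final predictions.
        self.learners = [learner.fit(xs, ys) for learner in self.make_learners()]
        return self

    def predict(self, xs):
        p0, p1 = (learner.predict(xs) for learner in self.learners)
        return [self.alpha * a + (1 - self.alpha) * b for a, b in zip(p0, p1)]

class TwoStageSuperLearner:
    """E[Y|x] = P(Y > 0 | x) * E[Y | Y > 0, x], each factor fit by its own super learner."""
    def __init__(self, make_learners, k=5):
        self.stage1 = SuperLearner(make_learners, k)  # probability of any expenditure
        self.stage2 = SuperLearner(make_learners, k)  # mean amount among spenders

    def fit(self, xs, ys):
        self.stage1.fit(xs, [1.0 if y > 0 else 0.0 for y in ys])
        pos = [(x, y) for x, y in zip(xs, ys) if y > 0]
        self.stage2.fit([x for x, _ in pos], [y for _, y in pos])
        return self

    def predict(self, xs):
        probs, amounts = self.stage1.predict(xs), self.stage2.predict(xs)
        return [p * m for p, m in zip(probs, amounts)]

def simulate(n, seed):
    """Hypothetical zero-inflated, right-skewed expenditures driven by one covariate x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.random()
        spends = rng.random() < 0.1 + 0.7 * x  # P(any expenditure) rises with x
        xs.append(x)
        ys.append(rng.lognormvariate(6 + x, 1.0) if spends else 0.0)
    return xs, ys

def make_candidates():
    return [MeanLearner(), StumpLearner()]

xs_tr, ys_tr = simulate(2000, seed=1)
xs_te, ys_te = simulate(1000, seed=2)
one = SuperLearner(make_candidates).fit(xs_tr, ys_tr)
two = TwoStageSuperLearner(make_candidates).fit(xs_tr, ys_tr)
for name, model in [("one-stage", one), ("two-stage", two)]:
    preds = model.predict(xs_te)
    print(f"{name}: MSE={mse(ys_te, preds):.1f}  R2={r2(ys_te, preds):.3f}")
```

The script prints test-set MSE and R² for a one-stage and a two-stage learner on this toy data; the numbers only illustrate the mechanics and say nothing about the paper's findings. A realistic candidate library would include regression-based models and machine learning algorithms such as random forests, as the abstract describes.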
Metadata
Title: A two-stage super learner for healthcare expenditures
Authors: Ziyue Wu, Seth A. Berkowitz, Patrick J. Heagerty, David Benkeser
Publication date: 6 June 2022
Publisher: Springer US
Published in: Health Services and Outcomes Research Methodology, Issue 4/2022
Print ISSN: 1387-3741
Electronic ISSN: 1572-9400
DOI: https://doi.org/10.1007/s10742-022-00275-x
