Health Services and Outcomes Research Methodology

Published: 6 June 2022

A two-stage super learner for healthcare expenditures

Authors: Ziyue Wu, Seth A. Berkowitz, Patrick J. Heagerty, David Benkeser

Published in: Health Services and Outcomes Research Methodology | Issue 4/2022

Abstract

This study aims to improve the estimation of healthcare expenditures by introducing a novel method that is well suited to data exhibiting strong skewness and zero-inflation. The data sources are simulations and two real-world datasets: the 2016–2017 Medical Expenditure Panel Survey and the Back Pain Outcomes using Longitudinal Data registry. Super learner is an ensemble machine learning approach that combines several algorithms to improve estimation. We propose a two-stage super learner tailored to healthcare expenditure data: it separately estimates the probability of any healthcare expenditure and the mean expenditure conditional on having any expenditure. These two estimates are then combined to yield a single estimate of expenditures for each observation. The analytical strategy can flexibly incorporate a range of individual estimation approaches at each stage, including both regression-based approaches and machine learning algorithms such as random forests. We compare the performance of the two-stage super learner with that of a one-stage super learner and of multiple individual algorithms for estimating healthcare costs under a broad range of data settings in simulated and real data. Predictive performance was compared using mean squared error and R². Our results indicate that the two-stage super learner outperforms both the one-stage super learner and the individual algorithms for healthcare cost estimation across a wide variety of settings, in simulations and in empirical analyses. The improvement of the two-stage super learner over the one-stage super learner was particularly evident in settings with high zero-inflation.
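The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it contains no super learner ensemble: each stage here is a simple stratified average over one hypothetical categorical covariate, chosen only to show how a stage-1 probability of any expenditure and a stage-2 conditional mean are multiplied into one prediction, and how mean squared error and R² are then computed. All function names, the covariate, and the toy cost values are assumptions made for illustration.

```python
from statistics import mean


def fit_two_stage(groups, costs):
    """Fit a toy two-stage estimator, stratified by one categorical covariate.

    Returns {group: (p_any, mean_positive)}. Both stages are plain averages
    here (an assumption); the paper uses ensembles of learners instead.
    """
    by_group = {}
    for g, y in zip(groups, costs):
        by_group.setdefault(g, []).append(y)

    model = {}
    for g, ys in by_group.items():
        positive = [y for y in ys if y > 0]
        # Stage 1: estimated probability of any expenditure in this group.
        p_any = len(positive) / len(ys)
        # Stage 2: estimated mean expenditure among those with expenditure > 0.
        mean_positive = mean(positive) if positive else 0.0
        model[g] = (p_any, mean_positive)
    return model


def predict_two_stage(model, groups):
    """Combine the stages: P(Y > 0 | group) * E[Y | Y > 0, group]."""
    return [model[g][0] * model[g][1] for g in groups]


def mean_squared_error(y, y_hat):
    """Average squared difference between observed and predicted values."""
    return mean([(a - b) ** 2 for a, b in zip(y, y_hat)])


def r_squared(y, y_hat):
    """One minus the ratio of residual to total sum of squares."""
    y_bar = mean(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: a risk group label and an annual cost per person.
groups = ["low", "low", "low", "low", "high", "high", "high", "high"]
costs = [0, 0, 0, 100, 0, 200, 400, 600]

model = fit_two_stage(groups, costs)
preds = predict_two_stage(model, groups)
mse = mean_squared_error(costs, preds)
r2 = r_squared(costs, preds)

print(model)
print(preds)
print(mse, r2)
```

In this toy setup the metrics are computed on the same data used for fitting, so they describe in-sample fit only; an out-of-sample comparison of the kind reported in the abstract would need held-out or cross-validated predictions.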

Title: A two-stage super learner for healthcare expenditures
Authors: Ziyue Wu, Seth A. Berkowitz, Patrick J. Heagerty, David Benkeser
Publication date: 6 June 2022
Publisher: Springer US
Published in: Health Services and Outcomes Research Methodology / Issue 4/2022
Print ISSN: 1387-3741
Electronic ISSN: 1572-9400
DOI: https://doi.org/10.1007/s10742-022-00275-x
