Health Services and Outcomes Research Methodology

8 June 2019

Developing and evaluating methods to impute race/ethnicity in an incomplete dataset

Authors: Gabriella C. Silva, Amal N. Trivedi, Roee Gutman

Published in: Health Services and Outcomes Research Methodology | Issue 2-3/2019

Abstract

The availability of race data is essential for identifying and addressing racial/ethnic disparities in the health care system; however, patient self-reported racial/ethnic information is often missing. Indirect methods for estimating race have been developed, but they usually consider only geocoded and surname data as predictors, may perform poorly among racial minorities, do not adjust for possible errors in specific datasets, and cannot provide race estimates for subjects missing some of this information. The objective of this study was to address these limitations by developing novel methods for imputing race/ethnicity when this information is partially missing. Treating the unobserved race as missing data, we explored different multiple imputation methods for imputing race/ethnicity and applied them to a subset of Rhode Island Medicaid beneficiaries. Current race imputation methods and newly developed ones were compared using area under the ROC curve statistics and racial composition estimates to identify methods and sets of predictors that yield superior race imputations. Family race was identified as an important predictor and should be included in race estimation models when possible. Bayesian regression models (BRM) provided better race estimates than previously proposed methods. Missing race was multiply imputed using joint modeling and fully conditional specification. Post-imputation analyses showed that fully conditional specification with a BRM is superior to joint modeling for race imputation. The proposed fully conditional specification method is a flexible, effective way of estimating race/ethnicity that allows for propagation of imputation error and ease of interpretation in further analyses.
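
The propagation of imputation error mentioned in the abstract is commonly achieved by analyzing each completed dataset separately and then pooling the results with Rubin's combining rules. The following is a minimal Python sketch of that pooling step only, not the authors' code; the function name `pool_rubin` and the example numbers are hypothetical and chosen purely for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's combining rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b      # total variance, inflated for imputation uncertainty
    return q_bar, t


# Hypothetical example: a proportion estimated in five imputed datasets.
estimates = [0.20, 0.22, 0.21, 0.19, 0.23]
variances = [0.0004] * 5
q, t = pool_rubin(estimates, variances)
print(f"pooled estimate = {q:.3f}, total variance = {t:.5f}")
```

The total variance exceeds the average within-imputation variance whenever the imputed datasets disagree, which is how uncertainty about the imputed race values carries through to later analyses.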

Title: Developing and evaluating methods to impute race/ethnicity in an incomplete dataset
Authors: Gabriella C. Silva, Amal N. Trivedi, Roee Gutman
Publication date: 8 June 2019
Publisher: Springer US
Published in: Health Services and Outcomes Research Methodology | Issue 2-3/2019
Print ISSN: 1387-3741
Electronic ISSN: 1572-9400
DOI: https://doi.org/10.1007/s10742-019-00200-9
