Abstract
Surveys provide important empirical information on household behavior and finances, and these data are used heavily by researchers and central banks and in policy consulting. However, various interdependent factors that can be controlled only to a limited extent lead to unit and item nonresponse, and missing data on certain items are a frequent source of difficulties in statistical practice. It is therefore more important than ever to explore techniques for the imputation of large survey data. This paper presents the theoretical underpinnings of a Markov chain Monte Carlo (MCMC) multiple imputation procedure and outlines important technical aspects of applying MCMC-type algorithms to large socio-economic data sets. An illustrative application shows that MCMC algorithms have good convergence properties even on large data sets with complex patterns of missingness, and that using a rich set of covariates in the imputation models has a substantial effect on the distributions of key financial variables.
Cite this article
Schunk, D. A Markov chain Monte Carlo algorithm for multiple imputation in large surveys. AStA 92, 101–114 (2008). https://doi.org/10.1007/s10182-008-0053-6
DOI: https://doi.org/10.1007/s10182-008-0053-6