A Markov chain Monte Carlo algorithm for multiple imputation in large surveys

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

Important empirical information on household behavior and finances is obtained from surveys, and these data are used heavily by researchers and central banks and in policy consulting. However, various interdependent factors that can be controlled only to a limited extent lead to unit and item nonresponse, and missing data on certain items are a frequent source of difficulties in statistical practice. More than ever, it is important to explore techniques for the imputation of large survey data. This paper presents the theoretical underpinnings of a Markov chain Monte Carlo (MCMC) multiple imputation procedure and outlines important technical aspects of applying MCMC-type algorithms to large socio-economic data sets. An illustrative application finds that MCMC algorithms have good convergence properties even on large data sets with complex patterns of missingness, and that using a rich set of covariates in the imputation models has a substantial effect on the distributions of key financial variables.
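
To make the idea of an MCMC-type imputation loop concrete, the following is a minimal sketch under stated assumptions, not the procedure developed in the paper. It assumes purely numeric data with missing values coded as NaN and uses a normal linear regression for every incomplete variable. In each sweep, every incomplete variable is redrawn conditional on the current values of all other variables, in the manner of a Gibbs sampler. The function names (`draw_column`, `impute_once`, `multiple_imputation`) and the default number of sweeps are hypothetical choices made for illustration only.

```python
import numpy as np

def draw_column(X, miss, j, rng):
    """Redraw the missing entries of column j given all other columns."""
    obs = ~miss[:, j]
    # Design matrix: intercept plus the current values of the other columns.
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    A_obs, y_obs = A[obs], X[obs, j]
    # Least-squares fit on the rows where column j was actually observed.
    beta_hat, *_ = np.linalg.lstsq(A_obs, y_obs, rcond=None)
    resid = y_obs - A_obs @ beta_hat
    df = max(len(y_obs) - A.shape[1], 1)
    # Draw the residual variance, then the coefficients, from the posterior
    # of a normal linear model with a noninformative prior (an assumption).
    sigma2 = resid @ resid / rng.chisquare(df)
    cov = sigma2 * np.linalg.pinv(A_obs.T @ A_obs)
    beta = rng.multivariate_normal(beta_hat, (cov + cov.T) / 2)
    # Draw the missing values from the predictive distribution.
    n_mis = int(miss[:, j].sum())
    noise = rng.normal(0.0, np.sqrt(sigma2), n_mis)
    X[miss[:, j], j] = A[miss[:, j]] @ beta + noise

def impute_once(data, n_sweeps=20, seed=0):
    """Run one chain and return a single completed data set."""
    rng = np.random.default_rng(seed)
    X = np.array(data, dtype=float)
    miss = np.isnan(X)
    # Starting values: fill each gap with the observed mean of its column.
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.nonzero(miss)[1])
    incomplete = np.nonzero(miss.any(axis=0))[0]
    for _ in range(n_sweeps):
        for j in incomplete:
            draw_column(X, miss, j, rng)
    return X

def multiple_imputation(data, m=5, n_sweeps=20, seed=0):
    """Return m completed data sets from m independently seeded chains."""
    return [impute_once(data, n_sweeps, seed + k) for k in range(m)]
```

Calling `multiple_imputation(data, m=5)` returns five completed copies of `data` that agree on all observed cells and differ only in the imputed ones. In a real survey application, categorical, bounded, or skewed variables would need other conditional models, and convergence of the chains would have to be monitored rather than assumed after a fixed number of sweeps.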

Author information

Correspondence to Daniel Schunk.

About this article

Cite this article

Schunk, D. A Markov chain Monte Carlo algorithm for multiple imputation in large surveys. AStA 92, 101–114 (2008). https://doi.org/10.1007/s10182-008-0053-6

  • DOI: https://doi.org/10.1007/s10182-008-0053-6
