Imputation of missing categorical data by maximizing internal consistency

van Buuren, Stef; van Rijckevorsel, Jan L. A.

doi:10.1007/BF02294420


Published: December 1992

Psychometrika, Volume 57, pages 567–580 (1992)


Abstract

This paper proposes a method for replacing missing categorical data with “reasonable” values. These replacements maximize the consistency of the completed data, as measured by Guttman's squared correlation ratio. The paper outlines a solution to the optimization problem, describes its relationships with the relevant psychometric theory, and studies some properties of the method in detail. The main result is that the average correlation should be at least 0.50 before the method becomes practical. At that level, the technique gives reasonable results with up to 10–15% missing data.
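The abstract does not spell out the algorithm, so the following is only a rough Python illustration of the objective under stated assumptions, not the authors' procedure. It assumes NumPy, a single dimension of object scores, and category codes stored as integers; the function names `consistency` and `impute` are hypothetical. For centered object scores x, the squared correlation ratio of variable j is x'P_j x / x'x, where P_j projects onto the indicator columns of that variable, and the maximum over x of the mean ratio is the largest eigenvalue of the average centered projector. The brute-force coordinate ascent in `impute` is a stand-in search that tries every observed category in each missing cell and keeps only strict improvements.

```python
import numpy as np


def consistency(data: np.ndarray) -> float:
    """Internal consistency of a complete n x m matrix of category codes.

    For centered object scores x, the squared correlation ratio of
    variable j is eta_j^2 = x' P_j x / x' x, where P_j projects onto the
    columns of the indicator matrix G_j. The maximum over x of the mean
    eta_j^2 is the largest eigenvalue of mean_j (P_j - 11'/n).
    """
    n, m = data.shape
    avg = np.zeros((n, n))
    for j in range(m):
        _, codes = np.unique(data[:, j], return_inverse=True)
        g = np.eye(codes.max() + 1)[codes]            # n x k_j indicator matrix G_j
        p = g @ np.diag(1.0 / g.sum(axis=0)) @ g.T    # P_j = G_j (G_j'G_j)^-1 G_j'
        avg += p - np.ones((n, n)) / n                # drop the trivial constant solution
    return float(np.linalg.eigvalsh(avg / m)[-1])     # eigvalsh sorts ascending


def impute(data: np.ndarray, missing: np.ndarray, max_sweeps: int = 20):
    """Fill missing cells by coordinate ascent on consistency() (hypothetical helper).

    Assumes every column has at least one observed value. Each missing cell
    only takes categories observed in its own column.
    """
    x = data.copy()
    # Start from the modal observed category of each column.
    for j in range(x.shape[1]):
        vals, counts = np.unique(x[~missing[:, j], j], return_counts=True)
        x[missing[:, j], j] = vals[np.argmax(counts)]
    best = consistency(x)
    for _ in range(max_sweeps):
        improved = False
        for i, j in zip(*np.nonzero(missing)):
            for c in np.unique(data[~missing[:, j], j]):
                old = x[i, j]
                if c == old:
                    continue
                x[i, j] = c
                score = consistency(x)
                if score > best + 1e-12:      # keep strict improvements only
                    best, improved = score, True
                else:
                    x[i, j] = old             # revert the trial replacement
        if not improved:                      # a full sweep changed nothing
            break
    return x, best
```

For example, with `data = np.array([[0, 0, -1], [1, 1, 1], [0, 0, 0], [1, 1, 1]])` and `-1` marking the missing cell, the modal start fills it with 1, and `impute(data, data == -1)` then switches it to 0, which makes the third variable identical to the other two and raises the consistency to 1. Because every trial recomputes an n x n eigenproblem, this sketch is only practical for very small data sets.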


References

Dear, R. E. (1959). A principal component missing data method for multiple regression models (SP-86). Santa Monica, CA: System Development Corporation.
Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789–798.
Gifi, A. (1990). Nonlinear multivariate analysis. Chichester: Wiley.
Gleason, T. C., & Staelin, R. (1975). A proposal for handling missing data. Psychometrika, 40, 229–252.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. New York: Academic Press.
Guttman, L. (1941). The quantification of a class of attributes: A theory and method of scale construction. In P. Horst et al. (Eds.), The prediction of personal adjustment (pp. 319–348). New York: Social Science Research Council.
Hartigan, J. A. (1975). Clustering algorithms. New York: Wiley.
Hartley, H. O., & Hocking, R. R. (1971). The analysis of incomplete data. Biometrics, 27, 783–808.
Kalton, G., & Kasprzyk, D. (1982). Imputing for missing survey responses. Proceedings of the Section on Survey Research Methods, 1982 (pp. 22–23). Alexandria, VA: American Statistical Association.
Little, R. J. A., & Rubin, D. B. (1990). The analysis of social science data with missing values. In J. Fox & J. Scott Long (Eds.), Modern methods of data analysis (pp. 374–409). London: Sage.
Madow, W. G., Olkin, I., & Rubin, D. B. (Eds.). (1983). Incomplete data in sample surveys (Vols. 1–3). New York: Academic Press.
Meulman, J. (1982). Homogeneity analysis of incomplete data. Leiden: DSWO Press.
Milligan, G. W. (1980). An examination of the effect of six types of error perturbation on fifteen clustering algorithms. Psychometrika, 45, 325–342.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
Nishisato, S., & Ahn, H. (in press). When not to analyze data: Decision making on missing responses in dual scaling. Annals of Operations Research.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Rubin, D. B. (1991). EM and beyond. Psychometrika, 56, 241–254.
Scheibler, D., & Schneider, W. (1985). Monte Carlo tests of the accuracy of cluster analysis algorithms. Multivariate Behavioral Research, 20, 283–304.
Späth, H. (1985). Cluster dissection and analysis. Chichester: Ellis Horwood.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–550.
van Buuren, S., & Heiser, W. J. (1989). Clustering n objects into k groups under optimal scaling of variables. Psychometrika, 54, 699–706.
van Buuren, S., & van Rijckevorsel, J. L. A. (1992). Data augmentation and optimal scaling. In R. Steyer, K. F. Wender, & K. F. Widaman (Eds.), Psychometric methodology. Proceedings of the 7th European Meeting of the Psychometric Society in Trier (pp. 80–84). Stuttgart and New York: Gustav Fischer Verlag.
van der Heijden, P. G. M., & Escofier, B. (1989). Multiple correspondence analysis with missing data. Unpublished manuscript, University of Leiden, Department of Psychometrics and Research Methods.
van Rijckevorsel, J. L. A., & de Leeuw, J. (1992). Some results about the importance of knot selection in nonlinear multivariate analysis.Statistica Applicata: Italian Journal of Applied Statistics, 4.


Author information

Authors and Affiliations

TNO Institute of Preventive Health Care, PO Box 124, 2300 AC, Leiden, The Netherlands
Stef van Buuren & Jan L. A. van Rijckevorsel


Additional information

We thank Anneke Bloemhoff of NIPG-TNO for compiling the Dutch Life Style Survey data and making them available to us, and Chantal Houée and Thérèse Bardaine, IUT, Vannes, France, exchange students under the COMETT program of the EC, for computational assistance. We also thank Donald Rubin, the Editors, and several anonymous reviewers for constructive suggestions.


About this article

Cite this article

van Buuren, S., van Rijckevorsel, J.L.A. Imputation of missing categorical data by maximizing internal consistency. Psychometrika 57, 567–580 (1992). https://doi.org/10.1007/BF02294420


Received: 26 December 1970
Revised: 06 January 1992
Issue Date: December 1992
DOI: https://doi.org/10.1007/BF02294420
