Academic Pediatrics

Volume 10, Issue 3, May–June 2010, Pages 205-210

Methods
Techniques for Handling Missing Data in Secondary Analyses of Large Surveys

Presented in part at the Pediatric Academic Societies' Annual Meeting, San Francisco, California, April 29–May 2 2006.
https://doi.org/10.1016/j.acap.2010.01.005

Objective

Using an appropriate method to handle cases with missing data when performing secondary analyses of survey data is important to reduce bias and to reach valid conclusions for the target population. Many published secondary analyses using child health data sets either do not discuss the technique employed to treat missing data or simply delete cases with missing data. Missing data may threaten statistical power by reducing sample size; in more extreme situations, estimates derived by deleting cases with missing values may be biased, particularly if the cases with missing values are systematically different from those with complete data. The aim of this study was to determine which of 4 techniques for handling missing data most closely estimates the true model coefficient when varying proportions of cases are missing data.

Methods

We performed a simulation study to compare model coefficients when all cases had complete data and when 4 techniques for handling missing data were employed with 10%, 20%, 30%, or 40% of the cases missing data.
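The general shape of such a simulation can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the synthetic data, the coefficient values, the function names, and the assumption that cases with larger outcome values are more likely to be missing are all hypothetical, and only the simplest technique (dropping incomplete cases) is shown.

```python
import numpy as np

def fit_ols(X, y):
    """Return ordinary least squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def make_data(n, rng):
    """Hypothetical data: three predictors and one continuous outcome."""
    X = rng.normal(size=(n, 3))
    y = 1.0 + X @ np.array([0.5, 0.8, -0.3]) + rng.normal(size=n)
    return X, y

def drop_cases(y, frac, rng):
    """Return a boolean mask marking round(frac * n) cases as missing.

    Assumption for illustration: cases with larger outcomes are more
    likely to be missing, so the missing cases differ systematically
    from the complete ones.
    """
    n = len(y)
    k = int(round(frac * n))
    weights = np.exp(0.5 * (y - y.mean()) / y.std())
    idx = rng.choice(n, size=k, replace=False, p=weights / weights.sum())
    missing = np.zeros(n, dtype=bool)
    missing[idx] = True
    return missing

def coefficient_differences(n=5000, fractions=(0.1, 0.2, 0.3, 0.4), seed=0):
    """Complete-case coefficients minus full-data coefficients, per fraction."""
    rng = np.random.default_rng(seed)
    X, y = make_data(n, rng)
    full = fit_ols(X, y)
    results = {}
    for frac in fractions:
        missing = drop_cases(y, frac, rng)
        complete_case = fit_ols(X[~missing], y[~missing])
        results[frac] = complete_case - full
    return results

if __name__ == "__main__":
    for frac, diff in coefficient_differences().items():
        print(f"{frac:.0%} missing:", np.round(diff, 3))
```

Each printed vector holds the difference for the intercept followed by the three predictors; a technique that recovers the full-data model would give differences near zero.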

Results

When >10% of the cases had missing data, the reweight and multiple imputation techniques were superior to dropping cases with missing scores or hot deck imputation.
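To make the contrast between reweighting and hot deck imputation concrete, the sketch below shows one common form of each. The abstract does not specify which variants were used, so both are assumptions: the reweighting shown is a weighting-class adjustment, and the hot deck draws a random donor from the same class. The function names and the `classes` variable (a grouping that is observed for every case) are hypothetical.

```python
import numpy as np

def weighted_ols(X, y, w):
    """Weighted least squares coefficients, intercept first.

    Pass only the cases with observed outcomes; a NaN outcome would
    propagate even if its weight were zero.
    """
    design = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef

def reweight(base_w, classes, observed):
    """Weighting-class adjustment.

    Within each class, respondents' weights are scaled up so that they
    sum to the class total; cases with missing data get weight zero.
    """
    adjusted = np.zeros(len(base_w), dtype=float)
    for c in np.unique(classes):
        in_class = classes == c
        resp = in_class & observed
        if resp.any():
            factor = base_w[in_class].sum() / base_w[resp].sum()
            adjusted[resp] = base_w[resp] * factor
    return adjusted

def hot_deck(y, classes, observed, rng):
    """Fill each missing value with a random observed donor from its class."""
    filled = y.copy()
    for c in np.unique(classes):
        donors = y[(classes == c) & observed]
        need = (classes == c) & ~observed
        if donors.size and need.any():
            filled[need] = rng.choice(donors, size=need.sum(), replace=True)
    return filled

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    classes = np.array([0, 0, 0, 1, 1, 1])
    observed = np.array([True, True, False, True, True, False])
    y = np.array([10.0, 20.0, np.nan, 30.0, 35.0, np.nan])
    print(reweight(np.ones(6), classes, observed))
    print(hot_deck(y, classes, observed, rng))
```

Reweighting leaves the observed values untouched and changes only how much each complete case counts, whereas hot deck imputation keeps every case and fills in a single borrowed value for each gap.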

Conclusions

These findings suggest that child health researchers should use caution when analyzing survey data if a large percentage of cases have missing values. In most situations, the technique of dropping cases with missing data should be discouraged. Investigators should consider reweighting or multiple imputation if a large percentage of cases are missing data.
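For readers unfamiliar with multiple imputation, a minimal sketch follows. It is not the procedure used in this study: the imputation model (a linear regression of the outcome on predictors `Z`), the use of a bootstrap resample to reflect uncertainty in the imputation model, and all function names are assumptions made for illustration. The pooling step uses Rubin's rules, in which the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import numpy as np

def ols(X, y):
    """Return coefficients (intercept first), their covariance, and residual variance."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, cov, sigma2

def impute_once(Z, y, observed, rng):
    """One stochastic regression imputation of the missing outcomes.

    A bootstrap resample of the observed cases stands in for drawing the
    imputation model's parameters from their posterior (an assumption).
    """
    obs_idx = np.flatnonzero(observed)
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    coef, _, sigma2 = ols(Z[boot], y[boot])
    miss = ~observed
    design_miss = np.column_stack([np.ones(miss.sum()), Z[miss]])
    filled = y.copy()
    noise = rng.normal(scale=np.sqrt(sigma2), size=miss.sum())
    filled[miss] = design_miss @ coef + noise
    return filled

def multiple_imputation(X, Z, y, observed, m=5, seed=0):
    """Fit the analysis model on m imputed data sets and pool with Rubin's rules."""
    rng = np.random.default_rng(seed)
    coefs, variances = [], []
    for _ in range(m):
        coef, cov, _ = ols(X, impute_once(Z, y, observed, rng))
        coefs.append(coef)
        variances.append(np.diag(cov))
    coefs = np.array(coefs)
    pooled = coefs.mean(axis=0)
    within = np.mean(variances, axis=0)
    between = coefs.var(axis=0, ddof=1)
    total = within + (1 + 1 / m) * between
    return pooled, np.sqrt(total)
```

Here `X` holds the analysis model's predictors and `Z` the imputation model's predictors, which may include auxiliary variables related to the missing values. Unlike a single hot deck fill, the between-imputation term carries the uncertainty about the imputed values into the pooled standard errors.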

Section snippets

Data Source

This investigation analyzed data from the live birth component of the 1988 National Maternal and Infant Health Survey (NMIHS), in which birth certificate data are linked to mothers' survey data, and the 1991 Longitudinal Follow-up (LF) Live Birth survey [20]. The 1988 NMIHS used a nationally representative sample of 9953 children born in the United States that year and linked birth certificate data to interviews of mothers. African American and low birth weight children were oversampled. The mothers of 8285 children participated in the both

Results

Figures 1 through 4 compare the difference in the model coefficients for the 3 independent variables (child chronic illness, child behavior problems, and maternal health) and the intercept for each of the 4 methods for handling missing data. The superior technique for handling missing data will have model coefficients as close as possible to the model coefficients with the full data (ie, the difference in the model coefficients will be close to zero and the 95% confidence

Discussion

We recommend that investigators use caution when analyzing survey data if a large percentage of values are missing for the variable of interest. When analyzing survey data with 10% or less of the values missing, simply dropping cases with missing data (complete case analysis or listwise deletion) or using the hot deck technique may introduce a slight bias, although this bias may not be severe. We found that when using the NMIHS and LF, if >10% of cases were missing a value for the variable

Acknowledgments

This work was funded by grants from the National Institute of Mental Health (1-R03-MH64060-01A1, Diane L. Langkamp, principal investigator) and the Akron Children's Hospital Foundation (Diane L. Langkamp, principal investigator).

References (28)

  • A.R.T. Donders et al. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol (2006)
  • C.D. Croy et al. Methods for addressing missing data in psychiatric and developmental research. J Am Acad Child Adolesc Psychiatry (2005)
  • D.A. Bennett. How can I deal with missing data in my study? Aust N Z J Public Health (2001)
  • R.J.A. Little et al. Statistical Analysis with Missing Data (2002)
  • P.D. Allison. Missing Data (2002)
  • J.L. Peugh et al. Missing data in educational research: a review of reporting practices and suggestions for improvement. Rev Educ Res (2004)
  • A.M. Wood et al. Are missing outcome data adequately handled? A review of published randomized controlled trials in major medical journals. Clin Trials (2004)
  • D. Civic et al. Maternal depressive symptoms and child behavior problems in a nationally representative normal birthweight sample. Matern Child Health J (2000)
  • L.W. Deal et al. Young maternal age and depressive symptoms: results from the 1988 National Maternal and Infant Health Survey. Am J Public Health (1998)
  • K.D. Mandl et al. Infant health care use and maternal depression. Arch Pediatr Adolesc Med (1999)
  • S.M. Petterson et al. Effects of poverty and maternal depression on early child development. Child Dev (2001)
  • A. Chen et al. Breastfeeding and the risk of postneonatal death in the United States. Pediatrics (2004)
  • T.E. Raghunathan. What do we do with missing data? Some options for analysis of incomplete data. Annu Rev Public Health (2004)
  • R.B. Hakim et al. Effect of compliance with health supervision guidelines among US infants on emergency department visits. Arch Pediatr Adolesc Med (2002)