Background
- To examine qualitatively, through semi-directed interviews of subjects, the hypotheses concerning the mechanism of MVs;
- To evaluate, on a cohort with simulated missing data, the predictive ability of the imputation model that includes only the items of the CES-D scale;
- To evaluate the results of multiple imputation of the score, of the status regarding depression symptoms (score ≥16), or of each item, the latter with two imputation models that did or did not include variables other than the CES-D items; complete case analysis and single imputation of the overall CES-D score to the person-mean were used as comparators;
- To explore the possible biases due to MNAR data.
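As an illustration of the person-mean comparator named in the third objective, the sketch below scores one respondent. It assumes the standard CES-D layout of 20 items, each coded 0 to 3 after reverse-scoring of the positively worded items, and the ≥16 cut-off shown in the results tables; the function names `person_mean_score` and `has_hds` are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence

N_ITEMS = 20      # assumption: the standard CES-D scale has 20 items
HDS_CUTOFF = 16   # assumption: hDS when the total score is >= 16


def person_mean_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Total score with missing items replaced by the respondent's own mean.

    `items` holds the item codes (0-3), with None marking a missing value.
    Returns None when every item is missing, since no person mean exists.
    """
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    observed = [x for x in items if x is not None]
    if not observed:
        return None
    # Filling each missing item with the mean of the observed items is the
    # same as rescaling the observed mean to the full number of items.
    return N_ITEMS * sum(observed) / len(observed)


def has_hds(score: Optional[float]) -> Optional[bool]:
    """Compare a score with the cut-off; an undefined score stays undefined."""
    return None if score is None else score >= HDS_CUTOFF


# Toy respondent: 18 answered items summing to 18, two items missing.
answers = [1] * 18 + [None, None]
score = person_mean_score(answers)
status = has_hds(score)
```

Because a person mean needs at least one answered item, this comparator can only cover respondents with 0 to 19 MVs.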
Methods
Study design and inclusion criteria
Data collection
Variables of interest
Covariates to impute MVs in the CES-D scale
Qualitative study
Statistical methods
Internal and external validity of the data
Study population
DS among complete cases
Psychometric properties of the CES-D scale
Investigation of the mechanism of MVs: a qualitative assessment
Estimation of prevalence of hDS taking MVs into account
Multiple imputation and imputation models
pmm method) and another for multiple imputation of the hDS status using a logistic regression model (logreg method) with all preselected risk factors of DS. Second, four imputation models were constructed for multiple imputation of the items of the CES-D scale. Two different mean structures were investigated: the parsimonious model included only the CES-D items, whereas the full model included the CES-D items and all 17 preselected candidate risk factors of DS. For each mean structure, a linear regression model (pmm method) and a polytomous unordered regression model (polyreg method) were used to impute missing values for the items.
Simulation study
pmm and polyreg methods). Descriptive indicators were measured on the various data sets defined by the number of MVs and the imputation method, i.e. (i) mean and variance of the CES-D score, (ii) standard error of the mean CES-D score, (iii) prevalence of hDS.
CES-D score and prevalence of hDS under the ignorable MVs hypothesis
Sensitivity analysis under the nonignorable MVs hypothesis
- Fit an imputation model assuming ignorable MVs;
- Modify the imputation model by adding a sensitivity parameter: for categorical variables, the odds ratio comparing the odds of a response category between subjects with MV and subjects without MV; for continuous variables, the difference in expected values;
- Impute MVs under the scenario thus specified.
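The three steps above can be sketched for a single categorical item. The sketch assumes that the added parameter acts as one odds ratio per non-reference response category, multiplying the odds of that category relative to the lowest category for subjects with an MV. This is only one reading of the description; the function names `shift_category_probs` and `impute_item` and the example probabilities are hypothetical, while the odds ratios (1.2; 1.5; 2.0) are those listed for Scenario 1 in the sensitivity table.

```python
import random
from typing import List, Sequence


def shift_category_probs(probs: Sequence[float],
                         odds_ratios: Sequence[float]) -> List[float]:
    """Turn probabilities fitted under ignorable MVs into a nonignorable scenario.

    probs        -- category probabilities (categories 0..K) from step 1
    odds_ratios  -- one odds ratio per category 1..K, each relative to
                    category 0 (assumed parameterisation)
    """
    if len(odds_ratios) != len(probs) - 1:
        raise ValueError("need one odds ratio per non-reference category")
    # Step 2: multiply the odds of each non-reference category, then renormalise.
    weights = [probs[0]] + [p * t for p, t in zip(probs[1:], odds_ratios)]
    total = sum(weights)
    return [w / total for w in weights]


def impute_item(probs: Sequence[float], odds_ratios: Sequence[float],
                rng: random.Random) -> int:
    """Step 3: draw one imputed category under the specified scenario."""
    shifted = shift_category_probs(probs, odds_ratios)
    return rng.choices(range(len(shifted)), weights=shifted, k=1)[0]


# Hypothetical fitted probabilities for one item of a respondent with an MV.
mar_probs = [0.4, 0.3, 0.2, 0.1]
thetas = (1.2, 1.5, 2.0)          # Scenario 1 values from the sensitivity table
shifted = shift_category_probs(mar_probs, thetas)
draw = impute_item(mar_probs, thetas, random.Random(0))
```

Setting every odds ratio to 1 leaves the probabilities unchanged, which recovers imputation under the ignorable MVs hypothesis.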
Results
Internal and external validity of the data
Study population
DS among complete cases
Psychometric properties of the CES-D scale
Investigation of the mechanism of MVs: a qualitative assessment
| | | CES-D scale (Qualitative study) | | |
|---|---|---|---|---|
| | | No MV | At least one MV | All |
| CES-D scale (Qualitative study) | NhDS | 104 | 12 | 116 |
| | hDS | 45 | 14 | 59 |
| | Undetermined | - | 8 | 8 |
| CES-D scale (8th questionnaire) | No MV | 91 (82.7%) | 19 (17.3%) | 110 (100.0%) |
| | At least one MV | 57 (79.2%) | 15 (20.8%) | 72 (100.0%) |
| | | NhDS | hDS | All |
| CES-D scale (8th questionnaire) | NhDS | 79 (77.5%) | 23 (22.5%) | 102 (100.0%) |
| | hDS | 19 (43.2%) | 25 (56.8%) | 44 (100.0%) |
| | No MV | At least one MV | All |
|---|---|---|---|
| Postal letter | | | |
| Potential reason for MV (given in writing) | | | |
| Personal physical disorders | 5 | 3 | 8 |
| Personal psychological disorders | 2 | 1 | 3 |
| Stressful life event | 1 | 0 | 1 |
| Relative's disease or death | 3 | 1 | 4 |
| No potential reason for MV | 138 | 29 | 167 |
| Total (participants) | 149 | 34 | 183 |
| Interview study | | | |
| Potential reason for MV (given orally) | | | |
| Personal physical disorders | 1 | 3 | 4 |
| Personal psychological disorders | 2 | 3 | 5 |
| Stressful life event | 0 | 4 | 4 |
| Relative's disease or death | 3 | 4 | 7 |
| No potential reason for MV | 21 | 15 | 36 |
| Total (contacted) | 27 | 29 | 56 |
Estimation of prevalence taking missing data into account
Simulation study
pmm and polyreg, based on a linear regression model and a polytomous unordered regression model, respectively, gave similar results, up to a large number of MVs (see Additional file 7). The cut-off of 4 MVs did not appear to be associated with any particularly interesting properties. Below 15 MVs, multiple imputation performed very well, while single imputation gave some biased results.
CES-D score and prevalence of hDS under the ignorable MVs hypothesis
| | Score on the CES-D scale | | | | |
|---|---|---|---|---|---|
| | N | Mean | SD | SEM | ≥16 (%) |
| Complete cases | 39,393 | 11.89 | 8.20 | 0.04 | 26.09 |
| Classifiable cases | 55,964 | - | - | - | 30.36 |
| Single imputation - Minimum value | | | | | |
| 0 - 20 MV | 71,412 | 11.02 | 8.28 | 0.03 | 23.79 |
| 0 - 10 MV | 62,053 | 12.10 | 8.20 | 0.03 | 27.06 |
| 0 - 4 MV | 59,562 | 12.07 | 8.22 | 0.03 | 26.91 |
| Single imputation - Maximum value | | | | | |
| 0 - 20 MV | 71,412 | 19.68 | 16.15 | 0.06 | 45.42 |
| 0 - 10 MV | 62,053 | 14.53 | 9.62 | 0.04 | 37.19 |
| 0 - 4 MV | 59,562 | 13.71 | 8.74 | 0.04 | 34.58 |
| Single imputation - Person mean | | | | | |
| 0 - 19 MV | 69,242 | 13.81 | 10.16 | 0.04 | 33.16 |
| 0 - 10 MV | 62,053 | 12.76 | 8.82 | 0.04 | 28.99 |
| 0 - 4 MV | 59,562 | 12.45 | 8.52 | 0.03 | 27.78 |
| Multiple imputation - Score | | | | | |
| 0 - 20 MV | 71,412 | 12.23 | 8.37 | 0.04 | 27.58 |
| 0 - 10 MV | 62,053 | 12.12 | 8.33 | 0.04 | 27.10 |
| 0 - 4 MV | 59,562 | 12.06 | 8.29 | 0.05 | 26.81 |
| Multiple imputation - Status ≥ 16 | | | | | |
| 0 - 20 MV | 71,412 | - | - | - | 31.04 |
| 0 - 10 MV | 62,053 | - | - | - | 30.33 |
| 0 - 4 MV | 59,562 | - | - | - | 29.27 |
| Multiple imputation - Items | | | | | |
| Parsimonious model, pmm method | | | | | |
| 0 - 20 MV | 71,412 | 13.30 | 9.02 | 0.04 | 31.99 |
| 0 - 10 MV | 62,053 | 12.76 | 8.71 | 0.04 | 29.75 |
| 0 - 4 MV | 59,562 | 12.48 | 8.47 | 0.03 | 28.62 |
| Parsimonious model, polyreg method | | | | | |
| 0 - 20 MV | 71,412 | 13.18 | 9.05 | 0.04 | 31.54 |
| 0 - 10 MV | 62,053 | 12.76 | 8.72 | 0.04 | 29.75 |
| 0 - 4 MV | 59,562 | 12.47 | 8.48 | 0.03 | 28.61 |
| Full model, pmm method | | | | | |
| 0 - 20 MV | 71,412 | 13.26 | 9.04 | 0.03 | 31.85 |
| 0 - 10 MV | 62,053 | 12.76 | 8.72 | 0.04 | 29.74 |
| 0 - 4 MV | 59,562 | 12.48 | 8.48 | 0.03 | 28.63 |
| Full model, polyreg method | | | | | |
| 0 - 20 MV | 71,412 | 13.22 | 9.05 | 0.04 | 31.73 |
| 0 - 10 MV | 62,053 | 12.76 | 8.72 | 0.04 | 29.80 |
| 0 - 4 MV | 59,562 | 12.48 | 8.48 | 0.03 | 28.63 |
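The minimum-value and maximum-value rows of the table bracket each respondent's score, which also suggests how a respondent with MVs can still be classified. The sketch below assumes items coded 0 to 3, so that each missing item contributes between 0 and 3 points, and it assumes that "classifiable" means the hDS status is the same at both bounds; the surviving text does not define the term, and the function names `score_bounds` and `classify` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

ITEM_MIN, ITEM_MAX = 0, 3   # assumption: CES-D items are coded 0 to 3
HDS_CUTOFF = 16             # cut-off used in the table (score >= 16)


def score_bounds(items: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Lowest and highest total score compatible with the observed items."""
    observed = [x for x in items if x is not None]
    n_missing = len(items) - len(observed)
    base = sum(observed)
    # Minimum-value imputation fills every MV with 0, maximum-value with 3.
    return base + n_missing * ITEM_MIN, base + n_missing * ITEM_MAX


def classify(items: Sequence[Optional[int]]) -> Optional[bool]:
    """True for hDS, False for NhDS, None when the MVs leave the status open."""
    low, high = score_bounds(items)
    if low >= HDS_CUTOFF:
        return True       # already at or above the cut-off without the MVs
    if high < HDS_CUTOFF:
        return False      # cannot reach the cut-off even with maximal answers
    return None


# Toy respondents (20 items each, None marks an MV).
clearly_high = [2] * 8 + [None] * 12    # bounds 16 to 52
clearly_low = [0] * 18 + [None] * 2     # bounds 0 to 6
undetermined = [1] * 12 + [None] * 8    # bounds 12 to 36
```

Under this reading, complete cases are always classifiable, and the share of undetermined respondents grows with the number of MVs.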
Sensitivity analysis under the nonignorable MVs hypothesis
| | | Parsimonious model | | | Full model | | |
|---|---|---|---|---|---|---|---|
| | | Score on the CES-D scale | | | Score on the CES-D scale | | |
| | N | Mean | SD | ≥16 (%) | Mean | SD | ≥16 (%) |
| Scenario 1 | | | | | | | |
| θa = (1.2; 1.5; 2.0) for all items | | | | | | | |
| 0 - 20 MV | 71,412 | 13.43 | 9.18 | 32.49 | 13.47 | 9.19 | 32.71 |
| 0 - 10 MV | 62,053 | 12.84 | 8.77 | 30.05 | 12.84 | 8.77 | 30.07 |
| 0 - 4 MV | 59,562 | 12.52 | 8.50 | 28.81 | 12.53 | 8.50 | 28.83 |
| Scenario 2 | | | | | | | |
| θa = (1.2; 1.5; 2.0) for N items, θa = (1.5; 2.0; 2.5) for P items | | | | | | | |
| 0 - 20 MV | 71,412 | 13.45 | 9.18 | 32.57 | 13.50 | 9.19 | 32.79 |
| 0 - 10 MV | 62,053 | 12.85 | 8.77 | 30.07 | 12.85 | 8.77 | 30.10 |
| 0 - 4 MV | 59,562 | 12.53 | 8.50 | 28.82 | 12.53 | 8.50 | 28.84 |
| Scenario 3 | | | | | | | |
| θa = (2.0; 3.0; 5.0) for N items, θa = (3.0; 5.0; 8.0) for P items | | | | | | | |
| 0 - 20 MV | 71,412 | 13.88 | 9.37 | 34.41 | 13.92 | 9.39 | 34.66 |
| 0 - 10 MV | 62,053 | 12.97 | 8.82 | 30.58 | 12.98 | 8.82 | 30.59 |
| 0 - 4 MV | 59,562 | 12.62 | 8.52 | 29.12 | 12.62 | 8.52 | 29.19 |
| Scenario 4 | | | | | | | |
| θa = (4.0; 6.0; 10.0) for N items, θa = (6.0; 10.0; 15.0) for P items | | | | | | | |
| 0 - 20 MV | 71,412 | 14.19 | 9.47 | 36.17 | 14.24 | 9.50 | 36.39 |
| 0 - 10 MV | 62,053 | 13.06 | 8.84 | 30.98 | 13.06 | 8.84 | 30.98 |
| 0 - 4 MV | 59,562 | 12.68 | 8.52 | 29.42 | 12.68 | 8.52 | 29.44 |
Discussion
mice package gave very similar results when imputing MVs under the MAR assumption. From a computational point of view, the former has the advantage of being the fastest and the easiest to fit on small sample sizes. Nevertheless, this model requires a very strong assumption of linearity that must be carefully checked.
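To make the linearity assumption concrete, here is a minimal sketch of predictive mean matching, the idea behind the pmm method, with a single predictor. It is a simplified illustration and not the mice implementation: it fits an ordinary least-squares line on the cases with an observed outcome and omits the random draw of regression parameters that a proper multiple-imputation procedure performs. The function name `pmm_impute`, the donor-pool size `k` and the toy data are assumptions.

```python
import random
from typing import List, Optional, Sequence


def pmm_impute(x: Sequence[float], y: Sequence[Optional[float]], k: int = 3,
               rng: Optional[random.Random] = None) -> List[float]:
    """Fill missing y values (None) by simplified predictive mean matching."""
    rng = rng or random.Random()
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(obs) < 2:
        raise ValueError("need at least two observed cases to fit a line")

    # Least-squares fit of y on x using the observed cases only.
    n = len(obs)
    mean_x = sum(xi for xi, _ in obs) / n
    mean_y = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in obs)
    if sxx == 0:
        raise ValueError("predictor is constant among observed cases")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in obs) / sxx
    intercept = mean_y - slope * mean_x

    def predict(xi: float) -> float:
        return intercept + slope * xi

    completed: List[float] = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)
            continue
        target = predict(xi)
        # Donors: the k observed cases whose predicted value is closest.
        donors = sorted(obs, key=lambda d: abs(predict(d[0]) - target))[:k]
        # The imputed value is a donor's observed value, never the prediction.
        completed.append(rng.choice(donors)[1])
    return completed


# Toy data: the third case has a missing outcome.
x_toy = [0, 1, 2.2, 3, 10]
y_toy = [0, 1, None, 3, 2]
filled = pmm_impute(x_toy, y_toy, k=1, rng=random.Random(0))
```

The linear model is used only to decide which observed cases are close, so imputed values always fall among the values actually observed; a badly misspecified line, however, selects poor donors, which is why the linearity assumption still needs checking.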