Missing data are a frequent problem in cost-effectiveness analysis within a randomised clinical trial.
Different methods of handling missing data can yield different results and affect decisions on the value for money of healthcare interventions.
The choice of method should be grounded in the assumed missing data mechanism, which in turn should be informed by the available evidence.
The impact of alternative assumptions about the missing data mechanism should be carefully assessed in sensitivity analysis.
1 Introduction
2 Classifications of Missing Data Mechanisms
- Data are missing completely at random (MCAR) if the probability that data are missing is independent of both observed and unobserved values; i.e. the individuals with observed outcomes are a representative sample of the overall population (missing and observed).
- Covariate-dependent missingness (CD-MCAR) is an extension of Rubin’s MCAR; in CD-MCAR, the probability that data are missing may depend on observed baseline covariates (e.g. age and gender) but is independent of both the missing and the observed outcomes [13]. This distinction is useful in within-trial CEAs because RCTs often have multiple data collection points and the probability that data are missing may depend on individuals’ baseline characteristics but not on previous outcome measurements.
- Data are missing at random (MAR) if the probability that data are missing is independent of unobserved values, given the observed data (including previous outcome measurements). Therefore, any systematic differences between the observed and unobserved values can be explained by differences in observed variables.
- Data are missing not at random (MNAR) if, given the observed data, the probability that data are missing depends on unobserved values. For example, individuals with worse outcomes may be more likely to have missing data on outcomes. Assuming that data are MCAR or MAR when in fact they are MNAR may bias the estimates of treatment effect.
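The practical difference between these mechanisms can be illustrated with a small simulation. The sketch below is not taken from the paper: the outcome distributions, the 0.70 cut-off and the missingness probabilities are all hypothetical. It compares the true mean of a follow-up outcome with the complete-case mean and with a mean adjusted for an earlier, fully observed outcome.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return (true mean, complete-case mean, adjusted mean) of a later outcome y2.

    All distributions and probabilities are hypothetical illustration values.
    """
    rng = random.Random(seed)
    rows = []  # (early outcome is low, y2, y2 is observed)
    for _ in range(n):
        y1 = rng.gauss(0.70, 0.20)       # earlier outcome, always observed
        y2 = y1 + rng.gauss(0.0, 0.10)   # later outcome, may be missing
        if mechanism == "MCAR":
            p_missing = 0.3                        # unrelated to any value
        elif mechanism == "MAR":
            p_missing = 0.5 if y1 < 0.70 else 0.1  # depends on observed y1 only
        elif mechanism == "MNAR":
            p_missing = 0.5 if y2 < 0.70 else 0.1  # depends on y2 itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((y1 < 0.70, y2, rng.random() >= p_missing))

    true_mean = statistics.mean(y2 for _, y2, _ in rows)
    complete_case = statistics.mean(y2 for _, y2, seen in rows if seen)

    # Adjust for the observed variable: average the observed y2 within each
    # y1 stratum, weighting each stratum by its share of the full sample.
    adjusted = 0.0
    for stratum in (True, False):
        size = sum(1 for low, _, _ in rows if low == stratum)
        observed_mean = statistics.mean(
            y2 for low, y2, seen in rows if low == stratum and seen
        )
        adjusted += size / n * observed_mean
    return true_mean, complete_case, adjusted

for mechanism in ("MCAR", "MAR", "MNAR"):
    true_mean, complete_case, adjusted = simulate(mechanism)
    print(f"{mechanism}: true={true_mean:.3f} "
          f"complete-case={complete_case:.3f} adjusted={adjusted:.3f}")
```

Under these assumptions the complete-case mean is close to the truth only under MCAR. Adjusting for the observed earlier outcome removes the bias under MAR but not under MNAR, which is why MNAR cannot be ruled out from the observed data alone.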
3 Stage 1: Descriptive Analysis of Missing Data
‘misspattern’ in Stata®) are useful to visualise and understand the pattern of missing data. These graphs indicate whether patients with missing data are lost to follow-up throughout the duration of the trial (monotonic pattern), and therefore whether relatively simpler approaches can be used, such as IPW. In addition, these graphs can be plotted to determine whether data are missing for all the questions in HR-QOL or resource use or for individual items in each category (more detail in Sect. 6.1). These patterns can guide the choice of whether missing data need to be modelled in the individual components or in the aggregate score.
4 Stage 2: Choosing and Implementing a Method to Handle Missing Data
4.1 Handling Missing Baseline Values
4.2 Complete Case Analysis, Available Case Analysis and Inverse Probability Weighting
4.3 Single Imputation Methods
4.4 Multiple Imputation
‘mi impute mvn’ or as MICE using ‘mi impute chained’ or the ‘ice’ package. The analysis step can be performed using ‘mi estimate’ or the ‘mim’ package. Multiply imputed data created by ‘ice’ can be imported into the ‘mi’ framework for analysis using the command ‘mi import ice’; otherwise, they can be analysed directly using the ‘mim’ command. Equivalent programmes are available in SAS® and R. The subsequent sections focus on the implementation of MICE because its flexibility makes it more applicable to missing data in within-trial CEAs.
4.4.1 The Imputation Model
4.4.2 Analysis of the Multiply Imputed Dataset
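In the analysis step of MI, the analysis model is fitted in each imputed dataset and the results are usually combined with Rubin's rules. The sketch below shows that pooling step only; the function name and the five estimates and variances are hypothetical illustration values, not results from any trial.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)          # average sampling variance
    between = statistics.variance(estimates)     # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between       # total variance of the pooled estimate
    return pooled, math.sqrt(total)

# Hypothetical estimates of a treatment effect from m = 5 imputed datasets
estimates = [10.0, 12.0, 11.0, 9.0, 13.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]   # squared standard errors
pooled, se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.2f}, standard error = {se:.3f}")
```

The between-imputation term is what allows MI to reflect the uncertainty due to missing data; single imputation methods have no equivalent term.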
4.5 Likelihood-Based Methods
5 Stage 3: Sensitivity Analysis to the Missing at Random (MAR) Assumption
6 Illustration with the REFLUX Study
6.1 Stage 1: Descriptive Analysis of Missing Data
6.1.1 Amount of Missing Data by Trial Group at Each Follow-Up Period
Complete at | Surgery (n = 178) | Medical management (n = 179) |
---|---|---|
Year 1 | 134 (75%) | 147 (82%) |
Year 2 | 121 (68%) | 134 (75%) |
Year 3 | 112 (63%) | 119 (66%) |
Year 4 | 114 (64%) | 118 (66%) |
Year 5 | 115 (65%) | 113 (63%) |
All years | 88 (49%) | 84 (47%) |
6.1.2 Missing Data Patterns
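A missing data pattern is monotonic when an individual who misses one follow-up has no data at any later follow-up. A minimal sketch of how patterns could be tabulated and checked is given below; the helper names and the six example records are hypothetical and are not REFLUX data.

```python
from collections import Counter

def is_monotone(observed_flags):
    """True if no follow-up is observed after an earlier one was missing."""
    seen_missing = False
    for observed in observed_flags:
        if not observed:
            seen_missing = True
        elif seen_missing:
            return False   # data reappear after a gap: not monotone
    return True

def pattern_counts(records):
    """Count how often each pattern occurs ('X' = observed, '.' = missing)."""
    counts = Counter()
    for flags in records:
        counts["".join("X" if flag else "." for flag in flags)] += 1
    return counts

# Hypothetical records: one tuple per individual, one flag per follow-up year
records = [
    (True, True, True, True, True),
    (True, True, False, False, False),
    (True, False, True, True, False),
    (False, False, False, False, False),
    (True, True, True, True, True),
    (True, True, False, True, True),
]

counts = pattern_counts(records)
for pattern, count in counts.most_common():
    flags = tuple(char == "X" for char in pattern)
    label = "monotone" if is_monotone(flags) else "non-monotone"
    print(f"{pattern}  n={count}  {label}")
```

If a sizeable share of individuals fall into non-monotone patterns, methods that rely on monotone drop-out are less suitable.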
6.1.3 Association Between Missingness and Baseline Variables
Variable | Odds ratio (95 % CI) for missing data on costs | Odds ratio (95 % CI) for missing data on QALYs |
---|---|---|
Treatment allocation | 1.04 (0.68–1.59) | 1.04 (0.68–1.58) |
Gender | 1.29 (0.81–2.04) | 1.10 (0.70–1.74) |
BMI | 1.01 (0.96–1.06) | 1.01 (0.96–1.06) |
Age | 0.99 (0.97–1.00) | 0.99 (0.97–1.00) |
EQ-5D at baseline | 0.38** (0.16–0.90) | 0.46* (0.19–1.09) |
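Odds ratios of this kind come from a logistic regression of a missingness indicator on baseline variables. The sketch below fits such a model with a single covariate by Newton-Raphson, using simulated data. The sample size, the data-generating coefficients and the function name are hypothetical; in practice a statistical package would be used, with all baseline variables entered together.

```python
import math
import random

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p             # score for the intercept
            g1 += (yi - p) * xi      # score for the slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 from the inverse information matrix of the last iteration
    return b0, b1, math.sqrt(h00 / det)

# Simulated data: individuals with lower baseline EQ-5D are more likely to
# have missing data (hypothetical true odds ratio of 0.4 per unit of EQ-5D).
rng = random.Random(42)
eq5d = [min(1.0, max(0.0, rng.gauss(0.72, 0.25))) for _ in range(5000)]
missing = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.5 + math.log(0.4) * e))) else 0
    for e in eq5d
]

b0, b1, se = fit_logistic(eq5d, missing)
odds_ratio = math.exp(b1)
ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
print(f"odds ratio = {odds_ratio:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

An odds ratio below 1 for baseline EQ-5D means that individuals in better health at baseline are less likely to have missing data, which is evidence against MCAR.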
6.1.4 Association Between Missingness and Observed Outcomes
6.2 Stage 2: Choosing and Implementing a Method to Handle Missing Data
Outcome | Statistic | Complete case analysis with seemingly unrelated regression model | Multiple imputation of costs and QALYs followed by seemingly unrelated regression model | Mixed model with adjustment for baseline EQ-5D |
---|---|---|---|---|
Difference in costs (£) | Mean | 1,668 | 1,305 | 1,338 |
Difference in costs (£) | SE | 268 | 255 | 253 |
Difference in costs (£) | 95 % CI | 1,142–2,194 | 805–1,806 | 843–1,833 |
Difference in QALYs adjusted for baseline EQ-5D | Mean | 0.301 | 0.244 | 0.227 |
Difference in QALYs adjusted for baseline EQ-5D | SE | 0.106 | 0.098 | 0.100 |
Difference in QALYs adjusted for baseline EQ-5D | 95 % CI | 0.093–0.508 | 0.052–0.437 | 0.031–0.422 |
ICER | £/QALY | 5,547 | 5,340 | 5,903 |
Probability that surgery is cost effective at the threshold of £20,000 per QALY gained | | 0.98 | 0.96 | 0.94 |
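The ICER is the difference in mean costs divided by the difference in mean QALYs, and the incremental net benefit at a given threshold is the threshold multiplied by the QALY difference, minus the cost difference. The sketch below recomputes both from the rounded means in the table above; the function names are illustrative. The recomputed ICERs differ by a few pounds from the reported ones, presumably because the reported figures were calculated from unrounded estimates.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio in £ per QALY gained."""
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the QALY difference is zero")
    return delta_cost / delta_qaly

def incremental_net_benefit(delta_cost, delta_qaly, threshold):
    """Incremental net monetary benefit at a cost-effectiveness threshold (£ per QALY)."""
    return threshold * delta_qaly - delta_cost

# Rounded mean differences (surgery vs. medical management) from the table
results = {
    "Complete case analysis": (1668, 0.301),
    "Multiple imputation": (1305, 0.244),
    "Mixed model": (1338, 0.227),
}

for method, (delta_cost, delta_qaly) in results.items():
    ratio = icer(delta_cost, delta_qaly)
    benefit = incremental_net_benefit(delta_cost, delta_qaly, threshold=20000)
    print(f"{method}: ICER = £{ratio:,.0f} per QALY, net benefit = £{benefit:,.0f}")
```

A positive incremental net benefit at £20,000 per QALY corresponds to an ICER below that threshold. The probability of cost effectiveness in the last row cannot be recomputed from these means alone, because it also depends on the joint uncertainty in costs and QALYs.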
6.3 Stage 3: Sensitivity Analysis to the MAR Assumption
7 Implications for Practice and Research
Recommendation | Comments |
---|---|
Stage 1: Descriptive analysis | |
1.1 Conduct descriptive analysis of the data: • Proportion of missing data by trial group at each follow-up period • Missing data pattern • Association between missingness and baseline variables • Association between missingness and observed outcomes | Report the descriptive analysis that was conducted to inform the assumption on the missing data mechanism |
1.2 Discuss among the trial team (trialists, clinicians, trial management group, etc.) the possible mechanisms and reasons for missing data | |
1.3 Make an assumption on the missing data mechanism based on the information collected in 1.1 and 1.2 | Note that the descriptive analysis can distinguish between MCAR, CD-MCAR and MAR, but it cannot rule out MNAR |
1.4 State the assumption on the missing data mechanism and justify the choice of assumption | |
1.5 Report HR-QOL, resource use and costs (if applicable) by treatment group prior to imputation | |
Stage 2: Choosing and Implementing a Method to Handle Missing Data | |
2.1. Choose a method to handle the missing data in accordance with the assumed missing data mechanism | Complete case analysis (with the baseline covariates related to missing data included in the analysis model) for CD-MCAR, MI or likelihood-based model for MAR, IPW for monotonic missing data under MCAR, CD-MCAR or MAR |
2.2. State up front any other assumptions required for the analysis | e.g. whether missing data in individual resource use items are assumed to be zero |
2.3. Include all randomised individuals with follow-up data | Individuals with data only at baseline may be excluded from the base case but should be included in a scenario to make the analysis truly intention-to-treat |
2.4. Impute missing baseline covariates with mean imputation or MI | MI is more complex, and may be less efficient, than mean imputation |
2.5. MI seems the most widely applicable method of analysis: • The imputation model should include all covariates related to missingness, related to outcomes and any variable included in the analysis model • MI should be implemented separately by treatment allocation • The number of imputations should be greater than the percentage of missing data • Predictive mean matching and/or transformations in MICE can help with CEA data that are not normally distributed • Costs can be imputed at a resource use level or as costs • QALYs can be imputed at HR-QOL domain level, at the index score level or as QALYs | MI can be implemented with chained equations (MI-MICE) or by joint modelling (MI-JM), which assumes multivariate normality. The current evidence base does not allow for strict recommendations for one approach over another |
2.6. Likelihood-based models are a sensible alternative to MI but can be more difficult to implement | Likelihood-based models avoid the imputation step but only covariates allowed for the analysis model can be included. They can be difficult to implement when costs or health outcomes are disaggregated |
2.7. IPW methods are useful if the missing data pattern is monotonic | IPW avoids the imputation step but its reliability is dependent on the model specification |
2.8. Other ad hoc methods (e.g. complete case, mean imputation or last-value carried forward) should be avoided | They cannot incorporate the uncertainty inherent in missing data, and often make implausible assumptions about the missing data mechanism |
2.9. The method chosen to handle missing data can be validated by comparing results with an alternative method that makes the same assumption on the missing data mechanism (e.g. likelihood-based model vs. MI with the same covariates) | If using MI, the imputation model can be validated by comparing the distribution of observed and imputed data |
2.10. If using MI, report resource use, HR-QOL scores (if imputed at this level), costs and QALYs by treatment group after imputation. Results after imputation should be compared with the descriptive analysis pre-imputation | |
Stage 3: Sensitivity analysis to the MAR assumption | |
3.1. Sensitivity analysis explores the robustness of the results to alternative assumptions on the missing data mechanism: • The methods proposed here (weighting approach or an additive shift of imputed values) are straightforward and informative | Pattern mixture and selection models can be difficult to implement |
3.2. Interpret the results of the sensitivity analysis in light of the understanding of the disease and the trial context (see 1.2.) | Does the allocation decision (i.e. is the intervention likely to be cost effective?) change given plausible changes in the assumption on the missing data mechanism? |
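The additive shift mentioned in recommendation 3.1 can be sketched as follows: values imputed under MAR are shifted by a fixed amount, observed values are left unchanged, and the analysis is repeated for a range of shifts. The QALY values, the imputation flags, the shift values and the function name below are hypothetical. In a full MI analysis the shift would be applied within each imputed dataset before the results are pooled.

```python
def shifted_mean(values, imputed_flags, delta):
    """Mean after adding delta to imputed values only; observed values are unchanged."""
    adjusted = [
        value + delta if imputed else value
        for value, imputed in zip(values, imputed_flags)
    ]
    return sum(adjusted) / len(adjusted)

# Hypothetical QALYs over follow-up; True marks a value imputed under MAR
surgery_qalys = [3.9, 4.1, 3.7, 4.0, 3.8, 3.6]
surgery_imputed = [False, False, True, False, True, True]
medical_qalys = [3.6, 3.8, 3.5, 3.9, 3.4, 3.7]
medical_imputed = [False, True, False, False, True, False]

# Scenario: individuals with missing data have worse outcomes than MAR
# predicts, by the same amount in both arms (delta = 0 is the MAR analysis).
for delta in (0.0, -0.1, -0.2, -0.3):
    difference = (
        shifted_mean(surgery_qalys, surgery_imputed, delta)
        - shifted_mean(medical_qalys, medical_imputed, delta)
    )
    print(f"shift = {delta:+.1f} QALYs: difference in mean QALYs = {difference:.3f}")
```

Different shifts could also be applied to each arm. The question of interest is whether the conclusion on cost effectiveness changes within a range of shifts that the trial team considers plausible.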