Background
HIV programs are often assessed by the proportion of patients who are alive and retained in care, which has direct consequences for funding and programmatic services offered [
1,
2]. However, among individuals who initiate antiretroviral treatment (ART), the reported rate of loss to follow-up (LTF) ranges from 5 to 53% [
1,
3‐
10]. Clinically, these LTF patients are at risk for adverse outcomes such as medication resistance, transmission to others, lack of care, or at best, incomplete medical records when they transfer care to another clinic [
1,
6,
7]. Programmatically, loss to follow-up leads to underestimates of retention, which could be misinterpreted as under-performance on program outcomes [
1,
5,
6,
11].
The category of lost to follow-up (LTF) is not a homogeneous outcome—e.g., “dead” or “alive”—but rather a heterogeneous category of three disparate health states: undocumented deaths, undocumented or silent transfers to another source of HIV care, or alive and complete disengagement from HIV care [
12‐
14]. The proportion of patients who are alive and retained in care is, by definition, the proportion who are neither dead nor LTF. Because LTF is part of that definition, the outcome is complex and problematic.
In reality, LTF is a marker for missing data on vital status. We argue that LTF should not be treated as a legitimate outcome category because its meaning can easily change over time and across sites. For example, patients who silently transfer to another provider, move domiciles, or die outside of a healthcare facility could all be classified as LTF. Thus, studying predictors of LTF should be avoided. Instead, LTF should be considered a missing data problem that needs to be solved. We present a unique application of multiple imputation with chained equations (MICE) to impute both the missing outcome (vital status) and missing covariates simultaneously, using a large longitudinal cohort of patients from Haiti who were treated for HIV infection. We compare the MICE results with those from the more traditional analytic methods of complete case analysis and inverse probability weights, and we evaluate factors associated with death under all three methods.
Statistical methods for handling missing vital status
In the HIV literature, for studies assessing predictors of mortality/survival, the most common methods of dealing with LTF are complete case analysis, survival models that censor those LTF, and tracing with inverse probability weights [
10,
15‐
22]. But there are other methods, including simple imputation, multiple imputation, and Bayesian analysis [
15]. Each method has different underlying assumptions about the missing data.
Complete case analysis
Complete case analysis omits observations with missing data from multivariable analyses. It is the default in most statistical software, which applies it automatically. Because only complete observations are used, sample size decreases, statistical power is compromised, and study results are often biased [
10,
16].
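The effect of case-wise deletion on sample size can be illustrated with a minimal sketch. The records, variable names, and the helper `complete_cases` below are hypothetical and are not drawn from the study data; `None` marks a missing value.

```python
# Hypothetical patient records; None marks a missing value.
records = [
    {"age": 34, "cd4": 210, "dead": 0},
    {"age": 51, "cd4": None, "dead": 1},    # covariate missing
    {"age": 29, "cd4": 350, "dead": None},  # LTF: vital status unknown
    {"age": 45, "cd4": 90, "dead": 1},
    {"age": 38, "cd4": None, "dead": None},
    {"age": 60, "cd4": 120, "dead": 0},
]


def complete_cases(rows, variables):
    """Keep only rows with no missing value in any analysis variable."""
    return [r for r in rows if all(r[v] is not None for v in variables)]


analysis_vars = ["age", "cd4", "dead"]
cc = complete_cases(records, analysis_vars)

n_total = len(records)
n_cc = len(cc)
cc_mortality = sum(r["dead"] for r in cc) / n_cc

print(f"{n_cc} of {n_total} records retained")
print(f"complete-case mortality = {cc_mortality:.2f}")
```

In this toy dataset half of the records are dropped, and the mortality estimate rests entirely on the patients whose data happen to be complete.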
Kaplan-Meier survival analysis
Kaplan-Meier analysis assumes that loss to follow-up is unrelated to mortality. Stated another way, patients who are censored due to LTF are assumed to have the same probability of survival as those who are not lost to follow-up [
23]. However, this assumption cannot be verified without more information. Studies in the extant literature that traced patients categorized as lost found that between 12 and 87% were dead [
24]. With this wide range in mind, it is impossible to say whether LTF is associated with higher mortality, lower mortality, or no difference in mortality. Under this method, patients who are LTF are censored at a time point typically defined by the date when vital status was last verified. Kaplan-Meier analysis is often used for analyzing HIV cohort data because all cases can be included, at least for the duration that they were followed before being lost.
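The role of censoring can be seen in a minimal sketch of the Kaplan-Meier (product-limit) estimator. The follow-up times and event indicators are hypothetical; an event value of 0 stands for a patient censored at the last verified contact, for example because of LTF.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct observed death time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored (e.g. LTF)
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        # Everyone still under observation just before t is at risk,
        # including patients who will be censored later.
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months.
times = [2, 3, 3, 5, 8, 8, 12, 12]
events = [1, 0, 1, 1, 0, 1, 0, 0]

km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients contribute to the at-risk count until they are lost and then simply leave it, which is only appropriate if their later survival resembles that of the patients who remain under observation.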
Inverse probability weights from tracing
Inverse probability weights (IPW) offer another general method for dealing with missing data [
17‐
21,
25]. In the HIV literature, they are often used in conjunction with tracing data. This approach involves using physical or contact tracing to determine the true vital status among a sample of those LTF [
20‐
22,
25]. Then, assuming this sample is representative of all LTF, the tracing data are used to apply weights to the subjects with no missing outcome data, so that the weighted analysis is less biased than the unweighted complete case analysis. The results of the tracing are used to calculate the probability of being a complete case (given the patient's characteristics, including predictors and outcomes), and each complete case is weighted by the inverse of that probability [
20‐
22,
25]. This method assumes that those who are unsuccessfully traced have a mortality that can be accurately estimated from those successfully traced.
For example, consider a simple analysis to assess whether gender predicts mortality. Among 100 women, 50 are documented dead and 50 are documented alive; among 100 men, 20 are documented dead, 20 are documented alive, and 60 are LTF. A “complete case” analysis suggests that men and women have the same risk of dying (RR = 1), since 50% of the women (50/100) and 50% of the men with known status (20/40) died. However, suppose all 60 of the men LTF were successfully traced and found to be dead. For women who died, all were complete cases, so the IPW is the inverse of the probability of being a complete case, or 1/1.0 = 1. All women who did not die were also complete cases, so their IPW is also 1. All men who were alive were complete cases, so their IPW is also 1. But for men who died (n = 80: 20 complete case deaths and 60 traced deaths), the probability of being a complete case was 20/(20 + 60) = 0.25, and therefore the IPW is 1/0.25 = 4. If we apply these weights in an IPW analysis, giving complete case men who died 4 times the weight of any other complete case, then the estimated mortality among men is (20 × 4)/(20 + 20 × 4) = 80%, and the risk of dying among men compared to women is 80/50 = 1.6.
Note: If only a fraction (f) of the LTF get traced, then each of the traced cases is weighted by the inverse probability of being traced, that is, by 1/f.
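The arithmetic of the gender example can be reproduced in a short sketch. The counts are those given in the example; the function names `ipw` and `traced_weight` are illustrative only.

```python
# Counts from the gender example in the text.
women_dead, women_alive = 50, 50
men_dead_cc, men_alive_cc = 20, 20   # men with documented vital status
men_traced_dead = 60                 # all 60 LTF men traced and found dead


def ipw(n_complete, n_total):
    """Inverse of the probability of being a complete case."""
    return n_total / n_complete


# Men who died: 20 complete cases out of 80 deaths in total.
w_men_dead = ipw(men_dead_cc, men_dead_cc + men_traced_dead)
# Every other group consists entirely of complete cases.
w_other = ipw(1, 1)

# Complete case risks and risk ratio.
risk_women = women_dead / (women_dead + women_alive)
risk_men_cc = men_dead_cc / (men_dead_cc + men_alive_cc)
rr_cc = risk_men_cc / risk_women

# Weighted risk among men and the IPW risk ratio.
weighted_deaths = men_dead_cc * w_men_dead
weighted_alive = men_alive_cc * w_other
risk_men_ipw = weighted_deaths / (weighted_deaths + weighted_alive)
rr_ipw = risk_men_ipw / risk_women


def traced_weight(f):
    """Weight for a traced case when only a fraction f of the LTF is traced."""
    return 1 / f


print(f"weight for men who died: {w_men_dead:.0f}")
print(f"complete case RR: {rr_cc:.1f}, IPW RR: {rr_ipw:.1f}")
```

The last helper corresponds to the note above: if, say, one quarter of those LTF were traced, each traced case would carry a weight of `traced_weight(0.25)`, that is, 4.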
However, the performance of the IPW model depends on the methods used to trace patients. In resource-limited settings, tracing is difficult, costly, and often unsuccessful. In our case study, Haiti does not have a unique national identification number for its citizens, making it difficult to track patients across health systems or to verify vital status against a current national death registry [
3].
Multiple imputation with chained equations (MICE)
Multiple imputation with chained equations (MICE) is a less commonly used method for estimating the vital status of those LTF. Although MICE is commonly used to impute missing covariate (predictor) data [
10,
26,
27], it can also be used to impute missing outcome data [
26,
27]. MICE is optimal when less than 30% of a variable’s data are missing and when subjects with missing data are only randomly different (“missing at random”) from those subjects who share an identical set of patient characteristics, or covariate values [
28‐
31]. However, to our knowledge, no articles in the extant HIV literature have reported results after imputing both the outcome and covariates simultaneously.
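The idea of imputing an outcome and a covariate together can be sketched in a deliberately simplified form. The example below is not the model used in this study: it assumes only two binary variables (a hypothetical covariate `poor` and the outcome `dead`), hypothetical counts, and plug-in conditional probabilities rather than regression models with parameter draws, which a full MICE implementation would use. The helper names `draw_conditional` and `mice_once` are illustrative. Results are pooled across imputed datasets with Rubin's rules.

```python
import random
from statistics import mean, variance


def draw_conditional(rows, target, given, observed, rng):
    """Re-impute `target` where it was originally missing, using
    P(target = 1 | given) estimated from rows where `target` was observed."""
    for g in (0, 1):
        donors = [r[target] for r, obs in zip(rows, observed)
                  if obs and r[given] == g]
        p = (sum(donors) + 0.5) / (len(donors) + 1)  # smoothed away from 0 and 1
        for r, obs in zip(rows, observed):
            if not obs and r[given] == g:
                r[target] = 1 if rng.random() < p else 0


def mice_once(data, rng, n_cycles=10):
    """Produce one completed dataset by chained imputation."""
    rows = [dict(r) for r in data]
    masks = {v: [r[v] is not None for r in data] for v in ("poor", "dead")}
    # Start by filling each missing value with a random observed value.
    for v in ("poor", "dead"):
        pool = [r[v] for r in data if r[v] is not None]
        for r in rows:
            if r[v] is None:
                r[v] = rng.choice(pool)
    # Chain: re-impute each variable conditional on the other, repeatedly.
    for _ in range(n_cycles):
        draw_conditional(rows, "poor", "dead", masks["poor"], rng)
        draw_conditional(rows, "dead", "poor", masks["dead"], rng)
    return rows


# Hypothetical cohort: None marks a missing value.
data = (
    [{"poor": 1, "dead": 1}] * 30 + [{"poor": 1, "dead": 0}] * 30
    + [{"poor": 0, "dead": 1}] * 10 + [{"poor": 0, "dead": 0}] * 50
    + [{"poor": 1, "dead": None}] * 40       # LTF: outcome missing
    + [{"poor": None, "dead": 0}] * 10       # covariate missing
    + [{"poor": None, "dead": None}] * 10    # both missing
)

rng = random.Random(0)
m = 20  # number of imputed datasets
estimates, within = [], []
for _ in range(m):
    completed = mice_once(data, rng)
    p = mean(r["dead"] for r in completed)
    estimates.append(p)
    within.append(p * (1 - p) / len(completed))

# Rubin's rules: pooled estimate and total variance.
pooled = mean(estimates)
total_var = mean(within) + (1 + 1 / m) * variance(estimates)
print(f"pooled mortality = {pooled:.3f}, SE = {total_var ** 0.5:.3f}")
```

The total variance adds the between-imputation variance to the average within-imputation variance, so the uncertainty about the imputed vital status is carried into the standard error rather than ignored.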
The aim of this analysis is to present the application of MICE to impute both missing outcome (vital status) and missing covariates, simultaneously, using a large longitudinal cohort of patients from Haiti who were treated for HIV infection, and compare the results with MICE to the more traditional methods of using complete cases, survival analysis and inverse probability weights. Specifically, we compare adjusted logistic regression models for factors associated with death using complete case, IPW and MICE.
Discussion
Among the first cohort of HIV patients who initiated antiretroviral therapy in Haiti from 2003 to 2014, we aimed to identify factors associated with death using three different methods: complete case (CC), inverse probability weights (IPW), and multiple imputation with chained equations (MICE). These three procedures rest on different assumptions and differed in the number of observations included in the adjusted model because of how missing covariate values were addressed. Although the point estimates were similar across the three models, for statistically significant factors we found as much as a 20% difference in odds ratio values. For statistically significant factors, such as severe poverty and WHO stage, the odds ratios in the MICE models were farther from the null than those in the CC and IPW models. Severe poverty was a statistically significant predictor of death in the MICE model (OR 1.80; 95% CI: 1.28–2.52). In a similar cohort from the same clinic in Haiti, income was associated with higher odds of attrition (OR 1.65; 95% CI: 1.25–2.19) [
45]. Additionally, these estimates are similar to those from an intensive contact tracing program performed in Malawi on HIV positive patients, which found about 70% of people who were initially categorized as LTF were alive and 30% were dead [
13].
Worldwide, LTF rates for patients who have been on ART for at least one year range from 5 to 53% [
1,
3‐
10]. Patient characteristics associated with becoming LTF include being clinically ill, as measured by CD4 count or WHO symptom staging, low socioeconomic status, and concern for stigma, as well as structural factors such as transportation issues [
3,
7‐
10,
45‐
47]. Several studies have reported high rates of re-engagement in care by patients who were previously labeled as LTF [
3,
4,
7,
8,
11,
45]. A study in South Africa found that up to 50% of patients who disengaged from care will re-engage within 3 years including care received at a hospital or emergency department visit [
7]. In the small number of contemporary studies that were able to determine the true status of LTF patients, most patients had transferred care to clinics closer to their home or to newer clinics that provide different services, or were alive and not engaged in care [
3,
4,
7,
11,
45]. Forster et al. found that clinics with high LTF rates also had high rates of missing data for patient characteristics [
1]. Ideally, a formal tracking system that “follows” patients when they receive care at other institutions would capture silent transfers; however, such systems are still in development in most countries [
3,
4,
7,
10,
48]. Given these findings that most LTF patients are actually alive, our method of imputing vital status among those LTF and missing covariates at the same time is a cost-effective way to estimate true mortality and to study risk factors for death among patients with HIV.
Each of the described methods in this article has different assumptions for LTF, as well as limitations and strengths (Table 4). For complete case analysis, the loss of statistical power by automatically excluding observations that have missing information is a concern for many researchers [
15,
29]. This automatic exclusion leaves room for bias depending on the types and patterns of missingness [
28,
29]. For survival analysis, many HIV studies have found that the underlying assumption that LTF is unrelated to mortality is incorrect, and thus that survival estimates and associations with death are biased [
17,
21,
25,
49]. Clinicians report that patients who were LTF in the early 2000s were more likely to be found dead later, whereas LTF participants in more contemporary cohorts are more likely to be alive [
13,
22,
25,
50,
51].
Table 4
Assumptions, Limitations, Strengths and Biases between different methods of analysis
Method | Assumptions | Limitations | Strengths | Biases |
Complete Case Analysis | Participants with missing data are a random sample of those intended to be observed [15, 29] | Loss of statistical power [56] | Automatically implemented by software; common method | Might be biased if participants with missing data are different to those with complete data [15] |
Survival Analysis | LTF is unrelated to mortality | Most studies found assumption to be incorrect; survival is usually overestimated | Most common method; easy to perform | |
Inverse Probability Weights from Tracing | Those unsuccessfully traced have the same mortality as those successfully traced; “outcomes are missing at random after accounting for available covariates” [22] | Tracing was done at the end of the 10 year follow up period on everyone; case-wise deletion if covariates are missing; tracing can be difficult and expensive; only as successful as your tracing success; loss of statistical power [56] | Common method in HIV studies; conceptually easy to understand; best employed for monotone missing data [29] | Biased estimate of effect size [56]; residual selection bias [22] |
Multiple Imputation with Chained Equations | Missing are only randomly different from patients with same set of covariates | Relies on a good prediction model; susceptible to human error [29] | Use all observations; robust standard error; least biased estimates of effect size [56]; gains in precision of estimation of effects [15] | If data are not MCAR results might be biased away from the null [29] |
With regard to IPW from tracing data, there are many limitations associated with this methodology. IPW from tracing assumes that the traced participants are a representative sample of all LTF. Under this assumption, a random sample of LTF participants is typically selected for tracing [
13,
20,
21,
52‐
55]. In this cohort, tracing was attempted on all participants who were LTF and was performed by telephone and in-person follow-up. Additionally, tracing was done at the end of the 10-year follow-up period, so those who were lost more recently were more likely to be found than those lost at the beginning of the follow-up period. Another limitation, inherent in most IPW analyses, is that observations with missing data are excluded through automatic case-wise deletion by the analysis software. As a result, estimates might be biased and statistical power might be lost when using this method [
22,
56].
Unlike IPW, MICE is able to use all the observations in a dataset by imputing the missing values, which preserves the full sample size. However, it too has assumptions and limitations. One major assumption is that the risk of death among patients who are LTF is constant over time. This may not hold, as mortality is known to be highest in the early period after ART initiation and to decrease over time [
33,
34,
45,
57]. Additionally, MICE relies on a good prediction model and requires data to be missing at random (MAR) [
29,
31]. Although MAR is difficult to ascertain, recent publications have explored the application of MICE in non-MAR situations and found that a small amount of bias might be present in the results. However, compared to the other methods, the small amount of bias that might be present is offset by the gains of using all observations present in the dataset and the robust standard errors calculated by the procedure [
29,
30,
58]. Several studies have incorporated MICE as a method to estimate associations due to attrition or lost to follow up in longitudinal studies [
59,
60]. Regardless of the method used, one must diligently explore patterns of missingness before performing any analyses [
10,
25,
28‐
31]. We believe that, despite some limitations with MICE, the benefits of using all available data and the subsequent calculation of robust standard errors outweigh the limitations. Therefore, the approach of imputing both the outcome and covariates seems better than more traditional methods.
Although we describe a statistical approach to approximating survival rates, implementation research is needed to determine the effectiveness and scalability of interventions to keep patients engaged in care and to return them to care [
3,
44,
45,
48]. HIV programs should consider including sensitivity analyses or other methods for estimating the vital status among those categorized as lost, as traditional methods, such as CC, IPW, Kaplan-Meier and Cox proportional hazards models, do not consider that patients who are lost may re-engage in care. The multiple imputation method that we describe in this paper provides an estimate that is closer to the actual outcome rates. Further research is needed to test this method in other countries and HIV programs to see if it provides outcome estimates close to actual rates.
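One simple form of sensitivity analysis is to recompute overall mortality under different assumed proportions of deaths among those LTF. The sketch below uses hypothetical cohort counts and, as intermediate scenarios, the 12% and 87% bounds on mortality among traced LTF patients cited in the Background; the function name is illustrative.

```python
def mortality_under_assumption(n_dead, n_alive, n_ltf, frac_ltf_dead):
    """Overall mortality if a given fraction of LTF patients had died."""
    total = n_dead + n_alive + n_ltf
    return (n_dead + frac_ltf_dead * n_ltf) / total


# Hypothetical cohort counts, not taken from the study.
n_dead, n_alive, n_ltf = 150, 600, 250

scenarios = {
    "all LTF alive": 0.0,
    "12% of LTF dead": 0.12,
    "87% of LTF dead": 0.87,
    "all LTF dead": 1.0,
}

results = {
    label: mortality_under_assumption(n_dead, n_alive, n_ltf, frac)
    for label, frac in scenarios.items()
}

for label, value in results.items():
    print(f"{label}: mortality = {value:.1%}")
```

The spread between scenarios shows how strongly a program's reported mortality depends on what is assumed about the vital status of patients categorized as lost.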