Background
Semi-competing risks refers to the setting where interest lies in the time to some so-called non-terminal event, the observation of which is subject to some terminal event [1]. In contrast to standard competing risks, where each of the outcomes under consideration is typically terminal (e.g. death due to one cause or another), in the semi-competing risks setting it is possible to observe both events on the same study unit, so that there is at least partial information on their joint distribution [1, 2]. Take as an example the study of dementia among the elderly [3], a complex neurocognitive condition that is estimated to affect nearly 6 million individuals aged 65 and older in the US [4], a number that is projected to increase to 13.9 million by 2060 [5]. It is known that the risk of death is higher among those who are diagnosed with dementia [6]. As such, studies seeking to investigate risk factors for dementia must contend with death as a competing risk, since death precludes the subsequent observation of dementia. However, both outcomes can be observed among individuals who die following a diagnosis of dementia. This information can potentially improve the efficiency of estimation and can be used to assess the dependence between the non-terminal and terminal events.
Towards the analysis of semi-competing risks data, the statistical literature has focused on three broad frameworks that seek to exploit the joint information on the time to the non-terminal event and the time to the terminal event [7]: those based on copulas [1, 8‐10]; those framed from the perspective of causal inference [11, 12]; and those based on the illness-death multi-state model [2, 13‐16]. In this paper, we focus on the last of these approaches, for which the philosophical underpinning is that patients begin in some initial state at time zero and may transition into the non-terminal and/or terminal state [14, 16‐18]. Analyses typically proceed through the development of models for transition-specific hazard functions (which dictate the rate at which patients experience the events), often with the use of subject-specific frailties, which can be viewed as individual-specific random effects that acknowledge the heterogeneity across individuals that is not accounted for by covariates [19, 20]. Moreover, when a single frailty is shared by all of an individual's transition hazards, it induces dependence between the non-terminal and terminal events, the strength of which is governed by an estimable frailty variance parameter.
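To make the structure of an illness-death model with shared frailty concrete, the following Python sketch simulates one subject's semi-competing outcomes. It is a minimal illustration under stated assumptions, not the model specification used in this paper: a gamma frailty with mean 1 and variance theta multiplies all three transition hazards, each baseline hazard is Weibull with cumulative hazard kappa * t**alpha, covariate effects enter as exp(beta * x), and the hazard of death after the non-terminal event (transition 3) depends on current age (Markov on the age scale). The function name `simulate_subject` and all parameter values are hypothetical.

```python
import math
import random

def simulate_subject(rng, theta, params, x, censor_age):
    """Simulate (y1, d1, y2, d2) for one subject (hypothetical helper).

    params[g] = (kappa, alpha, beta) for transition g: 1 = initial -> non-terminal,
    2 = initial -> terminal, 3 = non-terminal -> terminal. Conditional on the
    frailty, the cumulative hazard of transition g is
    frailty * kappa * t**alpha * exp(beta * x)  (assumed Weibull form).
    """
    # Gamma(shape=1/theta, scale=theta) has mean 1 and variance theta.
    frailty = rng.gammavariate(1.0 / theta, theta)

    def rate(g):
        kappa, alpha, beta = params[g]
        return frailty * kappa * math.exp(beta * x), alpha

    # Latent event ages for the two transitions out of the initial state,
    # obtained by inverting the conditional cumulative hazard at an Exp(1) draw.
    latent = {}
    for g in (1, 2):
        r, alpha = rate(g)
        latent[g] = (rng.expovariate(1.0) / r) ** (1.0 / alpha)
    t1, t2 = latent[1], latent[2]

    if t1 < t2 and t1 < censor_age:
        # Non-terminal event observed: draw the age at death from transition 3,
        # whose cumulative hazard accrues from age t1 onward (Markov, age scale).
        r3, alpha3 = rate(3)
        t_death = (t1 ** alpha3 + rng.expovariate(1.0) / r3) ** (1.0 / alpha3)
        return t1, 1, min(t_death, censor_age), int(t_death <= censor_age)

    # Otherwise the first observed outcome is death without the non-terminal
    # event, or censoring while still in the initial state.
    y = min(t2, censor_age)
    return y, 0, y, int(t2 <= censor_age)


# Usage: a small simulated cohort with a binary covariate and
# administrative censoring at age 20 (arbitrary time units).
rng = random.Random(2021)
params = {1: (0.01, 2.0, 0.3), 2: (0.005, 2.0, 0.2), 3: (0.05, 1.5, 0.1)}
cohort = [simulate_subject(rng, 0.5, params, x=rng.randint(0, 1), censor_age=20.0)
          for _ in range(1000)]
```

Because the same frailty draw scales every transition hazard, subjects with a large frailty tend to experience both events early, which is the dependence that the frailty variance quantifies.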
In the analysis of time-to-event outcomes, data are subject to left-truncation, or delayed entry, when subjects are enrolled into a study after the time origin of interest. Left-truncation is common in the study of aging and dementia, where age is commonly taken as the time scale [21‐23]. In this setting, sampling is biased toward subjects with longer event times, since patients are typically only included in the study if they are alive and dementia-free at study entry. The analysis of left-truncated time-to-event data should therefore use statistical methods that account for this bias. Although methods exist for analyzing left-truncated semi-competing risks data via a standard illness-death model (without a shared frailty) [24, 25] and have been applied to Alzheimer’s disease [26‐29], to our knowledge, there are no published methods for fitting an illness-death model with shared frailty to left-truncated semi-competing risks data. The purpose of this paper is to fill this gap by providing a modeling framework and openly available code for implementation.
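One way to see how left-truncation enters the likelihood, and why a gamma frailty can be integrated out in closed form, is the following sketch of a single subject's marginal log-likelihood contribution. It conditions on the subject being alive and free of the non-terminal event at the entry age, so the frailty-integrated likelihood is divided by the marginal probability of remaining in the initial state up to entry. The assumptions (gamma frailty with mean 1 and variance theta, Weibull baselines, Markov age-scale hazard for death after the non-terminal event) and all function names are hypothetical; this is an illustration, not the paper's own likelihood. Here y1 is the age at the non-terminal event, death, or censoring (whichever comes first), y2 is the age at death or censoring, and d1, d2 are the corresponding event indicators.

```python
import math

# params[g] = (kappa_g, alpha_g, beta_g) for transitions g = 1, 2, 3 (hypothetical
# Weibull parameterization: baseline cumulative hazard kappa * t**alpha).
def cumhaz(g, t, params, x):
    kappa, alpha, beta = params[g]
    return kappa * t ** alpha * math.exp(beta * x)

def log_hazard(g, t, params, x):
    kappa, alpha, beta = params[g]
    return math.log(kappa * alpha) + (alpha - 1.0) * math.log(t) + beta * x

def loglik_contribution(y1, d1, y2, d2, entry_age, theta, params, x):
    """Marginal log-likelihood for one subject, conditional on being alive and
    free of the non-terminal event at entry_age (left-truncation)."""
    k = 1.0 / theta
    # Log conditional hazards at the observed event ages.
    log_h = 0.0
    if d1 == 1:
        log_h += log_hazard(1, y1, params, x)
        if d2 == 1:
            log_h += log_hazard(3, y2, params, x)
    elif d2 == 1:
        log_h += log_hazard(2, y1, params, x)
    # Total conditional cumulative hazard accrued from age 0 to the end of follow-up.
    a = cumhaz(1, y1, params, x) + cumhaz(2, y1, params, x)
    if d1 == 1:
        a += cumhaz(3, y2, params, x) - cumhaz(3, y1, params, x)
    # Cumulative hazard of leaving the initial state before entry (truncation).
    b = cumhaz(1, entry_age, params, x) + cumhaz(2, entry_age, params, x)
    n = d1 + d2
    # Integrating frailty**n * exp(-frailty * a) over the gamma density gives
    # Gamma(k + n) / Gamma(k) * theta**n * (1 + theta * a) ** -(k + n);
    # dividing by P(initial state at entry_age) = (1 + theta * b) ** -k
    # accounts for left-truncation.
    return (log_h
            + math.lgamma(k + n) - math.lgamma(k) + n * math.log(theta)
            - (k + n) * math.log1p(theta * a)
            + k * math.log1p(theta * b))
```

Summing this contribution over subjects and maximizing over the parameters would give estimates under these assumptions; as theta approaches 0 the contribution reduces to that of an illness-death model without frailty.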
Discussion
For the analysis of left-truncated semi-competing risks data, we have provided methods and software for fitting an illness-death model with shared frailty assuming Weibull or B-spline baseline hazards. These methods were used to estimate the association between education level and dementia, accounting for the competing risk of death, in a cohort of 33,117 Kaiser Permanente Northern California members. We found a dose-response relationship between educational attainment and incident dementia, with a decreased risk of dementia associated with increasing levels of education, after adjusting for sex and race/ethnicity. The relationship between education level and incident dementia is still not well understood, with published studies reporting both protective and null effects [46, 47]. Our study supports an association between higher education and a lower risk of dementia in a large US cohort.
Note that the conclusions drawn for the outcome of interest (dementia) from the illness-death model with shared frailty aligned with those from the alternative models that we considered, which omitted the shared frailty or used study entry as the time origin (as shown in Table 3). We expect the estimates from these models to vary depending on the data at hand, as was observed for the estimated effects of education level on the risk of death following dementia diagnosis in our study.
In our examination of the operating characteristics of model fitting via simulated data, we found that the regression parameters were estimated with negligible bias and good coverage. However, on average the frailty variance parameter was slightly underestimated for the Weibull baseline hazard parameterization and overestimated for the B-spline baseline hazard parameterization. For the Weibull baseline hazard parameterization, coverage was conservative for a sample size of 5,000 but was closer to 95% when the sample size was increased to 10,000 (see Table B.2 in Additional file 1). For the B-spline baseline hazard parameterization, coverage was lower than 95% due to the bias in the frailty variance parameter. It is important to note that the primary targets of inference in the methods presented in this paper are the regression parameters. The frailty and the corresponding frailty variance parameter allow us to further account for the dependence between the non-terminal and terminal events beyond covariate adjustment, analogous to a random effect in a random effects model. As in a random effects model, primary interest lies in the mean outcome model and regression estimates; the variance parameter of the normally distributed random effects is typically of secondary interest.
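The bias and coverage summaries discussed above can be computed from simulation replicates as in the following sketch. It assumes Wald-type intervals of the form estimate ± z × standard error; the function name is hypothetical, and the intervals used in the simulation study may be constructed differently (for example, on a transformed scale for the frailty variance).

```python
from statistics import NormalDist, fmean

def bias_and_coverage(estimates, std_errors, true_value, level=0.95):
    """Empirical bias and Wald-interval coverage over simulation replicates."""
    # Two-sided critical value, about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    bias = fmean(estimates) - true_value
    # The interval est +/- z*se covers the truth iff |est - truth| <= z*se.
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    return bias, sum(covered) / len(covered)


# Usage with toy replicate results for a parameter whose true value is 1.0.
bias, coverage = bias_and_coverage([1.0, 1.2, 0.8, 1.5], [0.1] * 4, true_value=1.0)
```

Coverage above the nominal level (conservative) indicates intervals that are too wide on average, while coverage below it can be produced by a biased point estimate even when the standard errors are accurate.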
In the analysis of dementia diagnoses such as those presented in this paper, prevalent cases at study entry are typically excluded. However, the likelihood in the “Likelihood” section can easily be updated (see Additional file 1: Section D) to include prevalent non-terminal cases. Note that there are two approaches in the literature for handling prevalent non-terminal cases in the analysis of left-truncated semi-competing risks data. The approach we take conditions on the history up to the left-truncation time, as in [25, 37], so that prevalent non-terminal cases only contribute to the estimation of λ3. Estimation is straightforward under an illness-death model with shared frailty, since the frailty term, γ, can easily be integrated out. Alternatively, Saarela and colleagues [24] provided estimation methods that condition on the left-truncation time only, so that prevalent cases contribute to the estimation of all transition hazards. This approach is more efficient, as it uses more of the data, but is computationally more intensive, as it involves numerical integration. It also does not readily accommodate an illness-death model with shared frailty, since integrating out the shared frailty term is not straightforward.
In our approach to fitting an illness-death model with shared frailty to left-truncated semi-competing risks data, we have considered fully parametric specifications of the baseline hazard functions, Weibull and B-spline. Both functional forms are flexible and can approximate a wide range of baseline hazard functions. At the time of submission, a pre-print by Gorfine et al. [48] proposed a semi-parametric approach to the illness-death model with shared frailty that accommodates left-truncated semi-competing risks data, using a pseudo-likelihood approach to estimate the regression parameters and baseline hazard functions. While the illness-death model with shared frailty in this paper was formulated using hazard models that are conditional on the frailty in (4)-(6), Gorfine et al. [48] focused on marginal Cox hazard models. This is analogous to the conditional and marginal approaches to modeling mean outcomes in the presence of clustering via mixed models [49] and generalized estimating equations [50], respectively. Thus our approaches are complementary, filling a gap in the literature and giving the analyst options for fitting an illness-death model with shared frailty to left-truncated semi-competing risks data.
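The distinction between conditional and marginal hazard models can be illustrated in the simplest single-transition case. If a gamma frailty with mean 1 and variance θ multiplies a proportional hazard h0(t)exp(βx), the population-averaged hazard among those still event-free is h0(t)exp(βx)/(1 + θH0(t)exp(βx)), where H0 is the baseline cumulative hazard, so the marginal hazard ratio for a binary covariate drifts from exp(β) toward 1 as H0(t) grows. The Python sketch below computes this ratio; it illustrates frailty-induced attenuation under these assumptions and is not the model of Gorfine et al. [48]. In the illness-death model, the denominator for a transition out of the initial state would involve the cumulative hazards of both competing transitions.

```python
import math

def marginal_hazard_ratio(beta, cum_baseline, theta):
    """Population-averaged hazard ratio (x = 1 vs x = 0) at a time where the
    baseline cumulative hazard equals cum_baseline, when the conditional
    hazard is frailty * h0(t) * exp(beta * x) and the frailty is gamma with
    mean 1 and variance theta (single-transition illustration)."""
    hr = math.exp(beta)
    # Among those still event-free, E[frailty | x] = 1 / (1 + theta * H0 * exp(beta * x)).
    return hr * (1.0 + theta * cum_baseline) / (1.0 + theta * cum_baseline * hr)


# A conditional hazard ratio of 0.5 attenuates toward 1 as cumulative hazard accrues.
for h0 in (0.0, 0.5, 2.0, 10.0):
    print(h0, round(marginal_hazard_ratio(math.log(0.5), h0, theta=1.0), 3))
```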
One of the reviewers pointed out that dementia diagnoses may be subject to interval-censoring. To explore the possibility and/or extent of interval-censoring in the cohort, we looked at the patterns of inpatient and outpatient visits (during which dementia might be assessed) among two groups of members: those who were diagnosed with dementia during the study, and those who died without a dementia diagnosis. The concern is that long gaps between visits would lead to imprecise dementia diagnosis dates in the former and missed opportunities for dementia diagnoses in the latter. We found that among those who were diagnosed with dementia during the study, 81% had a visit with a physician within 60 days prior to the diagnosis date in the EMR. For those who died without a dementia diagnosis, 90% had a visit within 60 days prior to death. Plots of individual-level visit patterns over the study period (see Supplementary File Figures C1 and C2) among members of these two groups illustrate that utilization is high in this cohort.
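The visit-pattern check described above amounts to a simple per-member computation. The sketch below assumes hypothetical in-memory inputs (a mapping from member ID to index date, i.e. the dementia diagnosis date or date of death, and a mapping from member ID to visit dates) and treats “within 60 days prior” as the closed window from 60 days before the index date through the index date itself; the actual EMR extraction and window definition may differ.

```python
from datetime import date, timedelta

def share_with_recent_visit(index_dates, visits, window_days=60):
    """Fraction of members with at least one visit in the closed window
    [index date - window_days, index date] (assumed window definition)."""
    window = timedelta(days=window_days)
    hits = 0
    for member_id, index_date in index_dates.items():
        member_visits = visits.get(member_id, [])
        if any(index_date - window <= v <= index_date for v in member_visits):
            hits += 1
    return hits / len(index_dates)


# Toy example: three members with index (diagnosis or death) dates.
index_dates = {"a": date(2010, 5, 1), "b": date(2011, 1, 15), "c": date(2012, 3, 3)}
visits = {"a": [date(2010, 3, 20)], "b": [date(2010, 10, 1)], "c": []}
share = share_with_recent_visit(index_dates, visits)  # one of three members qualifies
```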
Based on these data, we believe that interval-censoring is not a major concern in our data, unlike in prospective studies of Alzheimer’s disease or dementia such as PAQUID [51, 52] and the Adult Changes in Thought Study [53, 54], where dementia screenings can be years apart. While analyses of data from those prospective studies are indeed complicated by interval-censoring, the studies were designed for the purpose of understanding incident dementia, with dementia cases identified through a battery of neuropsychological tests and confirmed by a neurologist. An EMR-based analysis of a cohort of high care utilizers may avoid interval-censoring but may capture dementia cases with less rigor. At KPNC, a similar set of EMR codes used to identify dementia diagnoses was shown to have a sensitivity of 77% and a specificity of 95% compared with a consensus dementia diagnosis utilizing a neuropsychiatric battery, structured interviews, physical examination, and medical records review. If interval-censoring were evident in our data, the modeling should be updated to account for it. This can be done by updating the likelihood function, as in Touraine et al. (2017) [52]. Extending this to the shared frailty illness-death model presented in this paper is an avenue for future research.
It is important to mention that we provide methods for fitting a shared frailty illness-death model to left-truncated data when the covariate of interest is fixed over time. As one reviewer aptly pointed out, time-varying covariates may also be of interest to an analyst. We believe that the modeling framework specified in this paper can incorporate time-varying covariates. However, deriving the likelihood function for a shared frailty model requires marginalization (integration) over the frailty term, which is complicated when time-varying covariates are used and, as such, is beyond the scope of this paper. We intend to explore the implementation of this extension with time-varying covariates in future work.