Background
The rate of progression of Alzheimer’s disease (AD) varies across patients, making it difficult to generate accurate estimates of the course of disease or time until specific disease endpoints for individual patients [1]. Moreover, differences in group-specific rates of progression and treatment efficacies in therapeutic trials may be confounded by individual variation in rates of progression, making it difficult to evaluate the effectiveness of randomization [2]. All of these difficulties are exacerbated by two additional factors: (1) the clinical presentation at diagnosis is highly variable across individual patients with AD, involving cognitive, functional, behavioral, psychiatric, and other symptoms; and (2) the neuropathological substrates of AD (neuronal dysfunction, neurodegeneration, synaptic dysfunction, cerebral atrophy, and other pathologies) differentially influence the clinical course of AD in ways that are poorly understood [3]. For example, there are no known biomarkers that closely track the progression of AD clinical signs/symptoms or uniquely identify their presence [4]. Thus, the development of a realistic, comprehensive, multidomain model of the progression of AD clinical signs/symptoms and outcomes in a well-defined cohort of patients with AD dementia could yield new insights into the disease process and accelerate the development of disease-modifying therapies. The need for such development was recognized in the call for new models of AD progression/outcomes in the recommendations from the 2015 National Institutes of Health AD Research Summit [5]. The model reported in this paper is intended to advance this development.
Our prior work in this area was focused on maximum likelihood estimation and cross-validation of a longitudinal Grade of Membership (L-GoM) model of AD clinical signs/symptoms [6–8]. L-GoM is a latent-variable model that resolves the difficult problem of extending multivariate latent-variable analysis from cross-sectional to longitudinal data [9, 10]. Under our prior L-GoM model, the maximum likelihood estimates of the basic parameters (i.e., the individual-specific “GoM scores”) were treated as data-based computational phenotypes [11, 12] that quantified the underlying neuropathophysiological processes giving rise to the clinical manifestations of AD recorded in the longitudinal data. In effect, the GoM scores were assumed to model the entire disease process, capturing individual differences in presentation and progression over time. The challenge in estimation was to find the optimal mapping from the data to the GoM scores.
In the present study, we modified and extended L-GoM to directly map the GoM scores to individual-specific values of residual total life expectancy (TLE) and its decomposition into disability-free life expectancy (DFLE) and disabled life expectancy (DLE), thereby obtaining a composite mapping from the data to an important set of AD timing estimates with direct clinical interpretability. To construct this composite mapping, we respecified L-GoM as an extension of Sullivan’s life table (SLT) [13].
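As a rough illustration of the Sullivan-type decomposition of TLE into DFLE and DLE, the following Python sketch splits expected remaining life into disability-free and disabled components. It is a minimal sketch under stated assumptions, not the SLT/L-GoM estimation procedure: the function name `sullivan_decomposition`, the use of survival probabilities at interval midpoints, and all numbers are hypothetical.

```python
def sullivan_decomposition(survival, disabled_fraction, step=1.0):
    """Split total life expectancy (TLE) into DFLE and DLE.

    survival[i]          : assumed probability of being alive at the midpoint
                           of interval i (hypothetical input).
    disabled_fraction[i] : assumed proportion of survivors who are disabled
                           during interval i (hypothetical input).
    step                 : interval length in years.
    """
    if len(survival) != len(disabled_fraction):
        raise ValueError("survival and disabled_fraction must have equal length")
    tle = dfle = dle = 0.0
    for s, d in zip(survival, disabled_fraction):
        if not (0.0 <= s <= 1.0 and 0.0 <= d <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        person_years = s * step          # expected years lived in this interval
        tle += person_years
        dfle += person_years * (1.0 - d)  # years lived free of disability
        dle += person_years * d           # years lived with disability
    return tle, dfle, dle


# Illustrative (made-up) inputs for four one-year intervals.
tle, dfle, dle = sullivan_decomposition(
    survival=[1.0, 0.8, 0.5, 0.2],
    disabled_fraction=[0.0, 0.25, 0.5, 1.0],
)
print(tle, dfle, dle)
```

By construction, the two components sum to TLE, which is the property that makes the decomposition directly interpretable.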
The combined SLT/L-GoM model has four advantages over existing alternatives. First, the standard Cox model [14] assumes that covariates are fixed at baseline and hazard rates are proportional over follow-up time. Neither assumption holds for AD (e.g., see [7] and Fig. 4 below). Second, the time-dependent Cox model [15] resolves the first problem but introduces a new problem: temporal changes in covariates are not modeled, implying that another model (e.g., a general linear mixed model [16–18]) is needed to represent those changes. Third, cognitive, functional, behavioral, psychiatric, and other measures and their changes are correlated for patients with AD, presenting formidable technical challenges for modeling AD progression under existing approaches [2, 19]. L-GoM meets these challenges by using latent variables (GoM scores) to generate the correlations between the observed covariates [7]. Fourth, L-GoM incorporates the SLT without making any assumptions about the transitions between healthy and morbidity/disability states, a difficult task in AD modeling [19, 20].
Our prior L-GoM model used one of two separate study cohorts, Predictors 1 (N = 252), for estimation and the second, Predictors 2 (N = 254), for cross-validation [7, 8]. Several technical refinements have since been developed to meet the SLT assumptions, to incorporate fixed genetic and other data, and to optimize the model for personalized predictive applications. In the remainder of this paper, we present and apply the newly developed SLT/L-GoM model to the Predictors 2 data; characterize the most salient clinical features of the resulting subtypes; present estimates of TLE, DFLE, and DLE for the associated subgroups; and discuss how the model can be used in future research and clinical applications, including situations where the input data come from just one examination concurrent with or shortly after AD diagnosis [8].
Discussion
This study provides the first published estimates of the L-GoM extension of the SLT model. Our motivation for this extension was fourfold. First, our analysis supports the hypothesis that patients with AD are heterogeneous in initial presentation and in rates of progression [1], implying that adequate characterization of the clinical course of AD requires a parsimonious multivariate latent-variable model such as L-GoM [10]. Second, the ability to directly map the GoM scores to TLE, DFLE, and DLE focuses attention on these readily understood, familiar metrics. This contrasts with existing factor analytic models [9] that cannot incorporate the SLT model and cannot extract TLE, DFLE, and DLE from patient-level longitudinal data [10, 19]. Third, predictions of TLE, DFLE, DLE, and associated survival curves for many types of disability, especially FTC, are central to important decisions in AD treatment and patient care; they represent information that patients with AD, their families, and caregivers want to know. Fourth, the L-GoM extension of the SLT model can be used to assess the effects of treatment on disability-free and disabled survival. Lifetime costs can be calculated by combining estimated survival curves and cost functions for selected disability measures, implying that the SLT/L-GoM model can be used as a realistic, comprehensive modeling framework for endpoint and resource use/cost calculations for individual patients with AD and subgroups. The appendix in Additional file 1 provides all parametric estimates needed for hypothesis generation and further exploration of AD using the SLT/L-GoM model (Additional file 1: Tables A.3–A.6).
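The lifetime cost calculation mentioned above, which combines survival curves with cost functions, can be illustrated with a short Python sketch. This is only one plausible way to set up such a calculation, not the authors' method: the function `expected_lifetime_cost`, the two flat annual cost levels, the optional discount rate, and all numbers are hypothetical assumptions.

```python
def expected_lifetime_cost(survival, disabled_fraction,
                           cost_free, cost_disabled,
                           step=1.0, annual_discount=0.0):
    """Expected cost accumulated over the remaining lifetime.

    survival[i]          : assumed probability of being alive at the midpoint
                           of interval i (hypothetical input).
    disabled_fraction[i] : assumed proportion of survivors disabled in interval i.
    cost_free            : assumed annual cost while disability-free.
    cost_disabled        : assumed annual cost while disabled.
    annual_discount      : optional discount rate applied at interval midpoints.
    """
    if len(survival) != len(disabled_fraction):
        raise ValueError("survival and disabled_fraction must have equal length")
    total = 0.0
    for i, (s, d) in enumerate(zip(survival, disabled_fraction)):
        # Expected annual cost among survivors in this interval.
        annual_cost = (1.0 - d) * cost_free + d * cost_disabled
        # Discount factor evaluated at the interval midpoint.
        discount = (1.0 + annual_discount) ** (-(i + 0.5) * step)
        total += s * step * annual_cost * discount
    return total


# Illustrative (made-up) inputs: two one-year intervals.
cost = expected_lifetime_cost(
    survival=[1.0, 0.5],
    disabled_fraction=[0.0, 1.0],
    cost_free=10.0,
    cost_disabled=100.0,
)
print(cost)
```

In practice the cost function could depend on any of the disability measures in the model; the two-level cost used here is chosen only to keep the example small.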
Our prior L-GoM model was based on longitudinal data from the Predictors 1 Study cohort [7]. Subsequently, we forward-applied that model to baseline data from the Predictors 2 Study cohort and showed that it accurately predicted times to FTC, nursing home care, and death [8]. Although that model was a major advancement, we updated the L-GoM model in this study for four reasons, described below.
First, several advances were made to the L-GoM estimation software, including more accurate representations of the effects of fixed covariates such as sex, race, age, occupation, and ApoE status, using only information from examination 1, which satisfies Drachman’s [43] call for prognostic covariates that are independent of initial severity. Race was dichotomized as white vs. nonwhite because the sample size was insufficient to support further stratification; only 17 of the 229 subjects were nonwhite. Education was evaluated as a potential covariate, but it did not contribute significantly to the model (p = 0.30). Only 8 of the 229 subjects had less than 9 years of education. Another advance was the weighted maximum likelihood estimation procedure, which took into account the unique status of examination 1 as the only examination that spans the full range of the prognostic subtypes (Fig. 1). The algorithm for generating the excess weight for examination 1 used the Akaike information criterion procedure [44] to limit the loss of fit for examinations 2–21 and maintain the accuracy of the estimated pure-subtype trajectories extending from the prognostic subtypes to the terminal subtype (Additional file 1: Table A.5).
Second, the Predictors 2 Study had several advantages over Predictors 1, including the availability of ApoE genotype at examination 1. The updated model used pooled male/female data; the prior model used sex-specific data. The pooled data yielded more accurate parametric estimates, which revealed significant sex and ApoE genotype differences.
Third, the updated model incorporated several new covariates and refined versions of others, including individual items from the BDRS [38, 39] and MMSE [27], individual motor signs, and depression measures. These changes contributed significantly to the characterization of prognostic subtype 3, allowing the clinical presentation of this subtype to be clearly distinguished from that of subtype 2 (Table 2). Six summary measures were processed using conditional maximum likelihood estimation procedures that did not affect the estimated GoM scores. They were ranked as follows, based on p values (Additional file 1: Table A.2): dependence scale score [40], BDRS score [38, 39], CDR rating [25], MMSE score [27], psychiatric symptoms [41, 42], and total weekly alcohol consumption [45]; the top five were included in Table 2.
Fourth, the updated model generated maximum likelihood estimates of the TLEs, DFLEs, and DLEs for individual patients with AD and for the aggregates of individual patients in subgroups 0–4 [46]. We assessed the validity of the updated model by showing that the GoM subgroups all had very accurate predictions of FTC and mortality at or following each of the 21 examinations (Figs. 2 and 3). This assessment is new; it was not done for the prior model. It follows that the updated model generates even more accurate, valid, and informative representations of AD progression than the prior model. The updated model also explains why our prior Cox analyses [14, 32] were successful in predicting FTC and mortality. Both endpoints were strongly associated with subtype 4, the terminal subtype of the L-GoM process. Hence, any covariate strongly associated with subtype 4 should work well as a predictor in the Cox model [14] (e.g., those with high severity for subtype 4 in Table 2).
By creating relatively homogeneous rational subgroups [35] of patients, such as subgroups 0–4 in Predictors 2, we could demonstrate that the estimated survival closely matched the actual survival for any homogeneous subgroup. The goodness-of-fit plots in Figs. 2 and 3 showed that the survival and disability (need for FTC) variables were well estimated for all subgroups and observation times. These variables were only 2 of the 80 variables in the final model; they were treated like the other 78 variables so that the results could be representative of the entire AD process. Alternative measures of disability could be incorporated into these calculations on the basis of any of the disability-related covariates included in the study. Zehna’s theorem [46] ensures that the resulting individual-specific TLEs, DFLEs, and DLEs are maximum likelihood estimates (see Additional file 1). Forward application of the model to other prospective datasets will be required to further validate its statistical optimality and general applicability; these analyses are underway.
The weighted maximum likelihood estimation procedure ensures that the GoM score estimates from examination 1 alone are of high quality. It follows that maximum likelihood estimates of patient-specific GoM scores and survival curves can be generated conditionally on the parameters presented in Additional file 1: Tables A.3 and A.4 using input data from just one examination concurrent with or shortly after AD diagnosis [8]. Hence, the SLT/L-GoM model could be used for personalized predictive modeling for new patients with AD, with important caveats. Accurate estimation of the individual survival curves and associated TLEs, DFLEs, and DLEs is not equivalent to precise estimation of observed times to specific disease endpoints; the timing is inherently stochastic. This stochasticity could be handled if, in addition to the mean estimates, individual patients with AD, or their physicians, families, and caregivers, were supplied with estimates of key quantiles (e.g., 10th, 25th, 75th, and 90th percentiles) of the individualized survival curves. Our findings in the present study indicate that the DFLEs differed widely as a function of GoM subgroup at the initial visit, whereas the DLEs were much closer to one another (Table 3). Hence, the variability of the TLEs is attributable primarily to the variability of the DFLEs, underscoring the importance of DFLE in prognostic applications.
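The suggestion above of reporting key quantiles alongside mean estimates can be sketched in Python as follows. This is a minimal sketch, assuming a survival curve tabulated on a discrete time grid; the function `survival_time_quantile`, the grid, and the survival values are hypothetical and are not taken from the study.

```python
def survival_time_quantile(times, survival, p):
    """Return the earliest tabulated time by which a fraction p of patients
    is expected to have reached the endpoint, i.e. the first time t with
    S(t) <= 1 - p. Returns None if the curve never falls that low.

    times    : increasing time points (hypothetical grid, e.g. years).
    survival : assumed survival probabilities S(t) at those time points.
    p        : quantile level strictly between 0 and 1 (e.g. 0.25).
    """
    if len(times) != len(survival):
        raise ValueError("times and survival must have equal length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    threshold = 1.0 - p
    for t, s in zip(times, survival):
        if s <= threshold:
            return t
    return None


# Illustrative (made-up) individualized survival curve.
times = [0, 1, 2, 3, 4]
survival = [1.0, 0.85, 0.6, 0.3, 0.05]
for level in (0.10, 0.25, 0.75, 0.90):
    print(level, survival_time_quantile(times, survival, level))
```

Returning `None` when the tabulated curve never reaches the requested level avoids reporting a quantile that the available follow-up cannot support.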
Our analyses have several important limitations. First, the Predictors 2 Study cohort was a nonrandom collection of participants enrolled at three specific study sites specializing in AD, which may limit the generalizability of our results [21]. Second, the full SLT/L-GoM model can only be estimated using longitudinal data with extensive sets of time-varying covariates at each examination. However, if such data have already been assembled, then SLT/L-GoM provides a highly efficient mode of analysis. Third, the assumed form and temporal structure of L-GoM may be oversimplified, reflecting the limited sample size available in the Predictors 2 Study, which required just two nonzero λ parameters to generate the entire ensemble of individual survival curves and just one λ parameter for the corresponding FTC curves. Subsequent applications may require additional λ parameters or more subtypes.
There are several other potential applications of L-GoM and its SLT extension. One would use L-GoM to determine expected progression in the drug and placebo groups of clinical trials, both to evaluate the effectiveness of randomization prior to the trial and to compare modeled vs. actual progression in the drug group after the trial [47]. Alternatively, L-GoM could be used to explore how the clinical symptoms/signs it captures correlate with measured AD biomarkers, for example by testing concurrent and lagged associations between biomarkers and time-varying GoM scores; such associations could elucidate the connections between DFLE/DLE and the neuropathology of AD [48, 49]. With such applications, the model and its results have the potential to stimulate rapid progress in the fight against AD.