Background
Alzheimer’s disease, in combination with its clinical manifestation/syndrome (AD) [
1], is a progressive, multifaceted disease whose cognitive symptoms surface years after disease onset [
2]. In order to identify crucial opportunities for medical interventions that could potentially prevent or delay symptoms, it is vital to understand the temporal relationship of pathological changes underlying the progressive nature of AD. To this end, cognitive assessments and a wide range of biomarkers, including cerebrospinal fluid (CSF) markers and neuroimaging-derived measures, have been established to monitor the disease’s progression. Measuring these markers enables the observation of biochemical, structural, functional, and cognitive changes that occur as the disease progresses [
3], and the resulting data can form the basis for data-driven approaches that aim to determine the relative temporal dependencies between biomarkers and cognitive symptoms [
4]. Previously, a variety of data-driven models have been developed with the aim of accomplishing this task [
5‐
10].
One model archetype that has found wide success in the context of neurodegenerative diseases [
11‐
14] and AD specifically [
15] is the event-based model (EBM) [
13]. It is a data-driven probabilistic generative model that characterizes the progression of a disease in the form of a single sequence of events which describes the relative order of measured markers turning from a normal state to a diseased state (i.e., abnormal state). Such event sequences carry the benefit that they are highly interpretable and, although describing disease progression, can already be learned from cross-sectional cohort study data. Previously, EBMs have been used to derive event sequences [
13], stage subjects in their disease progression [
15], predict conversion from one clinical stage to the other (i.e., cognitively unimpaired (CU) to mild cognitive impairment (MCI), or MCI to AD) [
16], and uncover disease phenotypes with distinct temporal progression patterns.
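As a minimal sketch of the idea behind an EBM (not the implementation used in this work), the following Python snippet scores a candidate event sequence for a single subject. It assumes that the likelihood of each measurement under the normal and the abnormal state has already been estimated, and it assumes a uniform prior over stages; the function name and the toy numbers are hypothetical.

```python
from math import prod

def subject_likelihood(sequence, p_abnormal, p_normal):
    """Likelihood of one subject's data under a candidate event sequence.

    sequence   : list of marker names, ordered from first to last event
    p_abnormal : dict marker -> p(measurement | event has occurred)
    p_normal   : dict marker -> p(measurement | event has not occurred)
    Assumption: uniform prior over the len(sequence) + 1 possible stages.
    """
    n = len(sequence)
    total = 0.0
    for k in range(n + 1):  # stage k: the first k events have occurred
        occurred = prod(p_abnormal[m] for m in sequence[:k])
        pending = prod(p_normal[m] for m in sequence[k:])
        total += occurred * pending
    return total / (n + 1)

# Toy subject: marker "A" looks abnormal, marker "B" looks normal.
p_abn = {"A": 0.9, "B": 0.2}
p_nor = {"A": 0.1, "B": 0.8}
score_ab = subject_likelihood(["A", "B"], p_abn, p_nor)
score_ba = subject_likelihood(["B", "A"], p_abn, p_nor)
# The ordering "A before B" explains this subject better than "B before A".
```

In a full model, the likelihoods of all subjects are multiplied (or their logarithms summed), and the sequence maximizing this product is sought; this is why cross-sectional data suffice.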
To build an EBM, patient-level data are needed on which the model can be fit. In recent decades, an increasing number of observational cohort studies have released their collected data for research purposes, including the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [
17], the European Prevention of Alzheimer’s Dementia (EPAD) [
18], and AddNeuroMed [
4]. So far, however, only a few studies in the AD domain have applied EBMs to data from other cohorts besides ADNI [
19,
20]. Previous work evaluating data-driven progression modeling based on cohort datasets has shown that the participant recruitment procedures can introduce cohort-specific systematic statistical biases into the collected data [
21], which, in turn, can bias the estimation of disease progression [
22]. Therefore, it is necessary to replicate and validate data-driven results in independent cohorts to ensure robust conclusions. To date, however, it remains unclear whether event sequences determined from one cohort dataset generalize beyond the discovery cohort itself and, further, whether sequences generated across several cohorts are concordant with one another. At the same time, deriving a comprehensive event sequence that combines all relevant AD biomarkers, cognitive assessments, and functional scores from a single cohort is infeasible, since each cohort study can only measure a limited set of variables, and these sets often overlap only partially across studies [
23]. In theory, however, this allows for an estimation of individual event sequences from distinct cohorts which cover complementary sets of markers. Aggregating results across cohorts would harness this complementary information by assembling a meta-sequence that provides a more complete picture of the development and progression of AD.
In this work, we present a systematic, in-depth comparison of AD event sequences derived from ten independent landmark cohort studies to investigate the generalizability and robustness of EBM-derived AD progression patterns. Furthermore, we designed a novel rank aggregation algorithm which we used to aggregate the event sequences into a single meta-sequence, thereby fusing the complementary information in all variables assessed across the studies. Our work harnesses the heterogeneity in cohort study designs and measurements to produce a meta-sequence providing a more complete, and robust, picture of the temporal order of pathological marker changes in AD progression.
Discussion
In this work, we used EBMs to investigate AD progression across ten independent cohort studies by evaluating the concurrence of their individually derived event sequences. Furthermore, we proposed an algorithm to combine event sequences estimated from partially overlapping, and thus complementary, sets of variables into a single meta-sequence describing AD progression more comprehensively. Finally, we applied this algorithm to the ten event sequences to estimate a meta-sequence comprising 13 AD variables spanning CSF biomarkers, MRI measures, and clinical assessments of cognitive and functional performance.
Consistent trends across cohorts’ event sequences
The derived event sequences proved to be broadly consistent across cohorts, with the most notable variability in the ordering of MRI brain volume events. This variability could be caused by (1) cohort-specific statistical biases, for example those introduced through specific recruitment criteria [
21], (2) distinct prevalence of AD disease progression subtypes that follow different disease mechanisms [
38‐
40], or (3) mixed neuropathologies.
Inclusion and exclusion criteria of a study shape the demographic compositions of its cohort and thus can directly affect the data-driven disease progression patterns (Table
S3). For instance, ADNI included a higher proportion of APOE4 carriers than JADNI. Given that it has been repeatedly reported that early TAU deposition is more prominent in APOE4 carriers [
41‐
43], this difference might explain the earlier positioning of TAU in ADNI’s sequence as opposed to its relatively later position in JADNI’s.
With regard to progression subtypes, two empirically determined AD subtypes called “hippocampal-sparing” and “limbic-predominant” have previously been described and associated with distinct patterns of brain atrophy [
38,
40]. While structural changes in the brain start with atrophy in the medial temporal lobe (e.g., entorhinal cortex and hippocampus) for the limbic-predominant subtype, brain deterioration in the hippocampal-sparing subtype begins with atrophy of the frontal cortex and enlargement of the ventricles [
44]. Given their respective event sequences, this could indicate that OASIS, ADNI, and NACC might have included more patients expressing the limbic-predominant subtype, while the hippocampal-sparing subtype was more dominant among patients from ARWIBO and JADNI.
We observed that CSF biomarkers were placed first in all cohorts that measured them. This finding is in concordance with previous biomarker studies that observed both ABETA accumulation and brain atrophy before global cognitive decline [
45‐
48].
Autopsies of AD patients have shown that AD pathology rarely appears in isolation and that patients often suffer from a mixture of brain pathologies [
49]. While most studies aim to exclude patients affected by other cognitive diseases, an AD clinical diagnosis is still mainly symptom driven and misclassification errors are possible.
A particular strength of our meta-sequence algorithm is that it is agnostic to differences in how variable values are represented across cohorts. A direct comparison of the provided data values often remains challenging without introducing statistical biases, since studies differ, for example, in their data collection procedures, the imaging equipment employed, and the assays used. With our approach, such semantically equivalent but statistically heterogeneous information can be combined because all computations are performed solely on the base sequences; potential cross-cohort biases due to value representations are thus avoided.
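To illustrate why operating on sequences alone sidesteps differences in value representation, the sketch below merges partially overlapping sequences using a simple mean of normalized positions (a Borda-style heuristic). This is explicitly not the rank aggregation algorithm proposed in this work; the function name and the toy cohort sequences are hypothetical.

```python
from collections import defaultdict

def aggregate_partial_sequences(sequences):
    """Merge partially overlapping event sequences by mean normalized position.

    Each sequence lists variable names from earliest to latest event.
    Positions are scaled to [0, 1] so that sequences of different length are
    comparable; only orderings are used, never raw marker values.
    """
    positions = defaultdict(list)
    for seq in sequences:
        denom = max(len(seq) - 1, 1)  # avoid division by zero for length 1
        for rank, var in enumerate(seq):
            positions[var].append(rank / denom)
    mean_pos = {v: sum(p) / len(p) for v, p in positions.items()}
    # Sort by mean position; ties are broken alphabetically for determinism.
    return sorted(mean_pos, key=lambda v: (mean_pos[v], v))

# Toy base sequences covering complementary variable sets (illustrative only).
cohort_a = ["ABETA", "PTAU", "HIPPOCAMPUS", "VENTRICLES"]
cohort_b = ["ABETA", "MEMORY", "HIPPOCAMPUS"]
cohort_c = ["PTAU", "MEMORY", "VENTRICLES"]
meta = aggregate_partial_sequences([cohort_a, cohort_b, cohort_c])
# meta == ["ABETA", "PTAU", "MEMORY", "HIPPOCAMPUS", "VENTRICLES"]
```

No cohort in this toy example measures all five variables, yet the merged ordering covers them all, because each variable is placed using only its relative position within the sequences that contain it.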
The biggest advantage of the bootstrapping approach compared to the ML sequence-based one is that it allows for uncertainty quantification. However, bootstrapped EBM sequences tend to display a substantially higher positional variance (i.e., “fuzziness”) than ML-derived ones (for an example, see Firth et al. Figures
1 and
2 [
35]). Comparing our ML-based meta-sequence to the bootstrapping-based meta-sequence revealed high similarity between them. The observed differences appeared to lie within the positional variability expressed in the bootstrapped meta-sequence and mainly affected MRI variables.
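The notion of positional variance can be made concrete with a short sketch. It assumes that a list of sequences, one per bootstrap resample, is already available (the model refitting itself is not shown), and it summarizes how often each event occupies each position; the function name and toy sequences are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def positional_summary(bootstrap_sequences):
    """Summarize how often each event occupies each position.

    bootstrap_sequences: list of sequences (lists of event names), one per
    bootstrap resample, all containing the same events.
    Returns (counts, variance): counts[event] maps position -> frequency and
    variance[event] is the population variance of the event's positions.
    """
    positions = {}
    for seq in bootstrap_sequences:
        for pos, event in enumerate(seq):
            positions.setdefault(event, []).append(pos)
    counts = {e: Counter(p) for e, p in positions.items()}
    variance = {e: pvariance(p) for e, p in positions.items()}
    return counts, variance

# Toy resamples: "A" is always first, while "B" and "C" swap places.
resamples = [["A", "B", "C"], ["A", "C", "B"], ["A", "B", "C"], ["A", "C", "B"]]
counts, variance = positional_summary(resamples)
# variance["A"] is 0, whereas "B" and "C" each have variance 0.25.
```

An event with zero variance is placed with certainty, while events that frequently exchange positions show up as the “fuzzy” regions of a bootstrapped sequence.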
One way to validate the derived meta-sequence was to evaluate its concordance with previous findings describing the temporal relationship between smaller subgroups of variables.
The ordering of CSF biomarkers discovered in previous EBM studies supported our observations in the meta-sequence (ABETA followed by PTAU and TAU) [
15]. Our findings were also in line with a recent study [
50] which demonstrated that TAU and PTAU become abnormal after ABETA and that their abnormality occurred in close temporal relationship with cognitive decline. The latter was also in concordance with our findings; however, the cognitive assessments we investigated (i.e., LDEL and LIMM) were not directly included in the referenced study. Furthermore, there is a well-established association between cognitive decline and ABETA abnormality and abundant evidence that changes in cognition typically occur after abnormalities related to CSF biomarkers [
45,
50,
51].
Our observation that memory function showed abnormality before brain volumes agrees with previous studies, which suggested that individual-level brain atrophy rates (not assessed in our study) precede cognitive events, whereas MRI-derived brain volumes become abnormal only afterwards [
15].
In our meta-sequences, changes in MRI biomarkers were ranked after cognitive decline. In agreement with this, for example, Hadjichrysanthou et al. reported that changes in MRI markers appear in close succession with memory decline [
52]. Also, the positioning of MRI variables with respect to CSF markers was concordant with previous observations where significant correlations between CSF biomarkers and temporal lobe atrophy were found [
53‐
55]. These studies argue that increases of TAU and PTAU are attributable to the deposition of neurofibrillary tangles in the temporal lobe, including the hippocampus and entorhinal cortex, which we found to be the first brain region volumes turning abnormal. Furthermore, elevated CSF biomarkers predicted future brain atrophy in these regions (i.e., CSF biomarkers became abnormal before brain volumes).
In concordance with the relative positioning of MRI biomarkers in the meta-sequence, various studies have shown that volumetric changes start in the temporal lobe areas, with abnormality of the hippocampus preceding that of the entorhinal cortex, fusiform gyrus, and middle temporal gyrus, and then proceed to other brain regions such as the ventricles [
56‐
59].
Finally, in agreement with previous studies [
60‐
63] in which visual memory dysfunction was identified as one of the last stages in AD progression, the FIGC test was ranked toward the end of the sequences. The fact that it was positioned after the enlargement of ventricles is in agreement with experimental evidence that changes in the ventricles may precede a deficit in visual memory function [
64,
65]. Another EBM study [
35] also suggested that visual processing becomes impaired after episodic memory in typical AD.
The conducted patient staging provided further evidence that the generated meta-sequence described a sensible cascade of AD progression: participants from the three diagnostic groups were distributed according to their disease severity, with CU subjects placed at the earliest stages, MCI patients spread across the intermediate stages, and AD cases occupying the later stages of the sequence. The presence of MCI subjects at stage 0 could be explained by CSF biomarker values and cognitive scores that were close to the probabilistic event threshold but did not yet exceed it, so that the model considered them normal. The few AD cases that were staged early in the sequence were amyloid-negative, which potentially indicates that they were misdiagnosed.
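The staging behavior described above, including subjects near the event threshold being assigned to stage 0, can be sketched as follows. The snippet assumes a fixed event sequence, precomputed likelihoods of each measurement under the normal and abnormal state, and a uniform prior over stages; it is an illustration with hypothetical names and toy numbers, not the staging code used in this work.

```python
from math import prod

def assign_stage(sequence, p_abnormal, p_normal):
    """Return the most likely stage (0..len(sequence)) for one subject.

    Stage k means that the first k events of the sequence have occurred.
    p_abnormal / p_normal map each marker to the likelihood of the subject's
    measurement given that the event has / has not occurred.
    """
    def stage_likelihood(k):
        return (prod(p_abnormal[m] for m in sequence[:k])
                * prod(p_normal[m] for m in sequence[k:]))
    return max(range(len(sequence) + 1), key=stage_likelihood)

sequence = ["ABETA", "PTAU", "HIPPOCAMPUS"]

# Subject whose ABETA value is close to, but still below, the event threshold.
borderline = assign_stage(
    sequence,
    p_abnormal={"ABETA": 0.45, "PTAU": 0.2, "HIPPOCAMPUS": 0.1},
    p_normal={"ABETA": 0.55, "PTAU": 0.8, "HIPPOCAMPUS": 0.9},
)  # stage 0

# Subject with clearly abnormal ABETA and PTAU but normal hippocampal volume.
intermediate = assign_stage(
    sequence,
    p_abnormal={"ABETA": 0.9, "PTAU": 0.7, "HIPPOCAMPUS": 0.2},
    p_normal={"ABETA": 0.1, "PTAU": 0.3, "HIPPOCAMPUS": 0.8},
)  # stage 2
```

The borderline subject illustrates how a clinically impaired participant can still be staged at 0: as long as each measurement is slightly more likely under the normal state, no event is counted as having occurred.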
Limitations
To build a robust meta-sequence, each variable had to be present in at least some of the base sequences to allow for meaningful distance calculations. Furthermore, the large amount of missing data that arises when multiple data modalities are combined substantially reduced the number of available participants per study. This could have led to more noise in the EBM’s reference distributions. Additionally, modeling signals from heterogeneous data sources, such as AD cohort data, as some form of average carries the risk that the resulting average is an artificial construct that cannot be observed in this specific form in the real world. However, the similarity among the base sequences, as well as between the base sequences and the final meta-sequence, was high, and our identified meta-sequences were highly concordant with results from both data-driven and experimental studies. Furthermore, the patient staging along the meta-sequence displayed a sensible distribution of CU, MCI, and AD subjects across the disease stages. Consequently, it is improbable that the presented meta-sequence represents such an artificial average. Finally, we want to highlight again that AD was defined primarily on clinical grounds in all of our investigated cohort studies. As such, misdiagnosed patients may have been present in the cohorts and therefore included in this analysis.
Acknowledgements
We want to commend all data owners on their adherence to open science principles by sharing their data. We believe that their commitment is invaluable for AD research.
Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI; National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private-sector contributions are facilitated by the Foundation for the National Institutes of Health (
www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for NeuroImaging at the University of Southern California.
Data collection and sharing of ARWIBO was supported by the Italian Ministry of Health, under the following grant agreements: Ricerca Corrente IRCCS Fatebenefratelli, Linea di Ricerca 2; Progetto Finalizzato Strategico 2000-2001 “Archivio normativo italiano di morfometria cerebrale con risonanza magnetica (età 40+)”; Progetto Finalizzato Strategico 2000-2001 “Decadimento cognitivo lieve non dementigeno: stadio preclinico di malattia di Alzheimer e demenza vascolare. Caratterizzazione clinica, strumentale, genetica e neurobiologica e sviluppo di criteri diagnostici utilizzabili nella realtà nazionale”; Progetto Finalizzata 2002 “Sviluppo di indicatori di danno cerebrovascolare clinicamente significativo alla risonanza magnetica strutturale”; Progetto Fondazione CARIPLO 2005-2007 “Geni di suscettibilità per gli endofenotipi associati a malattie psichiatriche e dementigene”; “Fitness and Solidarietà”; and anonymous donors.
J-ADNI was supported by the following grants: Translational Research Promotion Project from the New Energy and Industrial Technology Development Organization of Japan; Research on Dementia, Health Labor Sciences Research Grant; Life Science Database Integration Project of Japan Science and Technology Agency; Research Association of Biotechnology (contributed by Astellas Pharma Inc., Bristol-Myers Squibb, Daiichi-Sankyo, Eisai, Eli Lilly and Company, Merck-Banyu, Mitsubishi Tanabe Pharma, Pfizer Inc., Shionogi Co., Ltd., Sumitomo Dainippon, and Takeda Pharmaceutical Company), Japan, and a grant from an anonymous foundation.
The NACC database is funded by NIA/NIH Grant U01 AG016976. NACC data are contributed by the NIA-funded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P30 AG062428-01 (PI James Leverenz, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, PhD), P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, PhD), P30 AG062421-01 (PI Bradley Hyman, MD, PhD), P30 AG062422-01 (PI Ronald Petersen, MD, PhD), P50 AG005138 (PI Mary Sano, PhD), P30 AG008051 (PI Thomas Wisniewski, MD), P30 AG013854 (PI Robert Vassar, PhD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, PhD), P30 AG062429-01 (PI James Brewer, MD, PhD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, PhD), P30 AG053760 (PI Henry Paulson, MD, PhD), P30 AG010124 (PI John Trojanowski, MD, PhD), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P30 AG049638 (PI Suzanne Craft, PhD), P50 AG005136 (PI Thomas Grabowski, MD), P30 AG062715-01 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), P50 AG047270 (PI Stephen Strittmatter, MD, PhD).
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.