Introduction
Key evidence for the validity of amyloid PET tracers comes from end-of-life studies [1–3]. In these studies, visual reads of scans based on the PET tracers 18F-Florbetapir, 18F-Florbetaben, and 18F-Flutemetamol had a high diagnostic accuracy for predicting the presence of neuritic amyloid plaques. As an example, in the extended dataset from the pivotal 18F-Flutemetamol end-of-life study [2] based on 106 cases [4], the sensitivity of 18F-Flutemetamol PET by majority read for increased neuritic plaque density was 91% and the specificity was 90% [4, 5]. Comparable diagnostic accuracy was found for 18F-Florbetapir [1] and 18F-Florbetaben [3].
In these pivotal clinicopathological studies, the visual read was based on a set of a priori defined ad hoc rules for discriminating positive from negative scans. In a previous study, a support vector machine (SVM) with a linear kernel was trained on the 18F-Flutemetamol phase 2 data and compared to visual reads [6]. The classifier replicated the visual reads with 100% concordance and revealed that the highest feature weights were localized to the striatum, precuneus, cingulate, and middle frontal gyrus. Training and testing a classifier for binary classification against a neuropathological ground truth may provide a more data-driven way of defining the most discriminative features than an expert- or consensus-based definition derived from visual read rules. Furthermore, in contrast to visual reads, SVM provides the distance to the hyperplane as a continuous measure of the classifier's level of certainty. For an SVM with a linear kernel, the hyperplane is the plane that separates the cases belonging to the two classes, and the distance to the hyperplane measures the strength of evidence the classifier has for assigning a case to one class or the other. This continuous measure can then be correlated with a continuous neuropathological measure.
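For a linear-kernel SVM, the signed distance to the hyperplane is simply the weighted sum of a case's features plus a bias, scaled by the norm of the weight vector. The sketch below illustrates this computation in plain Python; the weight vector, bias, and feature values are made-up illustrative numbers, not values from the study.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from feature vector x to the hyperplane w·x + b = 0.

    The sign indicates the predicted class (e.g. amyloid-positive vs
    amyloid-negative); the magnitude reflects the classifier's certainty.
    """
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

# Illustrative two-feature example (weights and bias are hypothetical):
w, b = [3.0, 4.0], -5.0
print(signed_distance(w, b, [2.0, 2.0]))  # (6 + 8 - 5) / 5 = 1.8
```

In the study's setting, `x` would be the voxel intensities of an SUVR image and `w` the per-voxel feature weights learned by the SVM; the continuous output can then be correlated with a continuous neuropathological measure.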
The SVM can also be trained with a different ground truth to test whether its diagnostic performance against other neuropathological dimensions outperforms that of the a priori chosen measure of modified neuritic amyloid plaque density. One such alternative neuropathological classification scheme is based on amyloid phases determined from Aβ immunohistochemistry [7, 8]. According to Thal et al. (2002), Aβ spreads hierarchically through the brain in 5 phases: from neocortical areas (Aβ phase 1), it spreads into allocortical regions including the entorhinal cortex and hippocampus (phase 2), next to the basal ganglia, hypothalamus, and thalamus (phase 3), followed by the midbrain (phase 4), and eventually the cerebellum and pons (phase 5) [7]. In the pivotal 18F-Flutemetamol end-of-life dataset (n = 68), all phase 0–2 cases were read as negative and 89% of phase 4–5 cases as positive, whereas 33% of the phase 3 cases were read as positive [9]. We grouped phases 0–2, contrasted them with phases 3–5, and trained the classifier for this binary distinction. The grouping of cases with amyloid phases 0–2 as negative has been used before [10] and also corresponds to newly developed amyloid PET classification schemes that assume phases 0–2 are not detectable in vivo with PET [11, 12].
Once the classifier has been trained and tested using post-mortem verification, and has proven accurate, it can readily be applied to different datasets without the need for laborious and time-intensive visual reads. This may be particularly useful for detecting Alzheimer's disease (AD) in the asymptomatic phase, since amyloid levels in the early disease stage may be equivocal and more difficult to read visually. Amyloid imaging has been instrumental in defining the asymptomatic phase of AD [13]. Whereas visual reads of AD dementia cases versus controls have a high inter-rater reliability, the binary categorization of amyloid scans obtained in cognitively intact cases is more difficult: in these individuals, intermediary levels of amyloid burden exist that may be hard to categorize. Hence, cognitively intact individuals may be one of the main use cases for a classifier trained on neuropathologically verified cases covering a range of neuritic amyloid plaque densities and amyloid phases. We examined how the performance of the two classifiers, trained with the neuropathological ground truth of neuritic plaque density or amyloid phase, related to a commonly used semi-quantitative measure of amyloid burden, the Centiloid (CL) scale [14], in the Flemish Prevent AD Cohort-KU Leuven (F-PACK), a longitudinal observational cohort of older adults who were cognitively intact at study inclusion [15].
Discussion
We trained a supervised machine learning classifier on the 18F-Flutemetamol end-of-life study and applied it to an 18F-Flutemetamol dataset from a cohort of cognitively intact older adults, the F-PACK cohort. In the F-PACK cohort, the Centiloid scale correlated more strongly with the output of the amyloid phase-based classifier than with that of the neuritic plaque density-based classifier. Furthermore, the cut-off for discriminating positivity for neuritic plaque density was substantially higher than that for discriminating amyloid phases 3–5 from phases 0–2 based on classifier\(_{\mathrm{select}}^{A\beta}\).
We used a leave-one-out approach for training and testing the classifier with neuropathology as the ground truth. This neuropathological ground truth is the major strength of the study. Ideally, the training and test sets are entirely independent; given the limited number of autopsy cases available, a leave-one-out approach is the best approximation of this ideal, as the case that is left out is independent of the cases on which the classifier is trained.
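The leave-one-out scheme can be sketched in a few lines: each of the n autopsy cases is held out once while the classifier is trained on the remaining n − 1 cases. This is a generic illustration of the cross-validation scheme, not the study's code.

```python
def leave_one_out(n):
    """Yield (train_indices, test_index) pairs: each of the n cases is
    held out exactly once; the remaining n - 1 form the training fold."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i

# With 4 cases there are 4 folds, and the held-out case never appears
# in its own training fold:
for train_idx, test_idx in leave_one_out(4):
    assert test_idx not in train_idx
print(list(leave_one_out(4))[0])  # ([1, 2, 3], 0)
```

The classifier's accuracy is then the fraction of held-out cases whose predicted label matches the neuropathological ground truth.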
In the end-of-life dataset, when using neuritic plaque density as the ground truth, mean specificity (90%) was practically the same as the median specificity reported by Ikonomovic et al. (2016) [4] for the visual reads. A pathologically negative case was only rarely classified as positive, in agreement with the approved indication of amyloid PET for ruling out AD. In the past, false-positive cases have been attributed to amyloid in diffuse plaques and cerebral amyloid angiopathy combined with a mismatched (sparse) neuritic plaque burden [4]; this is also a likely explanation for the classifier-based false positives. The mean sensitivity of the classifier (83%) was numerically lower than the median sensitivity reported by Ikonomovic et al. (2016) [4] for visual reads (91%). In the past, false negatives have been attributed to advanced cortical atrophy and the unavailability of MRI, which may also account for the false negatives in the classifier-based discrimination of the end-of-life dataset.
A classifier trained to distinguish amyloid phases 0–2 from phases 3–5 in the end-of-life dataset performed in line with what one would expect from the visual read studies [9]. Interestingly, the regions with the highest feature weights for discriminating phases 0–2 from phases 3–5 are not so much those that define amyloid phase 2 versus 3 neuropathologically (such as the diencephalon and basal ganglia) but cortical regions. This indicates that the differentiation relies mainly on an overall increase in signal in key cortical areas and in the caudate nucleus, reflecting an increased concentration of Aβ aggregates and an increased Aβ plaque load, rather than on the stepwise topographical expansion of Aβ plaque pathology described by the amyloid phases [7]. In this context, it is essential to note that all aspects of Aβ pathology (its topographical expansion as described by the Aβ phases, the quantitative amounts of Aβ plaques/aggregates as measured by the Aβ plaque load or biochemically, and the maturation of Aβ aggregates) correlate closely with one another, allowing a good estimation of all these parameters by amyloid PET [28]. The observation that a classifier trained on one neuropathological ground truth (neuritic plaque density or amyloid phase) could classify cases relatively accurately against the other ground truth also testifies to this, at least when the dataset contains relatively advanced stages. As discussed further below, this interchangeability is less convincing for asymptomatic cases.
The sensitivity of the Centiloid (CL) method, based on an ROC analysis of the end-of-life dataset, was 73.6% for neuritic amyloid plaque density and 69.6% for Thal amyloid phase, whereas the classifier had a sensitivity of 83.7% and 84%, respectively. The sensitivity of the majority read in the pivotal phase 3 study was 86% (confidence interval 73–95%), and the median sensitivity of the 5 readers was 88% (confidence interval 74–96%) [2]. The sensitivity of the classifier (83.7%) falls within this range, demonstrating that the classifier performs similarly to the visual reads in this respect. Both the classifier and the visual reads take into account the distribution of values across the entire scan rather than a single composite value; this may explain their similar performance and constitute an advantage over the CL method.
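A CL cut-off from an ROC analysis is typically obtained by scanning candidate thresholds and scoring each against the binary neuropathological ground truth. The sketch below uses Youden's J (sensitivity + specificity − 1) as the selection criterion; both the criterion and the toy Centiloid values are assumptions for illustration, not the study's actual procedure or data.

```python
def youden_threshold(values, labels):
    """Pick the cut-off on a continuous scale (e.g. Centiloids) that
    maximises Youden's J = sensitivity + specificity - 1 against a
    binary ground truth (1 = pathology-positive, 0 = negative)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        fn = sum(1 for v, y in zip(values, labels) if v < t and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < t and y == 0)
        fp = sum(1 for v, y in zip(values, labels) if v >= t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# Toy Centiloid values with fabricated binary pathology labels:
cl = [5, 12, 18, 25, 31, 40, 55, 70]
y  = [0,  0,  0,  0,  1,  1,  1,  1]
print(youden_threshold(cl, y))  # 31: perfectly separates this toy data
```

Sensitivity and specificity at the chosen cut-off are then read off the same confusion counts.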
Recently, amyloid PET classification schemes have been proposed based on a combination of cortical and striatal amyloid levels or reads [11, 12]. The classifier correctly classified 3 of the 5 cases that the Thal et al. PET Aβ scheme misclassified, and 5 of the 12 cases misclassified by the Hanseeuw et al. PET amyloid scheme were correctly classified by the original classifier. A third scheme, a 4-stage model of disease progression, also exists [29]. That scheme was developed principally through mathematical modelling of disease progression based on cross-sectional 18F-florbetapir amyloid PET scans in cognitively normal controls, mainly from the Alzheimer's Disease Neuroimaging Initiative [29]. It has not yet been applied to the current end-of-life dataset and was therefore not included in the current study.
Visual reads are based on a set of explicit ad hoc rules, hence the interest of a data-driven definition of the anatomical distribution of the most discriminative regions. The pattern obtained for the amyloid phase-based classifier confirmed the regions considered critical for visual read classification: the precuneus and posterior cingulate, head of the caudate, rostral anterior cingulate, and ventromedial prefrontal cortex. These are in line with a previous SVM paper that used visual reads as the comparison [6] and confirm the high feature weights of the head of the caudate nucleus reported in that study and confirmed subsequently [11, 12, 30]. The analysis also revealed some less commonly used regions, namely the posterior inferotemporal cortex and the supramarginal gyrus. It is also worth noting that two of the three regions that define stage I in the Grothe et al. [29] staging scheme (basal temporal cortex, anterior cingulate, parietal operculum) are not among those with the highest feature weights.
Largely the same regions had high feature weights for the neuritic plaque density-based classifier, but for that classifier the clusters were more scattered and less confined. For the amyloid phase-based classifier, the visual appearance of the distribution of the highest feature weights corresponded better to the regions commonly attended to during visual reads than it did for the neuritic plaque density-based classifier.
We applied the classifier to an independent 18F-Flutemetamol dataset obtained in 180 cognitively intact older adults. A classifier may be particularly useful in the asymptomatic stage of the disease, when a substantial portion of participants has intermediary amyloid levels. From the SVM classifier, we derived the distance to the hyperplane for each SUVR image. This is a quantitative measure of the strength of evidence that a case belongs to one or the other class; it may be compared to the "level of confidence" of a visual read but is strictly objective. We used this measure of evidence strength to gain further insight into the link between the image classification and the continuous neuropathological measures underlying the binarized classification. When we extracted the distance from the hyperplane as a measure of classification likelihood, the distance from the amyloid phase-based classifier correlated more closely with the CL scale than did that from the neuritic plaque density-based classifier (P < 3.1 × 10⁻¹⁵). A closer match with amyloid phases than with neuritic plaque density may have several reasons: the amyloid phase takes into account both diffuse and neuritic plaques, and 18F-Flutemetamol has affinity for both types [31].
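The association between hyperplane distance and the CL scale can be quantified with a standard Pearson correlation, sketched below in plain Python on fabricated toy values (the study's actual statistical pipeline is not shown here).

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Fabricated hyperplane distances vs. Centiloid values for five scans:
dist = [-1.2, -0.5, 0.1, 0.8, 1.5]
cl = [4.0, 11.0, 22.0, 35.0, 60.0]
print(round(pearson_r(dist, cl), 3))
```

Comparing such coefficients between the two classifiers is what underlies the reported difference in correlation strength.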
Two other findings indicated that the transfer to a cognitively normal population worked best for classifier\(_{\mathrm{select}}^{A\beta}\). First, the CL threshold for distinguishing amyloid phases 0–2 from 3–5 in the F-PACK cohort (CL = 26) closely corresponded to that obtained when determining a CL threshold directly from the end-of-life data (CL = 28.9). Second, classification based on classifier\(_{\mathrm{select}}^{A\beta}\) corresponded best to the visual reads of the F-PACK cohort (specificity 95.3%, sensitivity 81.8%) compared to the other classifiers. The classifiers trained on the end-of-life data to classify based on neuritic plaque density had a lower sensitivity and were less able to detect asymptomatic cases with increased amyloid load. Classification using machine learning works best when the training data follow the same distribution as the data to which the classifier is applied. Training on end-of-life data and applying the classifier to an asymptomatic cohort poses a challenge, as the end-of-life study contains many more neuropathologically advanced cases than an asymptomatic cohort, so ready transferability cannot be assumed. Among the four classifiers, transferability was satisfactory mostly for classifier\(_{\mathrm{select}}^{A\beta}\). The superiority of the amyloid phase-based classifier may relate to the affinity of the PET tracer not only for neuritic but also for diffuse amyloid plaques. It may also have a neurobiological reason: in the course of Alzheimer's disease, a case crosses from amyloid phase 2 to phase 3 earlier than it crosses from sparse to moderate neuritic plaque density [32]. Hence, in an asymptomatic group, a classifier trained to distinguish phases 0–2 from phases 3–5 may have a higher sensitivity for detecting positive cases than a classifier trained to distinguish zero/sparse from moderate/severe neuritic plaque density, since the latter distinction occurs later in the disease course [32]. The selection of the 10% of voxels with the highest feature-weight amplitudes clearly benefits classifier\(_{\mathrm{select}}^{A\beta}\). This can be attributed to the fact that the voxels with the highest feature-weight amplitudes may also be those affected earliest in the disease course; the selection procedure reduces the dimensionality of the image, removing voxels that may contribute noise in the asymptomatic phase of AD. When classifier\(_{\mathrm{select}}^{A\beta}\) was used, the CL threshold for distinguishing amyloid phases 0–2 from 3–5 (CL = 26) was also very close to that reported by La Joie et al. (2019) (CL = 23.5) [10].
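The voxel selection step described above can be sketched as follows: rank voxels by the absolute value of their SVM feature weight and keep the top fraction. Details such as tie handling and rounding of the voxel count are assumptions; the study's exact masking procedure may differ.

```python
def select_top_fraction(weights, fraction=0.10):
    """Return the indices of the voxels whose feature-weight amplitude
    (absolute value) lies in the top `fraction` of all voxels."""
    k = max(1, int(len(weights) * fraction))
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]),
                    reverse=True)
    return sorted(ranked[:k])

# Ten hypothetical voxel weights; with fraction=0.10, only the single
# largest-amplitude voxel (index 1, weight -0.9) survives:
w = [0.1, -0.9, 0.05, 0.7, -0.2, 0.02, 0.3, -0.6, 0.15, 0.4]
print(select_top_fraction(w, 0.10))  # [1]
```

The retained indices would then define the reduced voxel mask on which classifier\(_{\mathrm{select}}^{A\beta}\)-style training and testing operate.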
Some of the classifiers performed poorly on the F-PACK cohort. One cannot simply assume good transfer of a classifier trained on end-of-life data to a cohort of a very different nature, consisting of cognitively normal individuals. Among the four classifiers, classifier\(_{\mathrm{select}}^{A\beta}\) performed best in the asymptomatic cohort, with a close correlation between the CL and the distance to the hyperplane, a close concordance with visual reads, and a CL threshold for positivity that matched both the threshold derived from the neuropathological dataset and that reported by La Joie et al. (2019). What would be the added value of using classifier\(_{\mathrm{select}}^{A\beta}\) compared to visual reads or semiquantitative assessment with CL? Given the close correspondence with visual reads and CL, the added value does not lie so much in the classification of a single image, as the outcome would be highly concordant. Instead, given the increasing availability of large datasets containing hundreds or thousands of amyloid PET scans, a validated classifier is an efficient method for processing and classifying images at scale. In addition to this scalability, a second advantage is the generalisability of its use across centres: a classifier is an objective, reader-independent method that can easily be reproduced across centres, provided the input has been processed in a state-of-the-art manner. It is worth noting that the performance of the classifier did not critically depend on the differences in acquisition method between the end-of-life study and the F-PACK study (different scanners and acquisition windows), nor on the differences in image analysis procedure (PET-only or MRI-assisted). Hence, the advantages of the classifier are its efficient, automated use on large datasets and its rater-independent objectivity in classifying cases.
Acknowledgements
We acknowledge the centres that contributed to the GE067-026 study (Compass Research, Galiz Research, Las Vegas Radiology, Premier Research Institute, Banner Sun Health Research Institute, Mt Sinai Medical Center, Wien Center for Alzheimer’s Disease, Warren Alpert Medical School of Brown University, Banner Alzheimer’s Institute, University of Michigan, Moorgreen Hospital, VERITAS Research, St Margaret’s Hospital, Miami Jewish Health Systems, Memory Enhancement Center, Oxford Radcliffe Hospitals, Michigan State University, Exodon LLC, and Barrows Neurological Institute), and we are deeply grateful to the study participants and families. We would like to thank the staff of Nuclear Medicine, Neurology, and Radiology at the University Hospitals Leuven. Special thanks to Carine Schildermans, Kwinten Porters, and Mieke Steukers.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.