Introduction
Lewy body disease (LBD) encompasses Parkinson’s disease (PD), PD with mild cognitive impairment (PD-MCI), PD with dementia (PDD), and dementia with Lewy bodies (DLB), which all have a characteristic clinical presentation and associated clinical diagnostic criteria [
10,
15,
18,
23]. The neuropathological hallmark of these clinically defined conditions is Lewy pathology (LP), which encompasses α-synuclein aggregates in nerve cell bodies and processes: Lewy bodies (LB) and Lewy neurites (LN), respectively. However, LP may also be seen in individuals lacking distinct clinical symptoms. The term incidental LBD was initially coined for individuals who lacked Parkinsonian or cognitive symptoms but had minimal LP restricted to the brainstem, but more recently, it has been expanded to encompass amygdala-predominant and olfactory-only LP [
2,
3,
13].
The heterogeneity of LP is a challenge for neuropathological classification systems. Diagnostic categories must reflect the wide range of LP severity and anatomical distribution, while also enabling robust inter-rater reliability. The existing neuropathological classification systems used for the diagnosis and staging of LP include the Braak LB stages (Braak) [
5], the DLB consensus criteria published by McKeith and colleagues (McKeith) [
17], the modified DLB consensus criteria by Leverenz and colleagues (Leverenz) [
14], and the Unified Staging System for LBD by Beach and colleagues (Beach) [
3]. These staging systems are based on the semi-quantitative scoring of LBs and LNs in neuroanatomically defined regions, in particular the dorsal motor nucleus of the vagal nerve, locus coeruleus, substantia nigra, transentorhinal cortex, amygdala, cingulate cortex, temporal cortex, frontal cortex, and parietal cortex. For the McKeith, Leverenz, and Beach systems, the severity of LBs and LNs is scored on a five-tier scale: 0 = absent, 1 = sparse LBs or LNs, 2 = more than one LB per high power field and sparse LNs, 3 = more than four LBs and scattered LNs in a low power field, 4 = numerous LBs and LNs, as illustrated by McKeith and colleagues [
17]. For the Braak system, a four-tier scale is used to reflect the extent of α-synuclein immunolabelling: 0 = absent, 1 = “slight”, 2 = “moderate”, 3 = “severe”, as described by Braak and colleagues [
5].
The BrainNet Europe Consortium (BNE) found mean inter-rater agreement rates of 65% (range 32–100%) for the Braak system and 81% (range 45–100%) for the McKeith system when 22 experts assessed 31 cases which all showed some LB pathology [
2]. BNE developed a new protocol which was not based on semi-quantitative scoring but simply on the presence or absence of LBs and/or LNs, and added the category “amygdala predominant” for cases with pathology most severe in the amygdala and less pronounced in brainstem areas. This protocol achieved inter-rater agreement of 83% for the Braak system and 84% for the McKeith system [
2]. Similarly, Müller and colleagues applied the Braak system in an inter-rater study where a semi-quantitative score was only needed for stage 6, while stages 1–5 could be assigned based on the presence of LP in the relevant areas; they achieved an inter-rater reliability of at least 76% [
20].
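The agreement figures quoted above can be illustrated with a minimal Python sketch. It assumes, because the exact definition is not restated here, that agreement for a case is the share of raters who chose the most frequent stage for that case, and that the reported mean and range are taken across cases. The function names and the example ratings are hypothetical and are not study data.

```python
from collections import Counter
from statistics import mean


def case_agreement(stages):
    """Fraction of raters who chose the most common stage for one case.

    Assumption: agreement is measured against the modal stage; the cited
    studies may have used a different definition.
    """
    counts = Counter(stages)
    return counts.most_common(1)[0][1] / len(stages)


def summarize_agreement(ratings_by_case):
    """Return (mean, min, max) agreement in percent across all cases."""
    rates = [100.0 * case_agreement(s) for s in ratings_by_case.values()]
    return mean(rates), min(rates), max(rates)


# Made-up ratings: three cases, four raters each (illustrative only).
example = {
    "case_a": ["limbic", "limbic", "limbic", "limbic"],
    "case_b": ["brainstem", "limbic", "limbic", "limbic"],
    "case_c": ["neocortical", "limbic", "neocortical", "unclassifiable"],
}

avg, low, high = summarize_agreement(example)
print(f"mean {avg:.0f}% (range {low:.0f}-{high:.0f}%)")
```

Under this assumed definition, a case on which every rater agrees contributes 100%, so the upper end of a reported range reaching 100% simply means that at least one case was staged unanimously.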
While all of these neuropathological staging systems are widely used, they exhibit relatively low inter-rater reliability and frequently leave cases unclassifiable; e.g., a case with severe LP in the neocortex but only mild LP in the brainstem cannot be classified in the Braak system, and when using the McKeith system, cases may sometimes be assigned to more than one category. Hence, there is a need for an LP staging system that shows high inter-rater reliability, allows for the unequivocal classification of all possible cases, and is readily applicable in neuropathological routine diagnostics. To address this unmet need, we developed a new LP classification system that is based on a modification of the McKeith system and uses the dichotomized approach introduced by the BNE. Sixteen raters in 13 different centres used this new classification system as well as the Braak, McKeith, Leverenz, and Beach systems to score and stage LP in 34 cases. In addition, regional LP scores retrieved from diagnostic neuropathological reports from the University of Pennsylvania brain bank (UPBB) and the Newcastle Brain Tissue Resource (NBTR) were used to re-assign LP categories according to all systems for 363 LP cases.
Discussion
We have devised and tested a new staging system for the assessment of LP. Our proposed LPC system was applied, together with the previously established Braak, McKeith, Leverenz, and Beach systems, by 16 raters to 34 cases. The LPC system showed good inter-rater reliability: comparable to the McKeith and Leverenz systems, and considerably better than the Braak and Beach systems (Fig. 3b). Using the LPC system, the majority of raters were able to classify all cases; in comparison, while most cases (over 95%) could be classified using the Beach system, over 10% of cases could not be classified using the Leverenz system, over 25% using the McKeith system, and nearly 30% using the Braak system (Fig. 3c). The percentages of unclassifiable cases were even higher when UPBB and NBTR archival cases with a clinical diagnosis of AD dementia were evaluated (Fig. 5).
Since the initial identification of α-synuclein in LB [
25], several staging systems have been proposed and implemented to classify LP [
2,
3,
5,
14,
17]. The Braak system was developed to assess the typical patterns of severity and distribution of the LP in PD. However, later studies showed divergent patterns of progression in PD where the accumulation of pathological α-synuclein begins in the brainstem, as opposed to AD or DLB, where LP may be limited to limbic and neocortical regions [
3,
30]. This helps explain the relatively high number of non-classifiable cases observed when applying the Braak system in our study. The McKeith system showed a similarly high percentage of non-classifiable cases, partly reflecting the necessity to have at least some brainstem pathology to assign any stage, which is also true for the Leverenz system. In addition, according to the McKeith system, some cases can equally fulfil the criteria for limbic and neocortical LP (e.g., brainstem and limbic regions, score 3; temporal cortex, score 2; and frontal cortex, score 1); consequently, such cases cannot be assigned to just a single category and thus are not classifiable. Both Braak [
5] and McKeith [
17] systems were published before it was shown that LP may be restricted to the olfactory bulb or amygdala [
2,
3,
13] and, therefore, such cases cannot be assigned a category in either the Braak or the McKeith system. However, in our study, only three cases were categorized as “Amygdala predominant” and one as “Olfactory only”. While application of the method suggested by BrainNet Europe [2] reduced the percentage of cases that could not be classified, these percentages were still higher than for all other systems.
Assignment of a category in both the Braak and Beach systems depends heavily on the semi-quantitative score for LP in each region. Since that score is relatively subjective, it is not surprising that the Braak and Beach systems had the lowest inter-rater reliability in our study (Fig. 3c). Semi-quantitative scores are also used in the McKeith and Leverenz systems, but regional scores may range from 1 to 3 and individual scores do not, therefore, influence the assignment of a category as much as they do in the Braak and Beach systems. We observed high inter-rater reliability for both the McKeith and Leverenz systems as well as for our proposed LPC system; the use of a dichotomized approach, where a region can be scored either negative or positive for LP, greatly reduces the probability of differences in scores between multiple raters. This is further supported by our finding that the Braak system showed higher inter-rater reliability, and both the Braak and McKeith systems showed the highest percentage of cases with 100% agreement, when the dichotomized method suggested by BrainNet Europe was used. However, 100% agreement was reached in only 29.4% of cases when using the LPC system, which is still higher than the 100% agreement rates for the Braak, McKeith, Leverenz, and Beach systems, but admittedly relatively low considering the dichotomized scoring and the simple staging approach. We assume that the use of only digital images had an adverse impact on the scoring accuracy of raters, who are used to assessing slides on a microscope, in particular since relatively large areas sometimes had to be screened for minimal amounts of pathology (e.g., single LNs in a neocortical section).
In addition to our multi-rater assessment, we evaluated the LPC system in comparison with Braak, McKeith, Leverenz, and Beach systems, in a total of 336 archival cases from the UPBB and NBTR: a large sample of consecutive non-selected cases with a broad range of clinical diagnoses. LP in PD cases with or without cognitive impairment was classifiable by all staging systems. However, when dementia was the main presenting feature, LP was not classifiable in 41–82% of cases staged according to Braak or McKeith systems (Fig.
5). This inability to stage a high proportion of cases according to Braak or McKeith systems is in keeping with previous findings by Beach and colleagues [
3]. Both the Beach system and our proposed LPC system are better suited for the classification of LP across the entire spectrum of neurodegenerative diseases and ageing.
We scored a region positive if sparse LBs or LNs were seen, thereby giving equal importance to LBs and LNs for assigning the lowest possible positive LP score, which is in agreement with previous publications on the assessment of LP in post-mortem brains [
2,
3,
14,
17]. Hence, our dichotomous LP scoring approach leads to cases with relatively low amounts of LP in limbic/neocortical areas being categorised as limbic/neocortical LP. While this could in theory result in a relatively high proportion of cognitively unimpaired individuals being diagnosed as having neocortical LP, in the multi-rater assessment all 15 cases with neocortical LP, as determined by the majority of raters, had a clinical diagnosis of dementia. Moreover, in both UPBB and NBTR, an LPC category of neocortical LP was associated with significantly increased odds of having dementia in life, even after controlling for neurofibrillary tangle tau pathology. However, some α-synuclein antibodies may produce non-specific immunolabelling [
8] and, therefore, we suggest that the presence of single dot-like immunopositivity in the neuropil alone in the absence of any neuronal immunopositivity is not sufficient to score the section positive (Fig.
2a, b). We further suggest that detailed clinico-pathological correlative studies should not be based on diagnostic staging systems, like the one we present here, but should always aim to obtain more quantitative measures of the burden of pathological protein aggregates (e.g., image analysis).
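The dichotomous regional scoring described in this paragraph can be sketched in a few lines of Python. The sketch assumes that regional scores are recorded on the five-tier scale (0 to 4) and that any score of 1 or more ("sparse LBs or LNs") makes a region positive, as stated above. The function names and the example scores are hypothetical, and the sketch deliberately stops short of assigning an LPC category, because the category rules are not restated in this section.

```python
def dichotomize(score):
    """Collapse a semi-quantitative regional score (0-4) to present/absent.

    A region is positive as soon as sparse LBs or LNs are seen (score >= 1).
    """
    if not 0 <= score <= 4:
        raise ValueError(f"score out of range: {score}")
    return score >= 1


def positive_regions(regional_scores):
    """Return the alphabetically sorted names of regions positive for LP."""
    return sorted(
        region for region, score in regional_scores.items() if dichotomize(score)
    )


# Made-up regional scores for a single case (illustrative only).
case = {"amygdala": 3, "substantia nigra": 1, "frontal cortex": 0}
print(positive_regions(case))
```

Because only the boundary between 0 and 1 matters here, two raters who disagree on whether a region deserves a score of 2 or 3 still produce the same dichotomized result.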
To make our system applicable for neuropathological routine diagnostics at relatively low cost, we have deliberately limited the number of regions that need to be assessed to an absolute minimum and have chosen those regions that have been widely used in previous staging systems. However, LP, in particular in PD, may be present in a variety of tissues such as the spinal cord [
7], gut [
6,
27], sympathetic ganglia [
26], adrenal gland [
11], heart [
22], and skin [
9] among others. The systematic pathological assessment of LP in regions outside the brain may be possible in the future if post-mortem examination related to neurodegeneration routinely combines assessment of both cerebral and relevant extra-cerebral tissues, and will lead to the development of staging systems for LP that encompass LP in the entire human body.
In our study, two different antibodies were used: the KM51 clone (Leica, UK), which detects full-length α-synuclein, was used for NBTR cases, while UPBB cases were stained with Syn303 (CNDR), which detects epitopes with amino acid residues 2–4. We did not observe any differences in inter-rater reliability or ability to classify cases between cases from NBTR and UPBB, suggesting that the reliability of LPC is not dependent on specific α-synuclein antibodies.
The LPC system was devised primarily to increase the reliability of diagnostic assessment, without implying any particular pattern of topographical spread of pathology, such as in the Beach system [
1,
3]. Our findings confirm that the Beach system, based on the putative pathological processes underlying disease progression, allows most cases to be staged and is, therefore, a useful scheme if used by experienced raters, although due to its low inter-rater reliability it may not be practicable for day-to-day routine diagnostics and collection of data across brain bank networks. We would also note that we did not include the assessment of substantia nigra cell loss in the inter-rater evaluation, as this is not included in previous LP staging systems and was not within the aims of our study. However, we suggest that evaluation of substantia nigra cell loss should routinely be performed, as previously recommended by the BrainNet Europe Consortium [
2]. The Fourth Consensus Report of the DLB Consortium further suggests scoring nigral neuronal cell loss to subclassify cases into those likely or not to have Parkinsonism, and the LPC categories can be used to determine the likelihood that pathological findings are associated with a typical DLB clinical syndrome (Table 2 in [
18]).
We used the term LP instead of LBD in the LPC system categories and we recommend that the terms PD-MCI, PDD or DLB not be used to describe the neuropathological findings alone. These diagnoses should only be made once the clinical presentation, including neuropsychological evaluation, is combined with the post-mortem neuropathological findings. In addition, as the ageing brain typically includes multiple pathologies which together can lower the threshold for one specific pathology to cause dementia (or other neurological impairment) [
4,
12,
28], the neuropathological report should contain information on all observed pathologies, e.g., AD neuropathological change [
19], TDP-43 pathology [
16,
21], cerebrovascular pathology [
24], and LP.
We conclude that the LPC system is a useful classification system for LP. It has good reproducibility and clinical utility, and our expectation is that it will be reliable and useful in routine diagnostic practice, allowing neuropathologists to classify the majority of cases into categories that are compatible with the clinical findings. We suggest that the LPC system should be the standard future approach for the basic post-mortem evaluation of LP in individuals with and without concomitant neurodegenerative diseases.