Introduction
Parkinson’s disease (PD) is the second most common progressive neurodegenerative disease and affected 8.5 million individuals worldwide as of 2017 [1]. It is characterized by a progressive loss of dopaminergic neurons within the substantia nigra pars compacta (SNpc), which is considered to cause PD’s classical motor symptoms [2]. Currently, PD diagnosis relies on the clinical features acquired from patient history and neurological examination; accurate diagnosis is difficult in early stages, with a misdiagnosis rate of approximately 25% [3]. Although 60–80% of the dopaminergic neurons of the SNpc are lost before any clinical symptoms appear [4], conventional MRI has so far been unsuccessful in detecting pathological changes in the SNpc, compromising the effectiveness of prophylactic approaches and new therapies [5] that attempt to slow the neuronal loss. Therefore, objective PD biomarkers are urgently needed.
In routine clinical practice, the role of MRI in patients with Parkinson-like motor symptoms is currently limited to ruling out atypical parkinsonisms [6]. Recently, among other promising approaches [7, 8] developed to detect neurodegeneration in the SNpc, neuromelanin-MRI (NM-MRI) was proposed to visualize neuromelanin, as its depigmentation is a key pathological feature of PD [9]. Iron–neuromelanin complexes stored inside healthy dopaminergic neurons have highly paramagnetic properties that increase the NM-MRI signal intensity through a combination of magnetization transfer and T1 effects [10]. After neuronal death, unbound neuromelanin and iron become extracellular [11], contributing to neurodegeneration by activating the microglia and proinflammatory factors [12]. In patients with PD, low levels of intracellular iron–neuromelanin complexes result in decreased NM-MRI signal intensity. Several authors showed that quantifying the SNpc signal loss in NM-MRI can yield high diagnostic accuracy for distinguishing PD patients from controls [13–15], even at an early stage [16]. Furthermore, some studies reported a correlation with the severity of the disease [17, 18] and with L-dopa-induced motor complications [19].
To this end, various segmentation techniques have been proposed for assessing the hyperintense area of the SNpc: simple manual delineation [14], SNpc hyperintense area (or volume) estimation using a signal intensity threshold derived from the manually segmented background midbrain [15, 18, 19], and the semiautomated region growing technique [20]. The only automated process described to date is the atlas-based method [13], which involves aligning new images to a set of manually labeled examples. However, because it relies on a fixed set of atlases, this method may not capture the full anatomical variability of the target subjects, which affects its accuracy [21], and it is known to be computationally intensive.
In this study, we used as reference a signal intensity threshold method based on manual segmentation (MS), first described by Schwarz et al. [15], as it is the only method demonstrating a stage-dependent SNpc signal loss in PD, unlike the atlas-based experiment. This method counts the SNpc hyperintense pixels above a threshold determined from the background signal of the midbrain. It requires several steps, including manually delineating the SNpc and midbrain, determining the threshold, and calculating the resulting hyperintense areas. Despite attractive diagnostic performance, the clinical applicability of this method is impeded by these time-consuming steps, first and foremost MS; in this regard, automated segmentation would be a significant improvement.
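As an illustration of the thresholding step described above, the following minimal Python sketch counts the SNpc pixels whose intensity exceeds a threshold derived from the background midbrain signal. The threshold rule (background mean plus k standard deviations, with k = 3), the function names, the pixel area, and the toy image are assumptions made for this example; they are not the parameters reported by Schwarz et al. [15] or used in this study.

```python
from statistics import mean, stdev


def masked_values(image, mask):
    """Return the intensities of all pixels where the mask is non-zero."""
    return [
        image[r][c]
        for r, row in enumerate(mask)
        for c, inside in enumerate(row)
        if inside
    ]


def hyperintense_area(image, snpc_mask, background_mask, k=3.0, pixel_area_mm2=1.0):
    """Estimate the SNpc hyperintense area on one slice.

    Assumed rule (hypothetical): threshold = background mean + k * background SD.
    Returns the area in mm^2 and the threshold that was applied.
    """
    background = masked_values(image, background_mask)
    threshold = mean(background) + k * stdev(background)
    n_hyperintense = sum(1 for v in masked_values(image, snpc_mask) if v > threshold)
    return n_hyperintense * pixel_area_mm2, threshold


# Toy 3 x 4 slice with invented intensities (not study data).
image = [
    [100, 102, 98, 100],
    [101, 150, 160, 99],
    [100, 155, 102, 101],
]
snpc_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
background_mask = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]

area, threshold = hyperintense_area(
    image, snpc_mask, background_mask, k=3.0, pixel_area_mm2=0.25
)
print(area, round(threshold, 2))
```

In this toy slice, three of the four SNpc pixels exceed the threshold. In practice both masks would come from manual or automated segmentation, which is the step this study aims to automate.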
In this context, deep learning segmentation is an appealing option. It uses neural networks trained on examples to perform tasks and to grasp intricate structures in datasets [22]. Specifically, convolutional neural networks (CNNs) have significantly advanced computerized image recognition performance. They have been successfully applied in neuroradiology to segment various structures such as brain tumors [23], white matter hyperintensities [24], or organs at risk prior to radiation therapy [25]. Among CNNs, the U-net [26] is the most commonly used model in biomedical image segmentation.
We hypothesized that a U-net architecture CNN could replace manual segmentation of NM-MR images as the initial step of a previously described method aiming to assess SNpc signal intensity and achieve equivalent diagnostic accuracy for PD diagnosis. Therefore, we evaluated (1) the segmentation accuracy and (2) the diagnostic test performance of the U-net segmentation-based method compared to the established MS method.
Discussion
Here, we developed a U-net model to segment the SNpc and midbrain in NM-MRI and showed that our model could achieve equivalent diagnostic performance to that of manual segmentation using a validated thresholding method for the hyperintense area of the SNpc, despite a moderate segmentation accuracy of the SNpc by our model.
U-net segmentation of the midbrain was highly accurate in both datasets; however, the U-net could not achieve a segmentation of the SNpc in the same range as the inter-reader precision in the external validation dataset. The lower accuracy of the U-net segmentation (US) for the SNpc in the external dataset implies that different imaging parameters and signal intensity variations challenge the U-net inference capabilities. Also, applying the optimal threshold for the principal dataset to the external dataset could have affected the diagnostic test accuracy, because the threshold should be adapted to the neuromelanin-sensitivity level of the pulse sequence. To address this specific issue, Schwarz et al. [18] proposed a normalization procedure based on the theoretical volume of the SNpc hyperintense area in healthy controls. Because we wanted to independently test the accuracy of the U-net in the external dataset, we did not try to normalize the signal intensity level.
Another finding is that the accuracy of the SNpc segmentation was consistently lower than that of the midbrain, denoting the difficulty in determining the boundary of the SNpc regardless of the segmentation method. Unlike the boundaries between the midbrain and surrounding cisterns, the boundaries between the SNpc and the background are difficult to delineate precisely because hyperintense pixels depict only neuromelanin content and not the entire SNpc. The relative subjectivity inherent in the manual segmentation of the SNpc seems to have affected both manual and U-net segmentation accuracies. Further, the DSCs were lower in the patient group compared to the healthy group probably because reduced-hyperintense areas result in an even more challenging segmentation task.
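For reference, the Dice similarity coefficient (DSC) used to compare two segmentations can be sketched as follows. The formula is the standard definition, 2|A ∩ B| / (|A| + |B|); the function name and the toy masks are hypothetical and do not come from the study data.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (lists of rows)."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy masks: each labels 3 pixels, 2 of which overlap.
manual_mask = [
    [0, 1, 1],
    [0, 1, 0],
]
unet_mask = [
    [0, 1, 0],
    [0, 1, 1],
]

dsc = dice_coefficient(manual_mask, unet_mask)
print(round(dsc, 3))
```

Because the denominator is the sum of the two mask sizes, a given number of disagreeing boundary pixels lowers the DSC more for a small structure than for a large one, which is consistent with the lower DSCs observed for the SNpc than for the midbrain.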
Despite the relative lack of precision of the SNpc segmentation in the external dataset, the calculated hyperintense areas were significantly reduced in patients with PD compared to HC in both datasets, consistent with the results of previous studies [13, 15, 19, 20]. The diagnostic test accuracy of the thresholding method for PD was not affected: AUCs were similar using U-net or manual segmentation in both datasets, with a slight comparative advantage for the U-net method, and were as high as those of the previously described manual techniques, which ranged from 0.82 to 0.93 [13, 15, 20]. These results suggest that an extremely precise segmentation of the SNpc is not required to provide useful size estimates of the hyperintense area. Our U-net model is sufficient to obtain a satisfactory diagnostic accuracy.
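To make the AUC figures above more concrete: the AUC can be read as the probability that a randomly chosen patient has a smaller hyperintense area than a randomly chosen control. The minimal sketch below computes it by pairwise comparison, which is equivalent to a normalized Mann-Whitney U statistic. The function name and the area values are invented for illustration and are not results from this study.

```python
def auc_smaller_in_patients(patient_areas, control_areas):
    """AUC for a marker that is expected to be lower in patients.

    Counts the fraction of (patient, control) pairs in which the patient
    value is smaller; ties count as half a pair.
    """
    concordant = 0.0
    for p in patient_areas:
        for c in control_areas:
            if p < c:
                concordant += 1.0
            elif p == c:
                concordant += 0.5
    return concordant / (len(patient_areas) * len(control_areas))


# Hypothetical hyperintense areas in mm^2 (illustrative only).
patient_areas = [10, 12, 15, 20]
control_areas = [18, 22, 25, 30]

auc = auc_smaller_in_patients(patient_areas, control_areas)
print(auc)  # 15 of the 16 pairs are concordant -> 0.9375
```

An AUC of 0.5 would mean the areas do not separate the groups at all, and 1.0 would mean every patient has a smaller area than every control.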
The hyperintense areas were correlated with motor severity (reflected by UPDRS-III scores) in the external validation dataset but not in the larger principal dataset. We do not have a clear explanation for this finding, as disease severity was similar between the two groups. Because the external validation dataset was small (12 PD patients), the correlation analyses performed on it should be viewed cautiously. As previous studies on smaller samples also found weak [18] or no correlation [13, 17] with UPDRS-III scores, the utility of NM-MRI as a monitoring tool for patients with PD could not be proved.
This study had several limitations. First, the sample size was relatively small for a case-control study, particularly that of the external validation dataset. Second, PD diagnosis in this study was not histopathologically confirmed; thus, misdiagnosis is possible. Third, as pathological examination could not be used as a criterion, the U-net model was trained using manually obtained masks of the SNpc and midbrain from NM-MRI as input. MS relies on recognition of the hyperintense area and on the anatomical knowledge of the radiologist and is therefore subject to subjectivity bias. Hyperintense areas could be underestimated in patients with PD, amplifying the difference between the patients with PD and HC. Thus, additional sequences providing clearer SNpc images, such as proton density-weighted images, could be beneficial for creating more accurate SNpc masks for application to NM-MR images. Fourth, both methods relied on a threshold to define the hyperintense area. A drawback of this approach is the loss of information, such as the magnitude of the signal intensity above the threshold or its spatial distribution [30]. Several studies have found sub-regional patterns of neuromelanin loss within the SNpc using manually placed regions of interest [16, 18] or voxel-wise analysis [30], with differences between HC and patients with PD preferentially involving the posterior and lateral parts of the SNpc. Studying the whole SNpc could have contributed to the lack of correlation with the clinical status in our study, which remains an important focus for further improvement of NM-MRI. Additional studies focusing on these posterior and lateral parts of the SNpc could help achieve this goal. Finally, mean disease duration was longer in the external validation dataset, which may have positively influenced the diagnostic accuracy of the method in that dataset. Additionally, the mean age differed between the datasets; the potential influence of these factors cannot be ignored because the midbrain is subject to age-related changes [31]. However, despite these limitations, because the U-net saves time and does not affect the diagnostic accuracy of the thresholding method, it may help promote the clinical application of NM-MRI for PD diagnosis.
In conclusion, U-net segmentation provided relatively high accuracy in the evaluation of the SNpc in NM-MRI and yielded diagnostic performance comparable to that of the established manual method, but its segmentation accuracy should be further improved to be able to fully replace manual segmentation.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.