Introduction
Prostate cancer is the most common solid tumor in men, with an estimated 1,094,916 incident cases and 307,481 deaths globally in 2012 [1]. Accurate detection of the disease and its subsequent staging are critical for selecting appropriate treatment strategies. In particular, it is crucial to differentiate patients with localized or regional disease, who can be treated with curative intent, from those with metastatic disease. Whether surgery, radiation, and/or systemic treatments are appropriate for a given patient is driven in large part by the clinical stage [2]. Targeted molecular imaging with positron emission tomography/computed tomography (PET/CT) is a highly versatile imaging technology to inform staging and management decisions for patients with a variety of cancers.
In prostate cancer, PET tracers targeting prostate-specific membrane antigen (PSMA) have demonstrated high diagnostic accuracy for the detection of both regional and distant metastatic prostate cancer [3, 4]. The higher sensitivity and specificity of PSMA PET in detecting metastatic prostate cancer will have strong implications for patient management. To demonstrate the association of PSMA imaging with clinical outcome, there is an urgent need to standardize PSMA assessment. Recent efforts to standardize the assessment of PSMA scans have resulted in proposals for lesion characterization and reporting: the EANM, PSMA-RADS, and PROMISE criteria [5–7]. While all the proposed criteria focus on characterizing individual PSMA lesions based on their location and the definition of significant uptake, the PROMISE standard also proposes a patient-level classification (miTNM), which is based on the total disease burden and its location in the PET/CT image. A recent study comparing these standardized assessments has shown that they have high inter-reader reproducibility [8].
However, the adoption and implementation of these standards in routine clinical practice are limited because adherence to these guidelines is a manual, labor-intensive process. The manual work can be greatly facilitated through automated image analysis. The structural radiological processes, including the segmentation of anatomical structures (from CT), can be automated to contextualize and characterize the functional imaging. Anatomical context is needed both for estimating reference uptake in normal tissue and for accurately detecting potential lesions, since uptake in the lesion as well as in the background may differ between tissues.
Deep learning organ segmentations in CT have been used in automated analyses of PSMA PET to exclude physiological uptake in certain high-uptake organs when detecting PSMA-positive lesions and estimating tumor burden. However, achieving high sensitivity while limiting the number of false positives outside these organs remains challenging. Previous lesion detection approaches for PSMA PET [9, 10] used a liver uptake-based threshold to select possible lesions in patients with advanced prostate cancer. Such methodologies likely capture most lesions with tracer uptake above that of the liver but cannot be used to detect PSMA-avid disease in general, as many lesions have an SUVmax below the threshold of mean liver uptake. Others have presented deep learning-based methods for detection and segmentation of possible lesions [11, 12]. In automated image analysis, blob detection algorithms are commonly used to detect salient regions in images [13], and the use of such methods has the potential to capture lesions with a maximal standardized uptake value (SUVmax) below liver uptake. Blob detection algorithms would also have the capacity to detect lesions in, e.g., uncommon locations or with unusual uptake patterns, and could be easily extended to handle a wider range of tracers, something deep learning-based methods may struggle with.
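The contrast between a liver-based threshold and blob detection can be illustrated with a minimal sketch on a one-dimensional uptake profile. It is not the aPROMISE detector: the difference-of-Gaussians filter, the parameter values, the function names, and the toy profile are all assumptions chosen for illustration.

```python
import math

def gaussian_smooth(signal, sigma):
    """Smooth a 1D signal with a truncated, normalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp at the borders
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed

def dog_blobs(signal, sigma=1.0, ratio=1.6, min_response=0.2):
    """Indices of local maxima of the difference-of-Gaussians response."""
    fine = gaussian_smooth(signal, sigma)
    coarse = gaussian_smooth(signal, sigma * ratio)
    dog = [f - c for f, c in zip(fine, coarse)]
    return [i for i in range(1, len(dog) - 1)
            if dog[i] > min_response
            and dog[i] > dog[i - 1] and dog[i] >= dog[i + 1]]

def liver_threshold_hits(signal, liver_mean):
    """Indices whose uptake exceeds the mean liver uptake."""
    return [i for i, value in enumerate(signal) if value > liver_mean]

# Hypothetical profile: background SUV 1.0, a faint lesion peaking at 3.0
# and a bright lesion peaking at 9.0; assumed mean liver uptake of 5.0.
profile = [1.0] * 40
profile[9:12] = [2.0, 3.0, 2.0]
profile[27:30] = [4.5, 9.0, 4.5]
liver_mean = 5.0

blob_hits = dog_blobs(profile)
threshold_hits = liver_threshold_hits(profile, liver_mean)
```

In this toy profile the liver threshold flags only the bright lesion, whereas the blob response, which depends on local contrast rather than absolute uptake, also marks the faint one.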
An additional issue with threshold-based lesion segmentation is that, in lesions with low or subtle uptake, a rigid rule of segmenting at 50% or 30% of the lesion's SUVmax would result in inaccurate over-segmentation. High uptake adjacent to lesions, for example in the intestines, also confounds threshold-based segmentation. The fast marching method, used for segmentation in a wide variety of tasks [14], can be employed for lesion segmentation in this setting to avoid these problems.
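A small sketch can make the difference between the two segmentation strategies concrete. It is a simplification rather than the method used in aPROMISE: the upwind Eikonal update of the true fast marching method is replaced here by Dijkstra-style front propagation on a 4-connected grid, and the cost function, `tolerance`, `max_time`, and the toy image are hypothetical choices.

```python
import heapq

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def arrival_times(image, seed, cost):
    """Propagate a front from the seed; entering a pixel costs cost(value)."""
    rows, cols = len(image), len(image[0])
    times = {seed: 0.0}
    heap = [(0.0, seed)]
    while heap:
        t, (r, c) = heapq.heappop(heap)
        if t > times[(r, c)]:
            continue  # stale heap entry
        for dr, dc in NEIGHBORS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nt = t + cost(image[nr][nc])
                if nt < times.get((nr, nc), float("inf")):
                    times[(nr, nc)] = nt
                    heapq.heappush(heap, (nt, (nr, nc)))
    return times

def front_segment(image, seed, tolerance=1.0, max_time=2.0):
    """Keep pixels the front reaches early; the front slows down where
    uptake differs from the seed value, whether lower or higher."""
    peak = image[seed[0]][seed[1]]
    times = arrival_times(
        image, seed, lambda v: 1.0 + ((v - peak) / tolerance) ** 2)
    return {p for p, t in times.items() if t <= max_time}

def threshold_segment(image, seed, fraction=0.5):
    """Fixed-fraction rule: connected pixels at or above fraction * seed value."""
    cutoff = fraction * image[seed[0]][seed[1]]
    rows, cols = len(image), len(image[0])
    region, stack = {seed}, [seed]
    while stack:
        r, c = stack.pop()
        for dr, dc in NEIGHBORS:
            p = (r + dr, c + dc)
            if (0 <= p[0] < rows and 0 <= p[1] < cols
                    and p not in region and image[p[0]][p[1]] >= cutoff):
                region.add(p)
                stack.append(p)
    return region

# Hypothetical slice: a faint lesion (peak 3.0) next to a column of
# intense physiological uptake (9.0) on a background of 1.0.
image = [
    [1.0, 1.0, 1.0, 1.0, 9.0],
    [1.0, 1.0, 2.5, 1.0, 9.0],
    [1.0, 2.5, 3.0, 2.5, 9.0],
    [1.0, 1.0, 2.5, 1.0, 9.0],
    [1.0, 1.0, 1.0, 1.0, 9.0],
]
seed = (2, 2)
front_region = front_segment(image, seed)
threshold_region = threshold_segment(image, seed)
```

With these toy values the 50% rule leaks from the lesion into the adjacent high-uptake column, while the front stalls at the lesion boundary because both the background and the neighboring hot structure differ strongly from the seed uptake.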
To overcome the technical challenges and to assist readers in adhering to the standardized guidelines for the implementation of PSMA imaging, we have developed aPROMISE (automated Prostate Molecular Imaging Standardized Evaluation). aPROMISE is CE-marked software as a medical device that employs deep learning technology to automate the segmentation of organs in low-dose CT images and quantifies the mean tracer uptake in the reference organs. Subsequently, aPROMISE uses blob detection and fast marching methodology to detect and segment regions of interest as potential pathological lesions in PSMA PET/CT. The intent of aPROMISE is to reduce laborious manual work and to assist readers in standardizing the PSMA imaging assessment. Therefore, in the application, the physician must still review the image and select the lesions. Once the physician decides that a lesion should be marked as suspicious, the technology facilitates standardized assessment by automating the laborious tasks of localization, segmentation, and quantification. An illustrative workflow is shown in supplemental Figure 1.
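The reference-organ quantification step can be sketched as a masked average. This is only an illustration under stated assumptions: the PET values and the CT-derived organ labels are taken to be resampled onto the same voxel grid and flattened, the aorta is used as a stand-in for the blood pool, and the function name and data are hypothetical.

```python
def mean_uptake(suv_values, organ_labels, organ):
    """Mean SUV over the voxels that the segmentation assigns to `organ`."""
    selected = [s for s, label in zip(suv_values, organ_labels) if label == organ]
    if not selected:
        raise ValueError(f"no voxels labelled {organ!r}")
    return sum(selected) / len(selected)

# Hypothetical flattened PET volume and matching CT-derived label map.
suv_values = [5.0, 6.0, 7.0, 1.5, 2.5, 0.5]
organ_labels = ["liver", "liver", "liver", "aorta", "aorta", "background"]

liver_mean = mean_uptake(suv_values, organ_labels, "liver")
blood_pool_mean = mean_uptake(suv_values, organ_labels, "aorta")
```

Because the average is taken over the whole automatically segmented organ, the result does not depend on where a reader would have placed a manual region of interest.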
The aPROMISE workflow has demonstrated low inter-reader variability and high efficiency in the quantification and staging of intermediate- to high-risk prostate cancer [15]. In the current study, we analytically evaluate the technical performance of aPROMISE. The objective of the study is threefold: (1) to evaluate the accuracy of the automated organ segmentation applied to low-dose CT scans, (2) to evaluate the consistency of the automated quantitative tracer uptake in reference organs of PSMA PET/CT, and (3) to evaluate the sensitivity of automated detection of potential lesions in PSMA PET/CT.
Discussion
The increasing availability and use of novel imaging agents within nuclear medicine warrant the development and validation of technology that reliably localizes, segments, and quantifies the specific tracer activity in PET/CT. Additionally, functional imaging tracers are specific to the biological activity of their respective targets. The biodistribution and pathophysiological uptake of PSMA-targeted imaging tracers are distinct from those of FDG. Our effort has been to apply automated image analysis to tailor anatomical contextualization and potential lesion detection to PSMA PET/CT, with the aim of providing relevant structural information as well as high sensitivity in detecting lesions.
The deployment of automated image analysis systems into routine diagnostic imaging has many potential advantages. First, automation can standardize interpretations thus improving inter-reader agreement in localization and quantitative assessment. Second, automation can improve reader efficiency by reducing time spent evaluating obvious image findings, while simultaneously guiding the human reader’s attention to more challenging, equivocal findings. Third, automation can potentially accelerate the “learning curve” human readers must face when interpretations of new imaging modalities are integrated into routine care. Finally, automated image analysis might be used not only to identify abnormal lesions similar to human readers, but also extract additional diagnostic, prognostic, or predictive information contained in the raw imaging data not otherwise accessible to human readers.
Accurate and consistent anatomical segmentation in CT is essential in medical image analysis and radiation dose planning. The manual segmentation task is mundane, labor intensive, and inherently variable. There have been prior reports on the use of deep learning technology in semantic segmentation of contrast-enhanced or diagnostic CT for image analysis, particularly for application in treatment planning [17–20]. In recent work, Liu C et al. demonstrated a Dice score of 0.85–0.88 for automated prostate segmentation [19]; their work using contrast-enhanced CT achieved performance similar to that observed with MRI in the PROMISE12 challenge [21]. However, the low soft tissue contrast and resolution of the low-dose non-contrast-enhanced CT images of PET/CT make it more difficult to obtain a clear automated volumetric segmentation of small organs. The performance of our aPROMISE algorithm in prostate segmentation in low-dose CT without contrast was similar to that of Nemoto T et al., who also demonstrated a mean Dice score of 0.79 for the prostate [22]. The Dice scores of the bones and the visceral organs were observed to be 0.88 or above, indicating much better performance of the algorithm in larger organs. The prostate data do warrant manual review of the prostate segmentation in the aPROMISE analysis of patients with localized disease in PSMA PET/CT.
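For reference, the Dice score compares two segmentations A and B as 2|A ∩ B| / (|A| + |B|). The sketch below computes it on hypothetical one-dimensional masks and shows one possible contributing factor to lower scores in small organs: the same one-voxel boundary shift costs far more overlap in a small structure than in a large one. It is an illustration, not an analysis of the study data.

```python
def dice_score(mask_a, mask_b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2.0 * len(a & b) / (len(a) + len(b))

# Hypothetical masks: the automated result is the manual one shifted by
# a single voxel, once for a small organ and once for a large organ.
small_manual, small_auto = set(range(0, 4)), set(range(1, 5))
large_manual, large_auto = set(range(0, 40)), set(range(1, 41))

small_dice = dice_score(small_manual, small_auto)
large_dice = dice_score(large_manual, large_auto)
```

Here the small structure scores 0.75 and the large one 0.975 for the same absolute boundary error.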
The first step of aPROMISE, accurately segmenting the organs in the low-dose CT, enables the subsequent step of quantification in the reference organs of PSMA-ligand PET. Reporting PSMA expression in prostate cancer, as detected by PSMA-ligand PET, in relation to the reference organs would standardize quantitative reporting [6]. Notably, quantification of PSMA uptake in PET/CT in relation to liver and blood pool is likely to be a critical parameter for selecting patients for PSMA-targeted therapeutics. In ongoing clinical trials, a lesion SUVmax above 1 or 1.5 times the liver SUVmean has been used as the threshold for PSMA positivity when selecting patients to be treated with 177Lu-PSMA-617 (NCT03805594) and 177Lu-PSMA I&T (NCT04297410). Translating such quantitative criteria from clinical trials into clinical practice would require a platform that can provide the consistency of centralized reading at the local level. Our study demonstrates that aPROMISE enables greater reproducibility and higher consistency in reporting the quantitative assessment of reference organs than manual assessment by three experienced nuclear medicine physicians.
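A liver-relative criterion of this kind reduces to a single comparison, which also shows why consistent reference-organ values matter. The sketch below uses hypothetical numbers and a hypothetical function name, and does not represent the eligibility logic of either trial.

```python
def exceeds_liver_threshold(lesion_suvmax, liver_suvmean, factor):
    """True if lesion SUVmax is above `factor` times the liver SUVmean."""
    return lesion_suvmax > factor * liver_suvmean

# Hypothetical case: the same lesion, with the liver SUVmean measured
# slightly differently by two readers (5.0 versus 6.0).
lesion_suvmax = 8.0
call_reader_a = exceeds_liver_threshold(lesion_suvmax, liver_suvmean=5.0, factor=1.5)
call_reader_b = exceeds_liver_threshold(lesion_suvmax, liver_suvmean=6.0, factor=1.5)
```

With these values the lesion passes the criterion for one reader (8.0 > 7.5) and fails it for the other (8.0 < 9.0), so variability in the reference measurement alone can change the classification.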
The overall performance of our methodology in detecting sites of prostate cancer was similar to the recent work by Zhao et al., which employed deep learning for detecting PSMA lesions in the local pelvic area [12]. The independent evaluation of aPROMISE demonstrated that the analytical detection algorithm is proficient in detecting lesions (above 90%) that are manually determined to be pathological in nature. In a recent study [9], a threshold above SUV 4.3 was used for detecting lesions. Had a threshold of SUV=4.3 been used in our study, the detection sensitivity of regional lymph nodes in high-risk localized disease would have dropped from 91.5 to 75.0%, the sensitivity of lymph node metastasis in metastatic disease would have dropped from 90.6 to 76.2%, and the sensitivity of bone metastases in metastatic disease would have dropped from 86.7 to 61.8%. With the lower threshold of SUV=3.0 employed for bone metastases in another study [10], the sensitivity would still have dropped from 86.7 to 77.1%.
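The effect of adding a fixed SUV cut-off on sensitivity can be reproduced in a few lines. The lesion list below is invented for illustration and is unrelated to the study data; the function name and data layout are likewise assumptions.

```python
def sensitivity(lesions, suv_threshold=None):
    """Fraction of reader-confirmed lesions that count as detected.

    `lesions` is a list of (suvmax, detected) pairs; when a threshold is
    given, a detection only counts if its SUVmax is above that threshold.
    """
    hits = sum(1 for suvmax, detected in lesions
               if detected and (suv_threshold is None or suvmax > suv_threshold))
    return hits / len(lesions)

# Hypothetical reader-confirmed lesions: (SUVmax, found by the detector).
lesions = [
    (12.0, True), (6.1, True), (4.0, True), (3.2, True), (2.4, True),
    (2.0, False), (8.5, True), (5.0, True), (3.8, True), (15.0, True),
]

baseline = sensitivity(lesions)
with_cutoff_4_3 = sensitivity(lesions, suv_threshold=4.3)
with_cutoff_3_0 = sensitivity(lesions, suv_threshold=3.0)
```

In this toy set the sensitivity falls from 0.9 to 0.5 at a cut-off of 4.3 and to 0.8 at a cut-off of 3.0, because every true lesion below the cut-off becomes a miss regardless of how conspicuous it is locally.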
The detection and pre-segmentation algorithm demonstrated high sensitivity, including for lesions with low uptake. This is beneficial for the reader, decreasing the time spent on segmenting lesions and simultaneously mitigating inter- and intra-reader variability in quantitative assessments. The detection algorithm did, however, also generate a high number of false positives. The majority of these false positives can be readily disregarded by a reader, as they arise in physiological uptake, most notably in the intestines. One could instead employ a convolutional neural network (CNN) for detection and segmentation. However, to successfully train a CNN to account for both soft tissue and bone lesions in uncommon locations, or with unusual uptake patterns, an enormous data set is required. Furthermore, CNN training is tracer specific, so tracer-agnostic detection and pre-segmentation would require a large data set comprising all PSMA tracers. In comparison, our approach of blob detection and fast marching for lesion detection and pre-segmentation has proven to be a robust solution for whole-body image analysis.
The study also demonstrated a disparity in outcome based on reader experience in PSMA imaging. Compared with the other readers, reader 2 was consistently conservative in calling PSMA-positive lesions in all tissue types (Table 6). This reader also had very limited experience with PSMA PET/CT. Greater trust in automation and in validated algorithms can enhance the consistency of patient diagnosis. We are keen to explore and enhance the relationship of aPROMISE with the physician in real-world practice.
The retrospective design without pre-defined success criteria was a limitation of the current study; however, the objective of the study was to evaluate the performance of the novel platform for its subsequent validation in specific clinical contexts. The use of three independent and experienced nuclear medicine readers in the evaluation of the aPROMISE algorithms has mitigated some of the risk of bias. Individual organ segmentation is a laborious process: it takes an estimated 15 to 20 min to volumetrically segment a typical organ in low-dose CT. We were therefore limited to relying on the segmentations performed by one experienced reader, and there was no consensus segmentation from multiple readers. Some studies have used the overlap of multi-reader segmentations [23]. Such a solution of taking the intersection of multiple readers would result in a truncated volume and not necessarily yield a more accurate standard for comparison against the deep learning algorithm. A limitation of the study design was that detection and pre-segmentation of primary tumors in the prostate gland were not evaluated. One of the primary limitations of aPROMISE in analyzing PSMA PET/CT images was the absence of ureter segmentation. Hotspots in the ureter from physiological uptake in urine are a confounding factor in the assessment of PSMA uptake in lymph nodes in the pelvic area. We are generating labeled data which can enable the algorithm to avoid urine uptake in subsequent versions of the aPROMISE platform.