Introduction
New cancer treatments with immune checkpoint inhibitors (ICI) have improved outcomes and altered management strategies for patients with melanoma and a variety of other malignancies. Many trials have shown the safety and efficacy of ICI targeting programmed death-1 (PD-1), programmed death ligand-1 (PD-L1), and cytotoxic T-lymphocyte antigen-4 (CTLA-4), and these agents are now widely implemented treatments for melanoma in both metastatic and adjuvant settings. None of the anti-PD-L1 agents has been approved for treatment of melanoma as a single agent or in combination with other ICIs [1–5].
ICI treatments might cause immune-related toxicities, called immune-related adverse events (irAE), in which an immune response is generated against healthy tissue. These irAEs can occur in any organ system [6]. Though the exact pathophysiology of irAE is not completely understood, it is believed that they are provoked by immune upregulation and inflammation. Depending on the treatment regimen, irAEs might occur less frequently than cytotoxic chemotherapy-related toxicities [7]. However, some patients might experience higher grades of irAEs that require hospitalization or prolonged treatment and might be life-threatening [8, 9]. Due to the novel mechanism of action, unpredictable nature, and broad usage of ICIs, development of biomarkers capable of early detection and monitoring of irAEs is an area of urgent need [10].
Positron emission tomography/computed tomography with [18F]2-fluoro-2-deoxy-D-glucose (18F-FDG PET/CT) is a sensitive and non-invasive test commonly used in diagnosis, staging, and treatment response evaluation in metastatic melanoma (MM) [9, 11, 12]. The combination of 18F-FDG PET and CT allows functional and morphological evaluation of the disease and guides clinical decision-making and treatment selection. It is also a very sensitive method for recognizing inflammatory processes that can be reflective of irAE [13–16]. Some preliminary analyses of irAE detection and monitoring via 18F-FDG PET/CT have been undertaken [17–19]. None of them performed a quantitative assessment of the observed organs.
For melanoma patients receiving ICI, 18F-FDG PET/CT is performed for the purpose of disease response assessment. However, incidental findings of increased tracer uptake in off-target organs are sometimes reported. The fact that 18F-FDG PET/CT is currently not used to detect irAE may be due in part to the impracticality of performing a manual whole-organ assessment of 18F-FDG uptake. Small organs such as the thyroid can be assessed manually with relative ease, but larger organs such as the bowel or lung would require hours of expert physician time to contour manually. We hypothesize that this is a contributing factor to why irAE assessment on 18F-FDG PET/CT has thus far been limited to the thyroid [17, 18], or to qualitative assessment only in the bowel [19]. To overcome this hurdle, automated image analysis techniques can be employed. In particular, the segmentation of organs on PET/CT via deep learning-based convolutional neural networks (CNNs) has recently become possible [20, 21].
The goal of this study was to identify quantitative imaging biomarkers of irAE development on 18F-FDG PET/CT in a cohort of patients with MM who were treated with ICI. MM is specifically selected for this pilot retrospective study as ICI regimens and response evaluation with 18F-FDG PET/CT are standard of care treatment for this disease and routinely performed. Other malignancies that utilize 18F-FDG PET/CT for response evaluation use ICI combined with other treatment modalities (e.g., cytotoxic chemotherapy or targeted therapy) which makes it difficult to tease out irAE from other treatment-related AEs. We hypothesized that an automated organ segmentation method would be capable of quantifying irAE-susceptible organ inflammation in 18F-FDG PET/CT images in metastatic melanoma patients treated with ICI.
Patients and methods
We conducted a retrospective pilot study, analyzing 18F-FDG PET/CT images from patients with metastatic melanoma who were treated per standard of care with ICI (anti-CTLA-4 or anti-PD-1) at the Institute of Oncology Ljubljana (OIL), Slovenia (January 2016–January 2019) or at the University of Wisconsin Carbone Cancer Center (UWCCC), Madison, WI, USA (June 2012–June 2019). None of the patients had an autoimmune disease. All available 18F-FDG PET/CT data acquired before and during ICI treatment were collected for review. We determined the date of clinical irAE detection via chart review. If the irAE grade was not explicitly documented in the chart, irAE grading was assigned retrospectively, when possible, based on the available clinical course documentation following the Common Terminology Criteria for Adverse Events (CTCAE, v.5.0) [22]. Clinical and demographic data were collected from both hospital databases. Clinical and imaging data were anonymized and stored in a secure LabKey database server [23].
The study was approved by the Institutional Review Boards of both institutions (approval number: 2016-0418 in Madison, USA; ERIDKE-0005/2020 in Ljubljana, Slovenia) and was conducted following the ethical standards defined by the Declaration of Helsinki. At OIL, patients signed informed consent for treatment and consent allowing the use of their data for scientific purposes. At UWCCC, the study was approved with a waiver of informed consent.
PET acquisition
PET scans were primarily performed for immunotherapy treatment response evaluation in melanoma patients. Images were acquired on five PET/CT scanners: GE Discovery 710, GE Discovery STE, GE Discovery IQ, GE Discovery MI (General Electric, Waukesha, WI), and mCT (Siemens, Knoxville, TN). In all cases, the imaging protocol required patients to fast for 6 h prior to injection of the radiotracer and have a blood glucose level below 200 mg/dL (UWCCC) or 6–10 mmol/L (OIL) at the time of the scan. Patients were required to hold all diabetic medication, including metformin, for 6 h prior to radiotracer injection. On the GE Discovery IQ, patients were injected with 259±52 MBq of 18F-FDG, while on the other scanners, patients were injected with a weight-based dose of 5 (OIL) to 5.2 MBq per kilogram, with a minimum of 370 MBq (UWCCC), of 18F-FDG. Scans were acquired 60±10 min post-injection. For UWCCC patients, the CT used in segmentation was a low-dose CT acquired for attenuation correction. At OIL, a CT meeting the requirements of RECIST analysis was acquired according to an adjusted protocol that included SAFIR reconstruction to minimize dose. Time-of-flight (TOF) reconstruction was used when available. Following reconstruction, images were normalized by patient weight and injected dose to compute standardized uptake values (SUV).
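The normalization by patient weight and injected dose can be sketched as follows. This is a minimal illustration in stdlib-only Python, not the authors' implementation (the study reports MATLAB). The function names are hypothetical, and the optional decay correction of the injected dose to scan time is an assumption, since the text does not describe how decay was handled.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def decay_corrected_dose_bq(injected_dose_bq, minutes_since_injection):
    """Injected activity remaining after physical decay (assumed correction step)."""
    decay = math.exp(-math.log(2) * minutes_since_injection / F18_HALF_LIFE_MIN)
    return injected_dose_bq * decay


def suv_body_weight(activity_bq_per_ml, injected_dose_bq, weight_kg,
                    minutes_since_injection=0.0):
    """Body-weight SUV in g/mL: tissue concentration divided by (dose / weight)."""
    dose_bq = decay_corrected_dose_bq(injected_dose_bq, minutes_since_injection)
    weight_g = weight_kg * 1000.0
    return activity_bq_per_ml * weight_g / dose_bq
```

For example, a voxel with 5 kBq/mL in a 74 kg patient injected with 370 MBq has an SUV of 1.0 g/mL when no decay correction is applied, which is the value expected if the tracer were distributed uniformly through the body.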
18F-FDG PET/CT image analysis
To quantify organ 18F-FDG uptake, a CNN was trained to segment the thyroid, lungs, and bowel from the low-dose CT component of patients’ PET/CT imaging data. A CNN was chosen for segmentation for its ability to segment irregular and variable structures and to successfully segment multiple target structures with very different sizes (e.g., thyroid versus lung) [21]. The network architecture used was DeepMedic, a 3-D, patch-based CNN with multi-resolution pathways [24]. The loss function used was the Dice similarity coefficient (DSC). The optimizer was RMSprop [25]. Sixty manual contours of the bowel, lung, and thyroid were produced using a public dataset of N=20 patients from the VISCERAL.eu Anatomy3 benchmark [26] and an additional private institutional dataset of N=40 patients by an experienced graduate student using 3D Slicer [27]. Labelled data were split 80%/20% (n=48/n=12) for CNN training/validation. Images were resampled to a cubic 2-mm grid and normalized to have a mean of 0 and variance of 1 within the patient. Data augmentation via histogram shifting, histogram scaling, and random rotation was used to increase the effective training dataset size. The CNN was trained using a workstation with one NVIDIA Titan Xp GPU with 12 GB of memory.
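Two of the steps above lend themselves to a short illustration: the per-patient intensity normalization (mean 0, variance 1) and the Dice similarity coefficient. The sketch below uses stdlib-only Python on flattened voxel lists. It is not the DeepMedic code, the function names are hypothetical, and it computes the hard-overlap Dice score rather than the differentiable form a network would use as a training loss.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities):
    """Rescale one patient's CT intensities to mean 0 and variance 1."""
    mu = fmean(intensities)
    sigma = pstdev(intensities, mu)  # population SD over all voxels of the patient
    if sigma == 0:
        return [0.0 for _ in intensities]  # constant image: nothing to scale
    return [(value - mu) / sigma for value in intensities]


def dice_coefficient(pred_mask, true_mask):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two binary masks."""
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A production pipeline would normally operate on 3-D arrays with an array library; flat lists are used here only to keep the example dependency-free.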
The trained CNN was used to perform inference on the CT component from the 18F-FDG PET/CT scans and produce contours of the thyroid, lung, and bowel. The contours were then applied to the PET image to quantify 18F-FDG uptake within the three target organs. To determine the ability of PET to detect irAE, percentiles of the distribution of SUV from within each target organ (SUVX%) were extracted. Percentiles of the distribution of organ SUV were pursued as potential biomarkers of irAE due to their improved reliability as compared to SUVmax [28].
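Extracting SUVX% from a segmented organ can be sketched as below, again in stdlib-only Python with hypothetical function names. The linear-interpolation percentile definition is an assumption, since the text does not state which definition was used.

```python
import math


def percentile(values, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation between ranks."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def organ_suv_percentile(suv_voxels, organ_mask, q):
    """SUV_X%: percentile of the SUVs of voxels inside the organ contour."""
    inside = [suv for suv, in_organ in zip(suv_voxels, organ_mask) if in_organ]
    return percentile(inside, q)
```

Here `suv_voxels` and `organ_mask` are flattened, voxel-aligned lists; the mask plays the role of the CNN contour transferred from CT to PET.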
Receiver operating characteristic (ROC) analysis was performed to determine the value of organ SUV percentiles as potential quantitative imaging biomarkers of irAE development. This was done by comparing organ SUV percentiles with the clinical irAE status as determined by chart review. For patients who had multiple 18F-FDG PET/CT exams during ICI treatment, the maximum organ SUV percentile value was used as a predictor of irAE. The optimal organ SUV percentile (SUVOPT%) was defined to be the percentile that maximized the area under the ROC curve (AUROC) for predicting irAE status (Eq. 1).
$$\mathrm{SUV}_{\mathrm{OPT}\%}=\underset{x\in \mathrm{SUV}_{\mathrm{X}\%}}{\mathrm{argmax}}\,\mathrm{AUROC}(x) \tag{1}$$
where SUVX% is the set of percentiles of the distribution of organ SUV. SUVOPT% was measured on all available 18F-FDG PET scans and tracked longitudinally to assess if changes in target organ 18F-FDG uptake may precede clinical irAE identification.
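The selection in Eq. 1 can be illustrated with a small sketch: for each candidate percentile, take each patient's maximum value across scans, compute the AUROC against irAE status, and keep the percentile with the largest AUROC. The data layout and function names are hypothetical, and the AUROC is computed through its pairwise (Mann-Whitney) interpretation, which may differ in implementation from the MATLAB routine the authors used.

```python
def auroc(scores_pos, scores_neg):
    """Probability that a random irAE patient outscores a random non-irAE patient."""
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


def optimal_percentile(values_by_percentile, has_irae):
    """Return (percentile, AUROC) maximizing AUROC, as in Eq. 1.

    values_by_percentile: {percentile: {patient_id: [value per scan, ...]}}
    has_irae: {patient_id: bool}
    """
    best_q, best_auc = None, -1.0
    for q in sorted(values_by_percentile):
        per_patient = values_by_percentile[q]
        # Patients with several scans contribute their maximum value.
        pos = [max(scans) for pid, scans in per_patient.items() if has_irae[pid]]
        neg = [max(scans) for pid, scans in per_patient.items() if not has_irae[pid]]
        auc = auroc(pos, neg)
        if auc > best_auc:
            best_q, best_auc = q, auc
    return best_q, best_auc
```

When several percentiles tie, this sketch keeps the lowest one; the text does not specify a tie-breaking rule.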
Target organ 18F-FDG uptake was also assessed in patients who did not experience irAE. This was done to establish normal ranges for organ 18F-FDG SUVOPT% values against which SUVOPT% values from patients with irAE can be compared. The 95% confidence interval for SUVOPT% was determined for each target organ using the baseline PET images of N=15 patients who did not experience any irAE (Eq. 2).
$$\mathrm{CI}_{95}=\left[\mu -1.96\sigma ,\ \mu +1.96\sigma \right] \tag{2}$$
where μ and σ are the mean and standard deviation of baseline SUVOPT% values of patients who did not experience irAE, respectively.
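Eq. 2 amounts to a mean ± 1.96 standard deviation range over the baseline values. A minimal sketch follows, with hypothetical function names; using the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used.

```python
from statistics import fmean, stdev


def normal_range(baseline_values, z=1.96):
    """Interval [mu - z*sigma, mu + z*sigma] from baseline SUV_OPT% values (Eq. 2)."""
    mu = fmean(baseline_values)
    sigma = stdev(baseline_values)  # sample SD (assumption; needs >= 2 values)
    return mu - z * sigma, mu + z * sigma


def is_elevated(value, baseline_values):
    """True when a follow-up value exceeds the upper bound of the normal range."""
    _, upper = normal_range(baseline_values)
    return value > upper
```

With baseline values of 1.0, 2.0, and 3.0 g/mL, the range is 0.04 to 3.96 g/mL, so a follow-up value of 4.0 g/mL would be flagged and 3.5 g/mL would not.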
Statistical analysis
For each target organ (bowel, lung, thyroid), patients were divided into two groups: those who experienced irAE and those who did not. Differences in SUV metrics by irAE status were assessed with Wilcoxon rank-sum tests, and p<0.05 was considered statistically significant. ROC analysis was performed to determine the ability of SUV metrics to detect irAE. Optimal cutoff values for detecting irAE were chosen to maximize Youden’s index (sensitivity + specificity − 1). Image analysis and statistical testing were performed using MATLAB R2020b (The MathWorks, Inc., Natick, MA, USA).
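The cutoff selection by Youden's index can be sketched as below. This is an illustrative stdlib-only Python version rather than the MATLAB analysis used in the study; the function name is hypothetical, and the convention that a score at or above the cutoff counts as positive is an assumption.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A patient is called positive when score >= cutoff; labels are truthy for irAE.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both irAE and non-irAE patients are required")
    best = None
    for cutoff in sorted(set(scores)):  # every observed value is a candidate
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]
```

On ties in J, the lowest cutoff is kept, which favors sensitivity; a different tie-breaking rule would be equally defensible.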
Discussion
This is the first study to propose a quantitative imaging analysis of irAE development based on 18F-FDG PET/CT imaging. We hypothesized that patients with melanoma who were treated with ICIs and experienced irAE would demonstrate increased 18F-FDG uptake in the involved organ at the time of clinical irAE diagnosis. In our cohort, patients who experienced immune-related thyroiditis, pneumonitis, or colitis demonstrated increased 18F-FDG uptake in the involved organs. This increased uptake could be quantified and monitored longitudinally utilizing an automated image analysis platform that employs CNN-based whole-organ segmentation. Furthermore, elevated 18F-FDG uptake preceded clinical detection of irAE in several cases, indicating that 18F-FDG PET/CT might provide valuable information in the early detection and management of irAE. This represents an automatic quantitative procedure that is reproducible, non-subjective, and provides information not available by visual analysis only. It provides additional quantitative information both to the radiologist or nuclear medicine physician, facilitating faster and more accurate reads, and to the treating physician regarding the extent of inflammation (e.g., pattern and intensity of inflammation). As no clinical biomarkers for the development of irAE currently exist, clinicians must rely on laboratory tests and often perform invasive procedures such as biopsies; quantitative imaging biomarkers would therefore provide a welcome additional tool for improved patient management.
We selected three ICI patients who were imaged regularly by 18F-FDG PET/CT to highlight as case studies of irAE detection by PET (Figs. 5, 6 and 7). Delay in irAE diagnosis might cause more severe symptoms, which are harder to reverse and could turn into life-threatening situations. Increased radiotracer uptake in the scans preceded the symptomatic clinical diagnosis. In the case of the patient highlighted in Fig. 5, the patient had increased 18F-FDG uptake in the lungs on the day 245 scan (SUV95%=4.1 g/mL) while having minimal symptoms of pneumonitis. This radiographic finding allowed the patient to have all necessary diagnostic procedures performed to confirm irAE, including bronchoscopy. For the patient highlighted in Fig. 6, increasing bowel 18F-FDG uptake is seen on the day 84 scan (SUV95%=3.1 g/mL) and day 173 scan (SUV95%=4.0 g/mL). Immune-related colitis was not clinically diagnosed until 3 weeks later when the patient was hospitalized and underwent colonoscopy with biopsy. This patient also experienced immune-related pneumonitis on day 273, which can be seen as bilateral, diffuse, elevated 18F-FDG uptake in the lungs (SUV95%=2.2 g/mL). Both patients had a complete resolution of their irAE after receiving systemic corticosteroids, while they continued to have a favorable treatment response in their melanoma.
For identifying irAE, SUV95% was the optimal SUV percentile for the bowel and lung; however, for the thyroid, SUV75% performed best. This difference in the optimal SUV percentile is likely due to the difference in inflammation patterns in different organs. For example, inflammation in the thyroid is often relatively uniform throughout the organ volume, and so the SUV histogram is shifted up uniformly in patients with thyroid AE. This is further supported by the relatively small difference in AUROC for thyroid AE classification in the range of SUV65% to SUV95% seen in Fig. 1. In contrast, irAE findings in the lung and bowel are often localized to a small fraction of the organ volume (e.g., a single segment of bowel), so elevated uptake is only reflected in the top few percent of the organ volume. Additionally, the lower segmentation performance for the thyroid, a smaller organ, may also contribute to the lower optimal SUV percentile, as mis-segmentation can impact SUV percentile quantification [28]. To further justify our methodology, the metric we used for irAE identification (percentiles of the distribution of organ SUV) is less sensitive to segmentation error than non-histogram SUV metrics such as SUVmax, where only a few mislabelled out-of-distribution voxels can significantly change the final value. The diversity of scanners and treatment settings in our multicenter study improves the robustness of this analysis by including factors that could potentially impact the SUV percentile calculations.
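The robustness argument can be made concrete with a toy example. The numbers below are invented for illustration and are not study data: three hot voxels that leak into an organ contour from an adjacent structure move the maximum substantially but leave the 95th percentile unchanged. The interpolating percentile definition is an assumption.

```python
import math


def percentile(values, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


# Hypothetical organ: 990 voxels at SUV 1.0 and 10 voxels at SUV 2.0.
organ = [1.0] * 990 + [2.0] * 10
# Same organ with three mislabelled voxels from a neighbouring hot structure.
leaked = organ + [12.0] * 3

shift_max = max(leaked) - max(organ)                          # change in SUVmax
shift_p95 = percentile(leaked, 95) - percentile(organ, 95)    # change in SUV95%
print(f"SUVmax shift: {shift_max:.1f} g/mL, SUV95% shift: {shift_p95:.1f} g/mL")
```

In this constructed case SUVmax rises by 10 g/mL while SUV95% does not move, because the mislabelled voxels make up well under 5% of the contour.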
There are several previous case reports of radiographic findings associated with irAE on 18F-FDG PET/CT [30]. In the case of thyroiditis, intense diffuse radiotracer uptake in the thyroid was reported in previous studies [31]. In a small retrospective study of lung cancer patients, increased radiotracer uptake in 18F-FDG PET was found to predict thyroiditis even before laboratory findings indicating changes in the thyroid function [17]. Both of these studies relied on manual segmentation of the thyroid for quantification. In our pilot study, we also observed elevated thyroid 18F-FDG uptake as quantified by SUV75% in patients who experienced thyroiditis. For colitis, 18F-FDG PET was prospectively studied in a cohort of 100 metastatic melanoma patients for colitis/diarrhea [19]. In this study, the determination of PET-colitis was made based on the presence of diffuse, clearly elevated tracer uptake in the colon as interpreted by a radiologist. However, no quantification of colonic 18F-FDG uptake was performed. They found a significant correlation between PET-colitis and clinical presentation of diarrhea. In contrast to these studies, our method for irAE detection does not require manual organ segmentation, nor does it rely on radiologist interpretation. Additionally, our study uses an optimized metric, percentiles of the SUV distribution, to detect irAE. Our study is also the first to perform a quantitative analysis of 18F-FDG uptake of the lung and bowel and evaluate its association with the timing of clinical presentation of irAE in those organs.
For our proposed imaging biomarkers of irAE, the sensitivity and specificity reported were for the cutoff which maximized the sum of sensitivity and specificity. However, depending on clinical need, a different operating point may be more appropriate; for example, some sensitivity may be traded for increased specificity. This is especially salient in the case of the bowel, where the reported operating point has 100% sensitivity, but only 49% specificity. Instead, operating points of 83% sensitivity/64% specificity or 66% sensitivity/81% specificity could be used for bowel, as seen on the ROC curve in Fig. 2.
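One way to pick an alternative operating point is to fix a minimum acceptable specificity and take the most sensitive cutoff that satisfies it. The sketch below illustrates that rule with hypothetical function names; it is one possible selection criterion, not one described in the study, and it assumes scores at or above the cutoff are called positive.

```python
def roc_points(scores, labels):
    """List of (cutoff, sensitivity, specificity) for every observed score."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def most_sensitive_cutoff(scores, labels, min_specificity):
    """Most sensitive operating point with specificity >= min_specificity, or None."""
    candidates = [p for p in roc_points(scores, labels) if p[2] >= min_specificity]
    if not candidates:
        return None
    # Prefer higher sensitivity, then higher specificity among equals.
    return max(candidates, key=lambda p: (p[1], p[2]))
```

Raising `min_specificity` moves the chosen point along the ROC curve toward fewer false positives at the cost of missed cases, which is the trade-off described for the bowel.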
Another important possible confounding factor in the bowel is the antidiabetic medication metformin that can cause diffuse radiotracer uptake in the bowel. Withholding metformin for 6 h prior to the exam in order to avoid hypoglycemia is not sufficient to reduce the influence of metformin on radiopharmaceutical biodistribution in the bowel. In our cohort, 5/58 (9%) patients had metformin listed in their medication list in the electronic health record (EHR) at the time of 18F-FDG PET/CT imaging. None of the patients receiving metformin experienced irColitis; however, 3/5 (60%) demonstrated elevated bowel uptake above the established normal range on at least one 18F-FDG PET/CT scan. This is one factor which contributed to decreasing the specificity of SUV95% as a biomarker of irColitis. Additionally, high variation in physiological 18F-FDG uptake not related to metformin usage may also contribute to the reduced specificity.
Quantification of 18F-FDG uptake in relevant organs for patients undergoing ICI could improve personalized management of these patients. The use of imaging biomarkers could help with proactively managing treatment-related AEs and indicate when to hold or discontinue treatment before severe clinically noticeable symptoms appear. Accurate quantification could help to detect subclinical inflammation and guide clinicians to be more vigilant in cases with evidence of high 18F-FDG uptake scores. It is possible that subclinical inflammation in the target organs increased 18F-FDG uptake in patients who did not experience clinically diagnosed irAE, which would have raised the mean baseline tracer uptake. However, this subclinical increase in tracer uptake did not impact the treatment course and was therefore not considered irAE.
Many unanswered questions need to be addressed in future studies. The optimal timing of 18F-FDG PET during ICI treatment for early detection of irAEs is currently unknown. The timing of irAE development varies widely, and an irAE can even occur after discontinuation or completion of ICI therapy [32]. Perhaps a quantitative analysis of an early timepoint 18F-FDG PET/CT (e.g., at 1 month after ICI treatment initiation) could identify patients who are at greater risk for irAE, as 18F-FDG PET/CT after 1 month has been shown to be predictive of patient response to ICI [11]. Furthermore, quantitative analysis of irAE severity and the extent of organ involvement could help to decide if ICI rechallenge could be considered with a reasonably low risk of irAE recurrence. A recent retrospective study reported a 29% recurrence rate of the same irAE after ICI rechallenge [33]. Possibly, 18F-FDG PET/CT quantification could help us predict which patients are at higher risk for recurrence of irAE.
In this study, we quantified irAE of the lung, bowel, and thyroid. However, it is known that irAE can affect practically any organ, and some irAEs are very rare but often life-threatening [6, 8, 9]. Due to their rarity, a large study cohort would be required to identify imaging biomarkers of irAE in these organs. On the other hand, some irAE are common and can be readily diagnosed clinically, for example, cutaneous irAE (rash, pruritus), which are readily apparent without imaging.
There are several limitations of this study which should be discussed. First, this study was conducted in a retrospective cohort of patients. Due to the retrospective nature of the study, some data on irAE were collected by reviewing the available medical documentation and were limited for some patients. Also, the timing of 18F-FDG PET/CT assessments varied, as it was driven by the clinical course of the disease and clinicians’ decisions, and scans were performed mainly for disease status and response assessment rather than irAE assessment. Prospective trials with larger groups of patients, more regimented 18F-FDG PET/CT imaging and data collection, and prospective irAE grading and tumor response assessment are needed in the future [34]. Despite these limitations, we provided preliminary data on the development of an AI-supported platform capable of quantitative assessment of irAE using 18F-FDG PET/CT scans.