Evaluation of Automatic Atlas-Based Lymph Node Segmentation for Head-and-Neck Cancer

doi:10.1016/j.ijrobp.2009.09.023

International Journal of Radiation Oncology*Biology*Physics

Volume 77, Issue 3, 1 July 2010, Pages 959-966

https://doi.org/10.1016/j.ijrobp.2009.09.023

Purpose

To evaluate whether automatic atlas-based lymph node segmentation (LNS) improves efficiency and decreases interobserver variability while maintaining accuracy.

Methods and Materials

Five physicians with head-and-neck IMRT experience used computed tomography (CT) data from 5 patients to create bilateral neck clinical target volumes covering specified nodal levels. A second contour set was automatically generated using a commercially available atlas. Physicians modified the automatic contours to make them acceptable for treatment planning. To assess contour variability, the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm was used to take collections of contours and calculate a probabilistic estimate of the “true” segmentation. Differences between the manual, automatic, and automatic-modified (AM) contours were analyzed using multiple metrics.

Results

Compared with the “true” segmentation created from manual contours, the automatic contours had a high degree of accuracy, with sensitivity, Dice similarity coefficient, and mean/max surface disagreement values comparable to those of the average manual contour (86%, 76%, 3.3/17.4 mm automatic vs. 73%, 79%, 2.8/17 mm manual). The AM group was more consistent than the manual group for multiple metrics, most notably reducing the range of contour volume (106–430 mL manual vs. 176–347 mL AM) and percent false positivity (1–37% manual vs. 1–7% AM). The average contouring time saved with automatic segmentation was 11.5 min per patient, a 35% reduction.
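The overlap metrics named above can be illustrated with a minimal sketch. It assumes each contour has been rasterized to a binary voxel mask on a shared grid and flattened to a sequence of 0/1 labels. The function name `overlap_metrics` and the toy masks are hypothetical, and the false-positive measure shown (fraction of the test contour lying outside the reference) is one common convention that may differ from the exact definition used in the study.

```python
from typing import Dict, Sequence


def overlap_metrics(test: Sequence[int], reference: Sequence[int]) -> Dict[str, float]:
    """Compare two binary masks given as flat sequences of 0/1 voxel labels."""
    if len(test) != len(reference):
        raise ValueError("masks must cover the same voxel grid")

    pairs = list(zip(test, reference))
    tp = sum(1 for t, r in pairs if t and r)      # voxels in both contours
    fp = sum(1 for t, r in pairs if t and not r)  # voxels in the test contour only
    fn = sum(1 for t, r in pairs if r and not t)  # voxels in the reference only

    test_vol = tp + fp
    ref_vol = tp + fn
    return {
        # Fraction of the reference volume covered by the test contour.
        "sensitivity": tp / ref_vol if ref_vol else 0.0,
        # Dice similarity coefficient: 2|A and B| / (|A| + |B|).
        "dice": 2 * tp / (test_vol + ref_vol) if (test_vol + ref_vol) else 0.0,
        # Assumed convention: fraction of the test contour outside the reference.
        "false_positive_fraction": fp / test_vol if test_vol else 0.0,
    }


# Toy one-dimensional "masks" (hypothetical data, for illustration only).
reference_mask = [0, 1, 1, 1, 1, 0, 0, 0]
test_mask = [0, 0, 1, 1, 1, 1, 0, 0]
metrics = overlap_metrics(test_mask, reference_mask)
print(metrics)
```

In practice the masks would be three-dimensional arrays, and the surface disagreement values would need a separate distance computation between contour surfaces, which this sketch does not attempt.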

Conclusions

Using the STAPLE algorithm to generate “true” contours from multiple physician contours, we demonstrated that, in comparison with manual segmentation, atlas-based automatic LNS for head-and-neck cancer is accurate, efficient, and reduces interobserver variability.

Introduction

Intensity-modulated radiation therapy (IMRT) has allowed for multiple advances in the treatment of head-and-neck cancer (HNC), including improved parotid gland sparing (1, 2) and higher radiation doses for tumors located near critical structures. To fully exploit the advantages of IMRT, all target volumes and critical structures must be contoured before treatment planning. This time-consuming process may be repeated multiple times during a treatment course because of tumor response or changes in patient weight or anatomy. Automatic segmentation can reduce physician contouring time, with time reductions up to 30–40% seen in studies of HNC and breast contouring (3, 4).

Another potential advantage of automatic segmentation is reduction in intra- and interobserver variability in anatomical volume delineation. Variability of contouring among physicians has been noted in a number of studies (5, 6, 7, 8). The impact of such inconsistencies may be especially evident in HNC radiotherapy, where the range of interobserver variability is somewhat larger and may exceed the errors due to position uncertainty and organ motion (9). Interobserver variability may not affect an individual radiation oncologist's contours, but it does affect the field as a whole with regard to the interpretation of clinical trial results and consistency across the specialty.

Automatic segmentation has been shown to reduce variability of contours among physicians and improve efficiency for multiple disease sites (3, 4). The gains in efficiency and consistency are valuable only if accuracy is not compromised. Assessment of accuracy is a complex issue, because there is no objective volume for comparison. A standard approach in the evaluation of automatic segmentation for radiotherapy planning has been to use individual expert physician segmentations for comparison. The shortcoming of this approach is that it does not address interobserver variability.

The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm is a widely accepted tool that adjusts for intra- and interobserver variability in image segmentation (10). It takes a collection of segmentations and calculates a probabilistic estimate of the true segmentation. Using this algorithm, we took a collection of physician manual contours and generated an estimate of the true segmentation, to use as the “reference standard” for contour comparisons. We compared manual, automatic, and automatic-modified (AM) contours to this standard.
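To make the idea concrete, the following is a minimal sketch of the expectation-maximization scheme commonly described for binary STAPLE, not the implementation used in this study. It assumes each observer's contour is a flat 0/1 voxel mask on a shared grid, uses a fixed global prior, and omits any spatial regularization. The function name `staple_binary`, its parameters, and the toy rater data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def staple_binary(
    decisions: Sequence[Sequence[int]],
    n_iter: int = 50,
    init_p: float = 0.9,
    init_q: float = 0.9,
    eps: float = 1e-6,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (w, p, q): per-voxel foreground probability, and each rater's
    estimated sensitivity p and specificity q. decisions[j][i] is rater j's
    0/1 label for voxel i."""
    n_raters = len(decisions)
    n_voxels = len(decisions[0])
    # Assumed prior: overall fraction of voxels that raters marked foreground.
    prior = sum(sum(d) for d in decisions) / (n_raters * n_voxels)
    p = [init_p] * n_raters
    q = [init_q] * n_raters
    w = [0.0] * n_voxels

    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        for i in range(n_voxels):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i]:
                    a *= p[j]          # true positive under "foreground"
                    b *= 1.0 - q[j]    # false positive under "background"
                else:
                    a *= 1.0 - p[j]    # false negative under "foreground"
                    b *= q[j]          # true negative under "background"
            w[i] = a / (a + b) if (a + b) > 0 else 0.0

        # M-step: re-estimate each rater's sensitivity and specificity.
        sum_w = sum(w)
        sum_not_w = n_voxels - sum_w
        for j in range(n_raters):
            tp = sum(w[i] for i in range(n_voxels) if decisions[j][i])
            tn = sum(1.0 - w[i] for i in range(n_voxels) if not decisions[j][i])
            p_j = tp / sum_w if sum_w > 0 else 0.0
            q_j = tn / sum_not_w if sum_not_w > 0 else 0.0
            # Clamp away from 0 and 1 so no rater is treated as infallible.
            p[j] = min(max(p_j, eps), 1.0 - eps)
            q[j] = min(max(q_j, eps), 1.0 - eps)
    return w, p, q


# Three hypothetical raters labelling the same eight voxels.
raters = [
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
]
w, p, q = staple_binary(raters)
consensus = [1 if wi >= 0.5 else 0 for wi in w]
print(consensus, [round(x, 2) for x in p])
```

Thresholding the probabilities at 0.5 gives a single consensus mask that can serve as a reference standard. The estimated sensitivities also show which raters agree most closely with that consensus.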

This study focused on the target volume of lymph node regions for HNC patients, as manually contouring these volumes is a time-intensive task. The goal of this study was to use multiple assessment tools, including the STAPLE algorithm, to evaluate whether automatic segmentation could decrease inter-physician variability while maintaining accuracy. Using these same methods, we analyzed how physicians modify automatic anatomical segmentations in terms of size, shape, and position.

Section snippets

Study overview

We selected 5 adult patients with non-bulky neck nodes who were treated with IMRT for HNC of either the oropharynx or nasopharynx. For each patient, a three-step process was performed: physicians manually contoured designated regions of interest (ROIs) on the planning CT scans; an HNC atlas was automatically registered to the planning CT to delineate atlas-based ROIs; and physicians reviewed and modified the atlas-based ROIs.

Creation of automatic contours

A commercially available HNC atlas (Velocity Medical Systems, Atlanta,

Accuracy

In their qualitative assessments, the physicians deemed 32% of the automatic contours acceptable for treatment planning without modification. Four of the five physicians answered consistently, giving the same answer (either yes or no) for the automatic contours of all five patients. All of the physicians who answered no to the acceptability question indicated that the CTVs were too large.

Figure 2(a, b) and Table 1 show the comparisons between the STAPLE-manual and

Discussion

By creating “true” contours from multiple experienced physicians' manual contours, we have demonstrated that the use of atlas-based automatic lymph node segmentation can improve efficiency and decrease interobserver variability while maintaining accuracy. Variability is one of the most challenging issues in the IMRT era and recognition of this fact has motivated recent efforts to quantify variability and develop systematic approaches to improve consistency (15). The ability to accurately and

Conclusion

By creating a ground truth from multiple segmentations, the STAPLE algorithm provides a unique tool to assess variability in contouring. With the application of STAPLE, we have shown that atlas-based automatic LNS in HNC is accurate, efficient, and reduces interobserver variability. Further analysis of the variability in IMRT contouring may help improve consistency across the field and augment the education process for physicians learning IMRT.

References (18)

F.M. Fang et al. Quality of life and survival outcome for patients with nasopharyngeal carcinoma receiving three-dimensional conformal radiotherapy vs. intensity-modulated radiotherapy: A longitudinal study. Int J Radiat Oncol Biol Phys (2008)
K.S. Chao et al. Intensity-modulated radiation therapy reduces late salivary toxicity without compromising tumor control in patients with oropharyngeal carcinoma: A comparison with conventional techniques. Radiother Oncol (2001)
V.K. Reed et al. Automatic segmentation of whole-breast using atlas approach and deformable image registration. Int J Radiat Oncol Biol Phys (2009)
K.S.C. Chao et al. Reduce in variation and improve efficiency of target volume delineation by a computer-assisted system using a deformable image registration approach. Int J Radiat Oncol Biol Phys (2007)
J.S. Cooper et al. An evaluation of the tumor-shape definition by experienced observers from CT images of supraglottic carcinomas (ACRIN Protocol 6658). Int J Radiat Oncol Biol Phys (2007)
T.S. Hong et al. Variations in target delineation for head and neck IMRT: An international multi-institutional study. Int J Radiat Oncol Biol Phys (2004)
R. Hermans et al. Laryngeal tumor volume measurements determined with CT: A study on intra- and interobserver variation. Int J Radiat Oncol Biol Phys (1998)
C. Rasch et al. The potential impact of CT-MRI matching on tumor volume delineation in advanced head and neck cancer. Int J Radiat Oncol Biol Phys (1997)
V. Gregoire et al. CT-based delineation of lymph node levels and related CTVs in the node-negative neck: DAHANCA, EORTC, GORTEC, NCIC, RTOG consensus guidelines. Radiother Oncol (2003)

There are more references available in the full text version of this article.

Cited by (115)

Reference standard for the evaluation of automatic segmentation algorithms: Quantification of inter observer variability of manual delineation of prostate contour on MRI
2024, Diagnostic and Interventional Imaging
The purpose of this study was to investigate the relationship between inter-reader variability in manual prostate contour segmentation on magnetic resonance imaging (MRI) examinations and determine the optimal number of readers required to establish a reliable reference standard.
Seven radiologists with various experiences independently performed manual segmentation of the prostate contour (whole-gland [WG] and transition zone [TZ]) on 40 prostate MRI examinations obtained in 40 patients. Inter-reader variability in prostate contour delineations was estimated using standard metrics (Dice similarity coefficient [DSC], Hausdorff distance and volume-based metrics). The impact of the number of readers (from two to seven) on segmentation variability was assessed using pairwise metrics (consistency) and metrics with respect to a reference segmentation (conformity), obtained either with majority voting or simultaneous truth and performance level estimation (STAPLE) algorithm.
The average segmentation DSC for two readers in pairwise comparison was 0.919 for WG and 0.876 for TZ. Variability decreased with the number of readers: the interquartile ranges of the DSC were 0.076 (WG) / 0.021 (TZ) for configurations with two readers, 0.005 (WG) / 0.012 (TZ) for configurations with three readers, and 0.002 (WG) / 0.0037 (TZ) for configurations with six readers. The interquartile range decreased slightly faster between two and three readers than between three and six readers. When using consensus methods, variability often reached its minimum with three readers (with STAPLE, DSC = 0.96 [range: 0.945–0.971] for WG and DSC = 0.94 [range: 0.912–0.957] for TZ), and the interquartile range was minimal for configurations with three readers.
The number of readers affects the inter-reader variability, in terms of inter-reader consistency and conformity to a reference. Variability is minimal for three readers, or three readers represent a tipping point in the variability evolution, with both pairwise-based metrics and metrics with respect to a reference. Accordingly, three readers may represent an optimal number to determine references for artificial intelligence applications.
Clinical acceptability of automatically generated lymph node levels and structures of deglutition and mastication for head and neck radiation therapy
2024, Physics and Imaging in Radiation Oncology
Auto-contouring of complex anatomy in computed tomography (CT) scans is a highly anticipated solution to many problems in radiotherapy. In this study, artificial intelligence (AI)-based auto-contouring models were clinically validated for lymph node levels and structures of swallowing and chewing in the head and neck.
CT scans of 145 head and neck radiotherapy patients were retrospectively curated. One cohort (n = 47) was used to analyze seven lymph node levels and the other (n = 98) used to analyze 17 swallowing and chewing structures. Separate nnUnet models were trained and validated using the separate cohorts. For the lymph node levels, preference and clinical acceptability of AI vs human contours were scored. For the swallowing and chewing structures, clinical acceptability was scored. Quantitative analyses of the test sets were performed for AI vs human contours for all structures using overlap and distance metrics.
Median Dice Similarity Coefficient ranged from 0.77 to 0.89 for lymph node levels and 0.86 to 0.96 for chewing and swallowing structures. The AI contours were superior to or equally preferred to the manual contours at rates ranging from 75% to 91%; there was not a significant difference in clinical acceptability for nodal levels I-V for manual versus AI contours. Across all AI-generated lymph node level contours, 92% were rated as usable with stylistic to no edits. Of the 340 contours in the chewing and swallowing cohort, 4% required minor edits.
An accurate approach was developed to auto-contour lymph node levels and chewing and swallowing structures on CT images for patients with intact nodal anatomy. Only a small portion of test set auto-contours required minor edits.
Evaluation of different algorithms for automatic segmentation of head-and-neck lymph nodes on CT images
2023, Radiotherapy and Oncology
To investigate the performance of 4 atlas-based (multi-ABAS) and 2 deep learning (DL) solutions for head-and-neck (HN) elective nodes (CTVn) automatic segmentation (AS) on CT images.
Bilateral CTVn levels of 69 HN cancer patients were delineated on contrast-enhanced planning CT. Ten and 49 patients were used for atlas library and for training a mono-centric DL model, respectively. The remaining 20 patients were used for testing. Additionally, three commercial multi-ABAS methods and one commercial multi-centric DL solution were investigated. Quantitative evaluation was assessed using volumetric Dice Similarity Coefficient (DSC) and 95-percentile Hausdorff distance (HD_95%). Blind evaluation was performed for 3 solutions by 4 physicians. One recorded the time needed for manual corrections. A dosimetric study was finally conducted using automated planning.
Overall, DL solutions had better DSC and HD_95% results than multi-ABAS methods. No statistically significant difference was found between the 2 DL solutions. However, the contours provided by the multi-centric DL solution were preferred by all physicians and were also faster to correct (1.1 min vs 4.17 min, on average). Manual corrections for multi-ABAS contours took on average 6.52 min. Overall, decreased contour accuracy was observed from CTVn2 to CTVn3 and to CTVn4. Using the AS contours in treatment planning resulted in underdosage of the elective target volume.
Among all methods, the multi-centric DL method showed the highest delineation accuracy and was rated best by experts. Manual corrections remain necessary to avoid elective target underdosage. Finally, AS contours help reduce the workload of the manual delineation task.
Practical and technical key challenges in head and neck adaptive radiotherapy: The GORTEC point of view
2023, Physica Medica
Anatomical variations occur during head and neck (H&N) radiotherapy (RT) treatment. These variations may result in underdosage to the target volume or overdosage to the organ at risk. Replanning during the treatment course can be triggered to overcome this issue. Due to technological, methodological and clinical evolutions, tools for adaptive RT (ART) are becoming increasingly sophisticated.
The aim of this paper is to give an overview of the key steps of an H&N ART workflow and tools from the point of view of a group of French-speaking medical physicists and physicians (from GORTEC). Focuses are made on image registration, segmentation, estimation of the delivered dose of the day, workflow and quality assurance for an implementation of H&N offline and online ART. Practical recommendations are given to assist physicians and medical physicists in a clinical workflow.
A simple single-cycle interactive strategy to improve deep learning-based segmentation of organs-at-risk in head-and-neck cancer
2023, Physics and Imaging in Radiation Oncology
Interactive segmentation seeks to incorporate human knowledge into segmentation models and thereby reduce the total amount of editing of auto-segmentations. By performing only interactions that provide new information, segmentation performance may increase cost-effectively. The aim of this study was to develop, evaluate and test the feasibility of a deep learning-based single-cycle interactive segmentation model with the input being computed tomography (CT) and a small number of information-rich contours.
A single-cycle interactive segmentation model, which took CT and the most cranial and caudal contour slices for each of 16 organs-at-risk for head-and-neck cancer as input, was developed. A CT-only model served as control. The models were evaluated with Dice similarity coefficient, Hausdorff Distance 95th percentile and average symmetric surface distance. A subset of 8 organs-at-risk were selected for a feasibility test. In this, a designated radiation oncologist used both single-cycle interactive segmentation and atlas-based auto-contouring for three cases. Contouring time and added path length were recorded.
The medians of Dice coefficients increased with single-cycle interactive segmentation in the range of 0.004 (Brain)–0.90 (EyeBack_merged) when compared to CT-only. In the feasibility test, contouring time and added path length were reduced for all three cases as compared to editing atlas-based auto-segmentations.
Single-cycle interactive segmentation improved segmentation metrics when compared to the CT-only model and was clinically feasible from a technical and usability point of view. The study suggests that it may be cost-effective to add a small amount of contouring input to deep learning-based segmentation models.
Integrating external beam and prostate seed implant dosimetry for intermediate and high-risk prostate cancer using biologically effective dose: Impact of image registration technique
2022, Brachytherapy
Combining external beam radiation therapy (EBRT) and prostate seed implant (PSI) is efficacious in treating intermediate- and high-risk prostate cancer at the cost of increased genitourinary toxicity. Accurate combined dosimetry remains elusive due to lack of registration between treatment plans and different biological effect. The current work proposes a method to convert physical dose to biological effective dose (BED) and spatially register the dose distributions for more accurate combined dosimetry.
A PSI phantom was CT scanned with and without seeds under rigid and deformed transformations. The resulting CTs were registered using image-based rigid registration (RI), fiducial-based rigid registration (RF), or b-spline deformable image registration (DIR) to determine which was most accurate. Physical EBRT and PSI dose distributions from a sample of 91 previously-treated combined-modality prostate cancer patients were converted to BED and registered using RI, RF, and DIR. Forty-eight (48) previously-treated patients whose PSI occurred before EBRT were included as a “control” group due to inherent registration. Dose-volume histogram (DVH) parameters were compared for RI, RF, DIR, DICOM, and scalar addition of DVH parameters using ANOVA or independent Student's t tests (α = 0.05).
In the phantom study, DIR was the most accurate registration algorithm, especially in the case of deformation. In the patient study, dosimetry from RI was significantly different than the other registration algorithms, including the control group. Dosimetry from RF and DIR were not significantly different from the control group or each other.
Combined dosimetry with BED and image registration is feasible. Future work will utilize this method to correlate dosimetry with clinical outcomes.

Conflict of interest: Dr. Fox is entitled to royalties derived from Velocity Medical Solutions' sale of products. The terms of this agreement have been reviewed and approved by Emory University in accordance with its conflict of interest policies.
