Physics Contribution
Evaluation of Automatic Atlas-Based Lymph Node Segmentation for Head-and-Neck Cancer

https://doi.org/10.1016/j.ijrobp.2009.09.023

Purpose

To evaluate whether automatic atlas-based lymph node segmentation (LNS) improves efficiency and decreases interobserver variability while maintaining accuracy.

Methods and Materials

Five physicians with head-and-neck IMRT experience used computed tomography (CT) data from 5 patients to create bilateral neck clinical target volumes covering specified nodal levels. A second contour set was automatically generated using a commercially available atlas. Physicians modified the automatic contours to make them acceptable for treatment planning. To assess contour variability, the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm was used to take collections of contours and calculate a probabilistic estimate of the “true” segmentation. Differences between the manual, automatic, and automatic-modified (AM) contours were analyzed using multiple metrics.

Results

Compared with the “true” segmentation created from manual contours, the automatic contours had a high degree of accuracy, with sensitivity, Dice similarity coefficient, and mean/max surface disagreement values comparable to the average manual contour (86%, 76%, 3.3/17.4 mm automatic vs. 73%, 79%, 2.8/17 mm manual). The AM group was more consistent than the manual group for multiple metrics, most notably reducing the range of contour volume (106–430 mL manual vs. 176–347 mL AM) and percent false positivity (1–37% manual vs. 1–7% AM). Automatic segmentation saved an average of 11.5 min of contouring time per patient, a 35% reduction.
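As an illustration of the overlap metrics named above, the following minimal sketch computes sensitivity, the Dice similarity coefficient, and a false-positive fraction for a contour against a reference mask. The function name `overlap_metrics`, the flat 0/1 voxel lists, and the toy masks are hypothetical. The false-positive fraction is assumed here to be the share of contoured voxels lying outside the reference, which may differ from the definition used in the study.

```python
def overlap_metrics(contour, reference):
    """Compare two binary masks given as equal-length sequences of 0/1 voxels."""
    if len(contour) != len(reference):
        raise ValueError("masks must have the same number of voxels")

    pairs = list(zip(contour, reference))
    tp = sum(1 for c, r in pairs if c and r)        # contoured and in reference
    fp = sum(1 for c, r in pairs if c and not r)    # contoured, not in reference
    fn = sum(1 for c, r in pairs if r and not c)    # missed reference voxels

    return {
        # Fraction of the reference volume captured by the contour.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Dice = 2|A ∩ B| / (|A| + |B|).
        "dice": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0,
        # ASSUMPTION: false positives as a fraction of the contour volume.
        "false_positive_fraction": fp / (tp + fp) if tp + fp else 0.0,
    }


# Toy one-dimensional "masks" (hypothetical data, not from the study).
reference = [0, 1, 1, 1, 1, 0, 0, 0]
contour = [0, 0, 1, 1, 1, 1, 0, 0]
metrics = overlap_metrics(contour, reference)
print(metrics)
```

In practice the masks would be three-dimensional voxel arrays rasterized from the contours. Surface disagreement additionally requires distances between contour surfaces and is not covered by this sketch.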

Conclusions

Using the STAPLE algorithm to generate “true” contours from multiple physician contours, we demonstrated that, in comparison with manual segmentation, atlas-based automatic LNS for head-and-neck cancer is accurate and efficient and reduces interobserver variability.

Introduction

Intensity-modulated radiation therapy (IMRT) has allowed for multiple advances in the treatment of head-and-neck cancer (HNC), including improved parotid gland sparing (1, 2) and higher radiation doses for tumors located near critical structures. To fully exploit the advantages of IMRT, all target volumes and critical structures must be contoured before treatment planning. This time-consuming process may be repeated multiple times during a treatment course because of tumor response or changes in patient weight or anatomy. Automatic segmentation can reduce physician contouring time, with time reductions up to 30–40% seen in studies of HNC and breast contouring (3, 4).

Another potential advantage of automatic segmentation is a reduction in intra- and interobserver variability in anatomical volume delineation. Variability of contouring among physicians has been noted in a number of studies (5, 6, 7, 8). The impact of such inconsistencies may be especially evident in HNC radiotherapy, where the range of interobserver variability is somewhat larger and may exceed the errors due to position uncertainty and organ motion (9). Interobserver variability may not affect an individual radiation oncologist's contours, but it does affect the field as a whole with regard to the interpretation of clinical trial results and consistency across the specialty.

Automatic segmentation has been shown to reduce variability of contours among physicians and improve efficiency for multiple disease sites (3, 4). The gains in efficiency and consistency are valuable only if accuracy is not compromised. Assessment of accuracy is a complex issue, because there is no objective volume for comparison. A standard approach in the evaluation of automatic segmentation for radiotherapy planning has been to use individual expert physician segmentations for comparison. The shortcoming of this approach is that it does not address interobserver variability.

The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm is a widely accepted tool that adjusts for intra- and interobserver variability in image segmentation (10). It takes a collection of segmentations and calculates a probabilistic estimate of the true segmentation. Using this algorithm, we took a collection of physician manual contours and generated an estimate of the true segmentation, to use as the “reference standard” for contour comparisons. We compared manual, automatic, and automatic-modified (AM) contours to this standard.
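To make the idea of a probabilistic estimate of the true segmentation concrete, here is a minimal sketch of the expectation-maximization scheme behind binary STAPLE. It is not the implementation used in this study. The function name `staple_binary`, the fixed iteration count, the initial performance value of 0.9, the global prior taken from the mean labelled fraction, and the toy rater data are all assumptions for illustration. Spatial priors and convergence checks are omitted.

```python
def staple_binary(decisions, prior=None, iterations=50, init=0.9):
    """Sketch of binary STAPLE; decisions[j][i] is rater j's 0/1 label for voxel i."""
    n_raters = len(decisions)
    n_voxels = len(decisions[0])
    if prior is None:
        # ASSUMPTION: global prior = overall fraction of voxels labelled 1.
        prior = sum(sum(d) for d in decisions) / (n_raters * n_voxels)

    p = [init] * n_raters  # per-rater sensitivity estimates
    q = [init] * n_raters  # per-rater specificity estimates
    w = [0.0] * n_voxels   # probability that each voxel is truly foreground

    for _ in range(iterations):
        # E-step: posterior foreground probability given current p and q.
        for i in range(n_voxels):
            a = prior
            b = 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i]:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            w[i] = a / (a + b) if a + b > 0 else 0.0

        # M-step: re-estimate each rater's sensitivity and specificity.
        sum_w = sum(w)
        sum_not_w = n_voxels - sum_w
        for j in range(n_raters):
            if sum_w > 0:
                p[j] = sum(w[i] * decisions[j][i] for i in range(n_voxels)) / sum_w
            if sum_not_w > 0:
                q[j] = sum((1.0 - w[i]) * (1 - decisions[j][i])
                           for i in range(n_voxels)) / sum_not_w
    return w, p, q


# Three hypothetical raters labelling eight voxels; the third is shifted by one.
raters = [
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]
w, p, q = staple_binary(raters)
estimate = [1 if prob >= 0.5 else 0 for prob in w]
print(estimate, [round(x, 2) for x in p], [round(x, 2) for x in q])
```

Thresholding the voxel probabilities at 0.5 gives a single reference mask, and the estimated sensitivity and specificity summarize how closely each rater agrees with it.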

This study focused on the target volume of lymph node regions for HNC patients, as manually contouring these volumes is a time-intensive task. The goal of this study was to use multiple assessment tools, including the STAPLE algorithm, to evaluate whether automatic segmentation could decrease inter-physician variability while maintaining accuracy. Using these same methods, we analyzed how physicians modify automatic anatomical segmentations in terms of size, shape, and position.

Study overview

We selected 5 adult patients with non-bulky neck nodes who were treated with IMRT for HNC of either the oropharynx or nasopharynx. For each patient, a three-step process was performed: physicians manually contoured designated regions of interest (ROIs) on the planning CT scans; an HNC atlas was automatically registered to the planning CT to delineate atlas-based ROIs; and physicians reviewed and modified the atlas-based ROIs.

Creation of automatic contours

A commercially available HNC atlas (Velocity Medical Systems, Atlanta,

Accuracy

In their qualitative assessments, the physicians deemed 32% of the automatic contours acceptable for treatment planning without modification. Four of the five physicians answered consistently, giving the same yes or no answer for the automatic contours of all five patients. All of the physicians who answered no to the acceptability question indicated that the CTVs were too large.

Figure 2(a, b) and Table 1 show the comparisons between the STAPLE-manual and

Discussion

By creating “true” contours from multiple experienced physicians' manual contours, we have demonstrated that the use of atlas-based automatic lymph node segmentation can improve efficiency and decrease interobserver variability while maintaining accuracy. Variability is one of the most challenging issues in the IMRT era and recognition of this fact has motivated recent efforts to quantify variability and develop systematic approaches to improve consistency (15). The ability to accurately and

Conclusion

By creating a ground truth from multiple segmentations, the STAPLE algorithm provides a unique tool to assess variability in contouring. With the application of STAPLE, we have shown that atlas-based automatic LNS in HNC is accurate and efficient and reduces interobserver variability. Further analysis of the variability in IMRT contouring may help improve consistency across the field and augment the education process for physicians learning IMRT.

Dr. Fox is entitled to royalties derived from Velocity Medical Solution's sale of products. The terms of this agreement have been reviewed and approved by Emory University in accordance with its conflict of interest policies.