Introduction
Treatment of ductal carcinoma in situ of the breast (DCIS) currently consists of surgery [
1,
2], radiotherapy [
2‐
6] and sometimes even (low-dose) tamoxifen [
2‐
4,
7‐
9]. However, it is believed that an unknown number of DCIS patients are treated for lesions that may never progress into invasive breast cancer [
10‐
13]. Therefore, four randomized controlled clinical trials aim to identify a group of low-risk DCIS patients that, under active surveillance, may safely forgo surgical treatment [
11,
12,
14‐
16].
For all trials, the main biomarker which identifies DCIS as being low-risk, is histologic (nuclear) grade, although different classifications are used by the LORD-, LORIS-, COMET-, and LARRIKIN trials [
11,
12,
14‐
16]. The general hypothesis of these trials is that progression risk, or at least speed of progression is higher for high-grade lesions [
17,
18], and if a low-grade DCIS does become invasive, it will be a low-grade invasive carcinoma with favorable characteristics and excellent survival rates after treatment [
11,
19].
Besides the fact that grade may become the single biomarker that is used to decide whether patients are treated for their DCIS, grade already plays an important role in clinical patient management. For example, grade influences radiotherapy decisions (omitting a boost, considering partial breast irradiation) [
1,
6] and indicates (on biopsy) whether a sentinel lymph node procedure is required [
1]. Thus, accurate, consistent, and reproducible grading is of key importance. However, we previously showed that variation in grading, between pathology laboratories and between pathologists within laboratories, is substantial in daily clinical practice on a nationwide scale in the Netherlands [
20]. Furthermore, studies in which a set of DCIS was assessed by several pathologists showed significant inter-observer variation, regardless of the used classification, as well [
21‐
23].
As studies have shown that quality of breast cancer care can be improved by auditing and benchmarking [
24‐
29], the results of our previous study were sent to all participating Dutch pathology laboratories as feedback reports, in which their proportions per grade were benchmarked against other laboratories. This enabled pathologists to discuss and reflect upon their grading practices. The present study was conducted to investigate the effect of these feedback reports on grading variation between laboratories on a nationwide scale.
Methods
Data source
All data were retrieved from PALGA, the Dutch nationwide network and registry of histo- and cytopathology, which contains excerpts of all pathology reports from laboratories in the Netherlands [
30]. Data within this database are pseudonymized in the laboratories themselves and by a trusted third party (ZorgTTP, Houten, the Netherlands). In addition, as data on the reporting pathologist was not available in PALGA, laboratories provided the pathologist information directly to the PALGA data-analyst. In a final step, the PALGA data-analyst anonymized all laboratories and pathologists to the researchers. This study was approved by the scientific and privacy committee of PALGA and data were retrieved and handled in compliance with the General Data Protection Regulation Act (GDPR).
Study population
We retrieved all synoptic pathology reports of patients with pure DCIS resection specimens (i.e. without a report of a known adjacent invasive breast cancer) in the Netherlands between March 1, 2017 and March 1, 2019 from PALGA (
n = 3336). Per pathology report, patient- and tumor characteristics were extracted (sex, age, type of surgery, histologic grade and tumor size). Reports with missing data on histologic grade and/or tumor size were excluded (Supplementary Fig.
1). Pathology reports of residual in situ lesions after neoadjuvant treatment of primary invasive breast carcinomas were excluded. Furthermore, ipsilateral DCIS reports within six months of the previous DCIS resection specimen report were considered paired measurements of which we only included the first report.
In total, 38 out of 42 pathology laboratories in the Netherlands reported resection specimens via the synoptic (PALGA) protocol from March 1, 2017 and onwards. Of these 38 laboratories, we included those that annually reported a minimum of 15 DCIS resection specimens within the protocol.
Feedback reports
Laboratory-specific feedback reports, with proportions per histologic grade of individual laboratories benchmarked against other anonymized laboratories [
20], were sent out by March 1, 2018. The general feedback report is available on the PALGA website (in Dutch only) [
31], while laboratory-specific reports are only available to the individual laboratories themselves.
Laboratory-specific reports consisted of funnel plots, in which absolute differences in proportion of histologic grade, are plotted against the number of included IBC per laboratory. Within these funnel plots, the national proportions per grade with their 95% confidence intervals as limits were set as targets.
In addition, all laboratories were asked to provide coded information of the reporting pathologist per pathology report, to compare grading practices of the different pathologists within their laboratory. These data on pathologist’ level were provided by ten laboratories, which as a result received feedback on both laboratory- and pathologist’ level. Thus, pathologists within these ten laboratories were enabled to discuss and reflect upon both their personal- and overall laboratory grading practices.
As feedback reports were sent to the laboratories by March 1, 2018, we considered the period from March 1, 2017 up to and including March 1, 2018 as pre-feedback period, while the period from March 2, 2018 up to and including March 1, 2019 is considered the post-feedback period.
DCIS grading classification
Histologic grade was defined in the pathology reports as grade I, II or III, without a specification of which classification was used. The Dutch guideline [
1] recommends to use the classification of Holland [
32]. However, we know from our previous research that numerous different guidelines are used by Dutch pathologists in daily practice [
20].
Statistical analysis
Patient and tumor characteristics from the pathology reports were summarized in Table
1. Differences of these characteristics between pre- and post-feedback pathology reports were tested by means of a χ
2-test for categorical variables and by means of a Mann-Whitney U test for continuous variables.
Table 1
Characteristics of the 2954 included ductal carcinoma in situ (DCIS) lesions from the PALGA database between March 1, 2017 and March 1, 2018 (PRE feedback), and March 1, 2018 to March 1, 2019 (POST feedback)
Histologic grade, n (%) |
Grade 1 | 383 (13.0%) | 169 (11.3%) | 214 (14.6%) | 0.016 |
Grade 2 | 1171 (39.6%) | 590 (39.5%) | 581 (39.8%) | |
Grade 3 | 1400 (47.4%) | 734 (49.2%) | 666 (45.6%) | |
Age (y)a | 59.8 (10.3) | 59.8 (10.5) | 59.8 (10.1) | 0.922 |
Sex, n (%) | | | | 0.790 |
Female | 2943 (99.6%) | 1487 (99.6%) | 1456 (99.7%) | |
Male | 11 (0.4%) | 6 (0.4%) | 5 (0.3%) | |
Tumor size (cm)a | 2.5 (2.3) | 2.5 (2.3) | 2.5 (2.3) | 0.321 |
Type of surgery, n (%) | | | | 0.096 |
Mastectomy | 944 (32.0%) | 456 (30.5%) | 488 (33.4%) | |
Breast conserving | 2010 (68.0%) | 1037 (69.5%) | 973 (66.6%) | |
Overall proportions for DCIS grades I-III were determined pre- and post-feedback. Absolute differences from the overall proportions per laboratory are presented in bar charts per feedback period for grades I-III. We calculated an overall deviation score (ODS), consisting of the sum of absolute deviations from the grade-specific overall proportions per period, to compare the absolute deviation for all grades at once. Differences in ODS of the individual laboratories between the pre- and post-feedback period were compared by means of a Wilcoxon signed-rank test.
We used several, arbitrary, definitions of change as a possible way to interpret the type of change in laboratories. Laboratories who showed an absolute change of ≤2% were defined as ‘not shifting’. Among laboratories that showed an absolute change of > 2%, laboratories moving closer to the overall mean after feedback were defined as ‘less deviant’, while laboratories moving further from the overall mean after feedback were defined as ‘more deviant’.
A logistic regression analyses, providing odds ratios (ORs) and 95% confidence intervals (CI) per laboratory, was used to compare relative differences among laboratories. For the logistic regression model, grade was dichotomized into low-grade DCIS (grade I-II) and high-grade DCIS (grade III), as this is the same definition that is used by the majority of the clinical trials [
11,
12,
14,
16,
20]. The laboratory best resembling the national distribution on low- (grade I-II) and high-grade (grade III) was chosen as reference laboratory. We performed a multivariate logistic regression analysis to correct for differences in case-mix. Variables were selected based on our previous research [
20] and consisted of tumor size and type of surgery.
To compare differences in case-mix adjusted ORs of the individual laboratories, the positive OR difference, i.e. the difference of the laboratory OR to the reference OR of 1.00, was calculated. These positive OR differences of the individual laboratories were compared pre- and post-feedback by Wilcoxon signed rank test.
All analyses were performed by using IBM SPSS Statistics version 15.0.0.2.
Discussion
Using data from structured pathology reports, we investigated the effect of feedback reports on nationwide inter-laboratory grading variation of DCIS. The decrease in absolute variation for grades II (6.5%) and III (22.0%), as well as the overall majority of laboratories becoming less deviant after feedback, and the decrease of both the mean and maximum ODS seem to indicate a promising decrease in DCIS grading variation. However, absolute variation increased for grade I (6.2%), and the range of case-mix adjusted ORs remained fairly stable and large after feedback. Furthermore, the absolute range between laboratories remains substantial, and maybe even clinically unacceptable, for all grades.
We hypothesize that the lack of consensus on a grading classification [
13,
20], which is also reflected by the use of different classifications by the trials [
11,
12,
14‐
16], may be the explanation for these mixed results as we believe that uniform grading starts with the use of single grading classification by all pathologists. Furthermore, in comparison to grading of invasive breast cancer, which is performed according to the modified Bloom and Richardson grading classification (Elston-Ellis modification) [
33,
34], with scoring of the three subcategories (tubular differentiation, nuclear polymorphism and mitotic count) grading of DCIS is less standardized. Recently, dichotomous histopathological assessment of ductal carcinoma was studied as an alternative with (better) acceptable degrees of interobserver variability, however, interobserver variation remained considerable and at best acceptable [
35]. In addition, other recent data showed that even among 35 expert breast pathologists, the threshold for comedonecrosis is highly variable [
36]. This highlights the complexity of histologic grading of DCIS and the need for clear and uniform guidelines [
36]. This not only important to the possible implementation of trials results into daily clinical practice, but it may also benefit the quality of the trials itself as central pathology review is not always carried out [
37].
Interestingly, 7 out of 38 laboratories that used the synoptic PALGA protocol, were excluded for grading less than 15 DCIS resection specimens (via the protocol), which practically means they grade little over one (pure) DCIS specimen per month. For two laboratories this was because they likely only started using the protocol while another laboratory, stopped using the protocol for unknown reasons. Yet, for the remaining four laboratories, numbers per year (pre- or post-feedback) were fairly stable and low. We would like to emphasize, however, that pathologists within these laboratories may still report DCIS resection specimens outside the protocol, and therefore, they may grade more DCIS than our data would suggest. Nevertheless, if these are the actual numbers of DCIS’ that are assessed in specific pathology laboratories per year, it may be questioned whether this is desirable, especially with regard to the complexity of histologic grading.
Easy data extraction on a large (nationwide) scale and an increased overall completeness of reports [
38] were the main reasons to only include synoptic pathology reports. Furthermore, over 80% of breast resection specimens is reported via the synoptic protocol [
39].
The mean numbers per histologic grade in this study (13.0% for grade I, 39.6% for grade II and 47.4% for grade III) are in line with other studies [
20,
40,
41]. Interestingly, we observed a significant change in grade distribution after feedback as the proportion of grade I tumors increased by over 3%, while the proportion of grade III tumors decreased by a similar percentage. Whether this is initiated by the feedback reports or whether it reflects a true change in the population of DCIS patients remains unknown. Nonetheless, it did make it more difficult to interpret the results regarding deviations from the mean.
Overall, after analyzing the data in an absolute and relative manner, we did observe promising and positive changes, reflected by the decrease in absolute variation for DCIS grade II and III and the decrease of both the mean and maximum ODS. Furthermore, the majority of laboratories became less deviant after feedback for grades II and III. Hence, for grades II and III most deviant laboratories became less deviant, indicating that there is a decrease of the extremes, while changes of individual laboratories (ODS, positive OR-differences) were not significant. In contrast, the results for DCIS grade I showed an increase in the absolute range between laboratories, while the overall range of ORs remained stable and substantially large, ranging from 0.39 (95% CI: 0.18–0.86) to 3.69 (95% CI: 1.30–10.51). This shows that variation in histologic grading is still far from clinically acceptable levels.
Our results confirmed that feedback on the individual level (i.e. the pathologist) may be more effective than feedback on the general level (i.e. laboratories) [
42‐
45]. We observed a larger absolute decrease of the mean ODS of laboratories who also received individual feedback. Furthermore, these laboratories also showed more improvement as compared to laboratories who only received feedback on laboratory-level. Due to the low number of laboratories who received pathologist-specific feedback reports, differences were not statistically significant.
The observed effects on grading variation may not exclusively be attributed to the feedback reports. However, we would like to emphasize that between 2016 and March 1, 2019, besides the feedback reports, no other interventions or guideline changes took place. Furthermore, our previous paper [
20] was only first published after feedback reports were sent to the individual pathology laboratories. We believe that these feedback reports may be a useful tool to, at least, monitor grading variation in daily clinical practice.
Conclusion
We observed a promising decrease in grading variation for DCIS grades II and III, while this was not observed for DCIS grade I. The overall variation for all grades remains substantial, and it seems that histologic grade is far from being a clinically acceptable biomarker for DCIS, let alone be the single biomarker that decides whether patients may be treated. In light of the current ongoing trials, improvement and standardization of DCIS grading is adamant. Continuing with feedback reports, especially on pathologist’ level, helps to create awareness and may open the discussion about nationwide consensus on a single grading classification. In addition, adequate training of (expert breast) pathologists, according to a single classification system, for example by e-learning, may help to establish uniform grading in clinical practice over time.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.