Main

The objectives of mammography screening are to detect small tumours and prevent them from growing to a larger size and becoming lethal (Tabar et al, 1999). As effectiveness of early detection in reducing breast cancer mortality has been demonstrated, there seems to be no doubt that this is the case for a significant proportion of screen-detected (SD) cases. However, SD cancers are not only smaller than clinically presenting cancers but also more indolent biologically as they are lower histological grade, express p53 and Ki-67 nuclear proteins less frequently, and have fewer mitotic cells, more moderate/rich oestrogen and progesterone receptor levels, and lower levels of microvessel density (Uyterlinde et al, 1991; Hakama et al, 1995; Moezzi et al, 1996; Tabar et al, 1999; Groenendijk et al, 2000; Ernst et al, 2002). An often debated question (Thomas, 1995) is: if left undiagnosed during their preclinical phase, would they become biologically more aggressive or would they only increase in size?

The only objective method to determine whether the biological behaviour of preclinical breast cancer worsens over time would be to periodically take and analyse cell samples or tissue specimens from SD lesions left surgically untreated. As no such observations have ever been reported, current knowledge is based on cross-sectional comparisons of SD lesions with clinically presenting tumours.

The SCREENREG study was conducted by a group of Italian cancer registries to evaluate the effect of mammography screening on the trends in stage-specific incidence of breast cancer. Among its secondary objectives was to investigate the question of biological progression of the preclinical disease.

Materials and methods

General methods of the SCREENREG study are reported in detail elsewhere (Buiatti et al, 2002, 2003). In brief, each participating registry contributed a consecutive series of breast cancer cases (International Classification of Diseases for Oncology (ICD-O) topography code 174) registered before and after implementation of the local screening programme. Staging and treatment information was retrospectively retrieved and prospectively collected by trained personnel with a review of original pathology and clinical case records. Data were submitted to the coordinating centre according to a common set of variables. These included the following: original index number; date of birth; date of registration; histological type (ICD-O morphology code); simultaneous bilaterality (yes, no); multifocality (yes, no); surgical treatment (unknown, unperformed, conservative, radical); tumour size (invasive component) in mm; pT category according to tumour, node, metastasis (TNM) classification; number of axillary lymph nodes recovered; number of positive axillary lymph nodes; pN category; distant metastases (yes, no/unknown); and detection modality (SD, death certificate only, clinical diagnosis). Women with simultaneous bilateral cancers were classified according to the lesion with the highest pN (or pT in the case of equal pN).

Rationale

Starting from the universal observation that the average risk of lymph node involvement is lower for SD breast cancer cases compared with clinically diagnosed cases, the rationale of the current study was based on the following assumptions: (1) axillary lymph node status is the product of biological aggressiveness and chronological age of the disease (Mittra and MacRae, 1991), (2) for any breast cancer case, tumour size is an indicator of its chronological age, and (3) for SD cases, tumour size is specifically an indicator of the duration of the preclinical phase, that is, an inverse indicator of lead time (Anderson et al, 1991; Norden et al, 1997; Ernst et al, 2002). We evaluated the tumour size-specific risk of lymph node involvement for SD cases relative to that for clinical cases. The study hypothesis was that the relative protection of SD cases from the risk of nodal involvement (and thus, their relative biological indolence) decreases with increasing tumour size, that is, with increasing duration of the preclinical phase or decreasing lead time. If this hypothesis were true, it would suggest that the biological characteristics of breast cancer worsen progressively during the preclinical phase.

As reported elsewhere (Buiatti et al, 2003), the SCREENREG study showed an incidence increase of early-stage breast cancer following introduction of screening that was only partially explained by the proportion of SD cases. As this was compatible with a concomitant diffusion of spontaneous screening outside organised programmes, the series of breast cancers registered in the last 5 years prior to screening implementation was considered a more reliable comparison group.

Case series

The SCREENREG database included 20 258 cases. Selection of eligible cases was based on the following criteria. First, we excluded all records from three cancer registries supplying only prescreening cases or cases with unknown tumour size in mm. The study was thus restricted to seven provinces situated in northern Italy (Torino, Parma, Modena, Ferrara, Ravenna, Forlì-Cesena, and Rimini). The total female population was about 2 300 000, that is, 8% of Italian women according to the 1991 census. The year of registration varied between 1988 and 1999. The year of first implementation of screening for women aged 50–69 years was 1992 for Torino, 1995 for Modena, 1996 for Ravenna and Forlì-Cesena, 1997 for Ferrara and Rimini, and 1998 for Parma. The number of years of screening covered by the study varied between 1 and 4.

Second, we excluded the clinical cases registered in the years when each local screening programme was ongoing, the cases registered >5 years before the implementation of each programme, and the cases aged <50 or >70 years (a considerable proportion of cases detected by mammography at age 69 years were surgically treated and registered at age 70 years).

Third, we excluded the cases with the following characteristics: registration with death certificate only; ICD-O morphology code of sarcoma, lymphoma, and leukaemia; ICD-O behaviour code 2 or pTis according to TNM classification; tumour size of 1 mm or pT1mic; no evidence of primary tumour or pT0; pT4; presurgical chemotherapy; surgical treatment unperformed or unknown; and multifocality.

There remained 4055 potentially eligible cases of invasive, surgically treated, unifocal, pT1a-pT3 breast carcinoma aged 50–70 years. These comprised 1111 SD cases and 2944 clinically presenting cases. The current analysis considered only those cases undergoing axillary dissection and classified for tumour size in mm, number of lymph nodes recovered, and pN. These numbered 994 (89%) among SD cases and 2335 (79%) among clinical cases, for a total of 3329 (82%).

Statistical analysis

General characteristics of SD cases were compared with those of clinically diagnosed cases using the Kruskal–Wallis test (distribution by age and number of lymph nodes recovered) and the χ2 test (distribution by histological type and tumour size). As shown in Figure 1, the frequency distribution of tumour size was found to be compatible with a major phenomenon of rounding to the nearest multiple of 5 mm. To reduce the resulting bias, tumour size was categorised as 2–7, 8–12, 13–17, 18–22, 23–27, and ≥28 mm.
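The categorisation can be sketched in a few lines of Python. This is only an illustration: the function name `size_category` is hypothetical, sizes are assumed to be whole millimetres, and the last category is assumed to be open-ended (28 mm or more). The point it shows is that each closed category is centred on a multiple of 5 mm, so a measurement rounded to 5, 10, 15, 20, or 25 mm falls in the same category as its likely true value.

```python
def size_category(size_mm):
    """Map a tumour size in whole mm to a size category label.

    Illustrative helper (not from the study): each closed category spans
    a multiple of 5 mm plus or minus 2 mm, so rounding to the nearest
    multiple of 5 mm does not move a case across a category boundary.
    """
    if size_mm < 2:
        # Tumours of 1 mm (or pT1mic) were excluded from the case series.
        raise ValueError("sizes below 2 mm are outside the study range")
    # (upper bound in mm, label) for the five closed categories
    closed_categories = [
        (7, "2-7"),
        (12, "8-12"),
        (17, "13-17"),
        (22, "18-22"),
        (27, "23-27"),
    ]
    for upper, label in closed_categories:
        if size_mm <= upper:
            return label
    # Assumption: the last category is open-ended (28 mm or more).
    return ">=28"


if __name__ == "__main__":
    for mm in (5, 10, 15, 20, 25, 30):
        print(mm, "mm ->", size_category(mm))
```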

Figure 1 Frequency distribution of study cases (n=3329) by detection modality and tumour size.

As a first step in data analysis, the odds ratio (OR) estimate (with 95% confidence interval (CI)) of the risk of nodal involvement for SD cases compared with clinical cases was calculated for each tumour size category. Total OR was adjusted for tumour size category using the Mantel–Haenszel method. In both groups of cases, the association of tumour size with nodal status was evaluated with the χ2 test for trend.
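As a minimal sketch of this first step, the following Python code computes a stratum-specific OR with a 95% CI and a Mantel–Haenszel summary OR across strata. The counts are hypothetical and do not come from the study. The log-based (Woolf) interval is an assumption, as the CI method is not stated, and the function names are illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and log-based (Woolf) CI for one 2x2 table.

    a: SD cases, node-positive        b: SD cases, node-negative
    c: clinical cases, node-positive  d: clinical cases, node-negative
    The Woolf interval is an assumption made for this sketch.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for two size categories (NOT study data).
    strata = [(10, 90, 20, 80), (30, 70, 45, 55)]
    for table in strata:
        or_, lower, upper = odds_ratio_ci(*table)
        print(f"OR = {or_:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
    print(f"Mantel-Haenszel OR = {mantel_haenszel_or(strata):.2f}")
```

The summary estimate weights each stratum by its size, which is why it can differ from the crude OR when tumour size is unevenly distributed between the two groups.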

A multiple logistic regression model (model #1) was then built that included terms for detection modality, tumour size category, patient age (as a continuous variable), histological type (ductal, lobular, tubular, other), number of lymph nodes recovered (continuous), and registry. These were removed from the model if the likelihood ratio statistic based on the maximum-likelihood estimates had a probability greater than 0.10.

A term for the detection modality-by-tumour size category interaction was then entered. The objective of this second (#2) model was to determine whether the relative risk of lymph node involvement for SD cases compared with clinically detected cases varied in relation to tumour size category. The OR associated with detection modality was calculated as

where x1 is the detection modality, x2, …, x5 are the dummy variables for tumour size category, x6, …, x9 are the dummy variables for the detection modality-by-tumour size interaction, and x10 to xp are the other covariates. Significance of interaction was tested with the deviance χ2 test.
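The variable definitions above are consistent with a logistic model of the following general form. This is a sketch under stated assumptions and not necessarily the authors' exact expression: $\pi$ (the probability of nodal involvement), the intercept $\beta_0$, and the coefficient symbols $\beta_i$ are notation introduced here, and each interaction dummy is assumed to be the product of the detection modality and the corresponding size dummy.

```latex
\operatorname{logit}(\pi)
  = \beta_0 + \beta_1 x_1
  + \sum_{i=2}^{5} \beta_i x_i
  + \sum_{i=6}^{9} \beta_i x_i
  + \sum_{i=10}^{p} \beta_i x_i ,
\qquad x_{i+4} = x_1 x_i \quad (i = 2, \dots, 5)

\mathrm{OR}_{\text{SD vs clinical}}
  = \exp\!\Bigl( \beta_1 + \sum_{i=2}^{5} \beta_{i+4}\, x_i \Bigr)
```

Under these assumptions, the OR is the difference in the logit between $x_1 = 1$ and $x_1 = 0$ with size category and the other covariates held fixed. In the reference size category all size dummies are zero and the OR reduces to $\exp(\beta_1)$; in any other category exactly one interaction coefficient is added to $\beta_1$.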

Adequacy of model #2 was examined with the goodness-of-fit test. In addition, points or cases for which the model did not fit sufficiently were identified with the calculation of the standardised residuals or outliers, the leverage points, and the delta–beta points. As original records could not be retrieved and checked for coding and data-entry errors, all of these were removed (model #3) and analysis was repeated.

Finally, the OR for the main effect of detection by screening in model #3 was computed as exp(β1) for the 2–7 mm size category, and exp(β1 + βj) (where j=2, …, 5) for the subsequent categories (Kleinbaum and Klein, 2002).

Results

The median age was 61 years in both groups of cases. SD cancers had a more favourable distribution by tumour size (P<0.001), with 73% of cases ≤17 mm vs 48%, a greater median number of axillary lymph nodes recovered (17 vs 15, P<0.001), and a distribution by histological type (ductal, 80%; lobular, 14%; tubular, 3%; other, 2%) similar to that of clinical cases (ductal, 80%; lobular, 13%; tubular, 2%; other, 4%).

Table 1 shows that the average proportion of SD cases with positive lymph nodes, 23%, was lower than that of clinically detected cases, 40%. The crude OR was 0.44. After adjustment for tumour size category, the OR rose to 0.62 (95% CI: 0.52–0.75), that is, the overall protection from the risk of lymph node involvement was attenuated. In both groups, the proportion of node-positive cases was positively associated with tumour size (P<0.001). However, it increased more steeply among SD cases. As a result, the size-specific OR tended to increase up to a diameter of 22 mm.

Table 1 Univariate analysis of the risk of axillary lymph node metastases

Table 2 shows the results of the first two logistic models fitted. In model #1, after simultaneous adjustment for tumour size, patient age, number of axillary lymph nodes recovered, and histological type, SD cases showed a significantly lower risk of lymph node involvement (OR=0.59).

Table 2 Multivariate analysis of the risk of axillary lymph node metastases

Model #2 revealed a significant detection modality-by-tumour size interaction (deviance χ2=13.78, df=5, P=0.017). Although irregularly, the risk of lymph node involvement for SD cases relative to that of clinical cases increased with increasing tumour size. The OR for the effect of detection by screening was 0.33 when the tumour size was 2–7 mm (reference category). Data in the table indicate that the OR was 0.33 × 1.57 (or 0.52) in the 8–12 mm size category, 0.33 × 1.28 in the 13–17 mm size category, and so on.
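The arithmetic in this paragraph can be checked with a short Python sketch. It uses only the figures quoted in the text (deviance χ2=13.78 with df=5, a reference OR of 0.33, and interaction ORs of 1.57 and 1.28). The helper `chi2_sf` is a hypothetical name for a closed-form upper-tail probability of the χ2 distribution for integer degrees of freedom, written here to avoid any external library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    tail = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total


# Significance of the interaction (values quoted in the text).
p_interaction = chi2_sf(13.78, 5)

# Size-specific ORs: reference OR multiplied by the interaction OR.
or_reference = 0.33  # 2-7 mm category
interaction_or = {"8-12": 1.57, "13-17": 1.28}
size_specific_or = {k: or_reference * v for k, v in interaction_or.items()}

if __name__ == "__main__":
    print(f"P for interaction = {p_interaction:.3f}")  # approximately 0.017
    for category, value in size_specific_or.items():
        print(f"{category} mm: OR = {value:.2f}")
```

Multiplying ORs is equivalent to adding the corresponding coefficients on the log scale, which is how the interaction terms enter the logistic model.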

In model #2, 49 standardised residuals (four of which were also delta–beta points) and 35 leverage points (three of which were also delta–beta points) were identified. After exclusion of these cases, analysis was repeated (model #3) (Table 3). The results showed a stronger detection modality-by-tumour size interaction. With a size of 2–7 mm, detection by screening appeared to be associated with an OR as low as 0.05. The other ORs in the table must be interpreted like those resulting from model #2.

Table 3 Multivariate analysis of the risk of axillary lymph node metastases after removal of standardised residuals, leverage points, and delta–beta points from model #2 in Table 2

Table 4 gives the final outcome of analysis. The OR for the main effect of detection by screening was computed from the results of model #3. It appears that, taking the detection modality-by-tumour size interaction into account after exclusion of residuals and leverage points, the risk of nodal metastases increased linearly with increasing tumour size and approached unity among cases 18–22 mm in diameter. Although with borderline significance, the OR was below unity for the last two size categories.

Table 4 Risk of axillary lymph node metastases for SD cases vs clinical cases: estimate of the main effect of detection by screening according to tumour size category as obtained from model #3

Discussion

As expected, the average proportion of patients with lymph node metastases was lower among SD cases than it was among clinically presenting cases. Moreover, it was positively associated with tumour size in both groups. With increasing tumour size, however, it increased more steeply among SD cases. Accordingly, analysis of interaction demonstrated that the relative risk of lymph node involvement for preclinical cancers increased with increasing tumour size. In other words, their relative advantage was progressively eroded. If our assumption of tumour size of SD cases as a proxy inverse indicator of lead time is valid, then our results are compatible with the interpretation that the biological aggressiveness of breast cancer increases progressively during the preclinical phase.

This does not explain why, after progressively approaching unity, the relative risk of nodal metastases decreased again – although at a borderline level of significance – for SD cases of larger size. Owing to the paucity of such cases (Table 1), their behaviour might be subject to random variation. A similar observation, however, was also reported from the Edinburgh Randomised Breast Screening Project (Anderson et al, 1991). The hypothesis we raise points to the fact that most years of screening covered by this study were the initial years of each local programme. As large tumour size suggests short or virtual lead time, the low relative risk of nodal metastases for those cases is likely due to the presence of poorly aggressive diseases detected at the prevalence screen. If so, a consistent and comprehensive interpretation of results is that (1) a small subset of SD cancers with a relatively stable biological indolence actually exists, (2) they become apparent only among the few large-sized cases detected at first screen but with no significant lead time, and (3) among small, true preclinical SD lesions, their stable biological behaviour is overwhelmed by those cases for which the relative risk of nodal metastases is inversely related to lead time.

Our findings are at variance with some previous studies. In particular, the common statement (Klemi et al, 1992; Norden et al, 1997; Molino et al, 2000; Heimann et al, 2002) that the risk of nodal involvement adjusted for tumour size is lower for SD cancers was demonstrated to be misleading, although formally correct. With a size-specific pattern of relative risk such as that shown in Table 1, tumour size qualifies as an interaction factor rather than a confounder. This is equivalent to saying that adjustment for tumour size obscures the real effect of this variable on the relative risk of nodal involvement. Using a study design similar to our own, Tabar et al (1987) failed to demonstrate a significant detection modality-by-tumour size interaction. The statistical power of the study, however, was insufficient for this effect to be formally evaluated.

The current investigation confirms one observation reported by Anderson et al (1991). In a case series from the Edinburgh Randomised Breast Screening Project, SD cancers had a crude (univariate) advantage in the frequency of positive lymph nodes that decreased progressively from pT1a to pT1c lesions. Moreover, our findings are compatible with those by Ernst et al (2002), who observed univariate differences between some biological characteristics of SD and clinical cases that were restricted to lesions 20 mm in diameter. Assuming that lymph node status for any given tumour size reflects the biological aggressiveness of the disease, this study also adds support to the view that malignancy grade of preclinical breast cancer increases with increasing tumour size (Duffy et al, 1991; Tabar et al, 1999).

Could there be alternative explanations for our results? In the first place, we have to consider that the proportion of potentially eligible cases included in the analysis was smaller for clinical cases. The difference, however, was limited (79 vs 89%). Moreover, there is no specific reason to believe that any resulting selection bias would account for the observed trend in the relative risk of nodal involvement.

Another problem is that screening may be responsible for overdiagnosis of nonaggressive cancers (Holmberg et al, 1992). In a pooled estimate, these accounted for 10–20% of cases detected in three screening trials (Wald et al, 1994). In many studies, however, no evidence for overdiagnosis was obtained (Peeters et al, 1989; Olsen et al, 2003). Most importantly, no published data support an inverse association between overdiagnosis and tumour size, that is, the conditio sine qua non for this phenomenon to be considered a potential explanation for our results.

Overdiagnosis may also result from histological misinterpretation of benign lesions as malignant. As small tumours are expected to be highly differentiated, the risk of this type of overdiagnosis occurring is inversely related to lesion size and, thus, is greater among SD cases (Holmberg et al, 1992). It clearly appears, however, that size stratification of the analysis removed any influence of this potential problem, unless one speculates that impalpability itself conveys a greater risk of misinterpretation. Moreover, the hypothesis that lesions detected by mammography were more likely to be misinterpreted implies that histology evaluation was more accurate in the prescreening years and/or less accurate in the hospitals involved in screening. In fact, an opposite time trend and a higher standard of quality in breast surgery reference hospitals are, if anything, more plausible.

One related hypothesis is that study results were biased by differences in accuracy of histopathological staging. The observed relative risk of nodal involvement for small tumours detected by screening might be accounted for by a systematic overestimate of their diameter and/or an opposite mismeasurement of clinically diagnosed lesions. Frequency distribution in Figure 1, however, suggested that major error in tumour size measurement was a random one. We also considered that the risk of nodal involvement in the earliest cases could be biased by differences in detection of microinvasive carcinomas. In fact, we excluded such lesions from both study groups. As to lymph node status, SD cases had a slightly greater median number of lymph nodes recovered (17 vs 15). This variable, however, was entered into the multivariate models.

Finally, our results raise questions about mammography sensitivity. It is generally agreed that mammography is more sensitive for indolent lesions (Thomas, 1995). If one assumes that biological characteristics of breast cancer are stable, then our results may be compatible with the explanation that the tendency for mammography to be more sensitive for indolent lesions is concentrated among small tumours and decreases with increasing size. To our knowledge, however, such a hypothesis has never been raised.

Some methodological limitations in the study design need to be pointed out. First, the large size of this multicentre study allowed for a formal analysis of the effect of tumour size on the relative risk of lymph node involvement. However, the case series was not large enough to enable the nodal status to be defined as the number of positive lymph nodes. In particular, the number of SD cases with ≥4 positive nodes was negligible. In the reference 2–7 mm size category, there were only two such cases.

Second, a linear increase in the relative risk of nodal metastases for SD cases – as one would expect from a ‘natural’ phenomenon – was observed only after removal of residuals, leverage points, and delta–beta points. Unfortunately, we were unable to check the original records for coding and data-entry errors, if any. Exclusion of those cases (model #3) led to an OR as low as 0.05 for the smallest lesions. Although more reliable for our purposes, this estimate must be considered with caution.

Third, we could not compare the two study groups of cases for the biological indicators most commonly used in the literature. The only item of biological information virtually collected by Italian cancer registries, that is, tumour grade, was excluded from the SCREENREG database because of incomplete availability and poor standardisation. However, as pointed out in other studies similar to our own (Norden et al, 1997; Heimann et al, 2002), nodal status is the strongest and most objective single indicator of biological virulence of breast cancer for any given tumour size.

In this study, we attempted to explore one of the most interesting and uncertain theoretical issues of mammography screening. We conclude that our results add further support to the view that biological aggressiveness of breast cancer increases during the preclinical phase. Our observations suggest that if a preclinical breast cancer is left undiagnosed, its biological behaviour worsens before the disease surfaces clinically.