This section is structured in four parts: a) the derivation of the US breast cancer hazard rate functions, by stage and by age and stage; b) the description of the existing and inferred Catalan survival data; c) the estimation of the hazard ratios (HR), or relative hazards, comparing Catalonia with the US, which are sometimes time-dependent; and d) the estimation of age- and stage-specific breast cancer survival functions for Catalonia using the US functions and the estimated HR.
Hazard ratios (HR) of breast cancer mortality, Catalonia versus USA
Our goal was to obtain age- and stage-specific Catalan breast cancer survival functions. However, the number of breast cancer cases in the GCR was too small to estimate these functions directly from the GCR data. We therefore looked for a relation between the US and GCR hazard rate functions, $\lambda_{USA}(t)$ and $\lambda_{GCR}(t)$, at the disease stage level: the hazard ratio (HR).
Because the population covered by SEER is much larger than that of Girona province, we used the USA data as the reference. First, we used the GCR relative cumulative survival and the number of all-cause deaths to estimate the number of deaths due to breast cancer ($D_{bc}$) in the GCR in each yearly interval $[t, t+1)$ after diagnosis (see the first section in the Appendix for further details).
Second, we multiplied the US hazard rates $\lambda_{USA}(t)$ by the GCR person-years at risk to estimate the expected number of breast cancer deaths in the GCR ($D_{bc}^{expected}$) if the hazard rates had been those of the USA (see the second section in the Appendix). Third, we fitted a Poisson regression model with $D_{bc}$ as the dependent variable and $\log(D_{bc}^{expected})$ as an offset. This corresponds to a proportional hazards model in which the usual Cox baseline hazard $\lambda_0$ is replaced by the known hazard function from the US data. When the proportional hazards assumption was not fulfilled, we included a function of time since diagnosis as a covariate; the log of time, $\log(t)$, was the function that worked best, and we included it in the models when it was statistically significant [18].
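The second and third steps can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' code: it assumes the yearly intervals are indexed by $t = 1, 2, \ldots$ (so that $\log(t)$ is defined), and it fits the Poisson model by Newton-Raphson instead of calling a statistical package. The names `expected_deaths` and `fit_poisson_offset` are hypothetical.

```python
import math


def expected_deaths(lambda_usa, person_years):
    """D_bc^expected(t) = lambda_USA(t) * GCR person-years at risk in each interval."""
    return [lam * py for lam, py in zip(lambda_usa, person_years)]


def _invert(m):
    """Inverse of a 1x1 or 2x2 matrix (enough for b0 alone or b0 + b1*log(t))."""
    if len(m) == 1:
        return [[1.0 / m[0][0]]]
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def fit_poisson_offset(deaths, expected, t, use_log_t=True, tol=1e-10, max_iter=100):
    """Fit log E[D_bc(t)] = log(D_expected(t)) + b0 (+ b1*log(t)) by Newton-Raphson.

    Returns (coefficients, covariance matrix); log(HR(t)) = b0 + b1*log(t).
    """
    rows = [[1.0, math.log(ti)] if use_log_t else [1.0] for ti in t]
    p = len(rows[0])

    def information_and_score(coef):
        # The offset log(expected) enters the linear predictor with coefficient fixed at 1
        mu = [e * math.exp(sum(c * x for c, x in zip(coef, r)))
              for e, r in zip(expected, rows)]
        score = [sum((d - m) * r[j] for d, m, r in zip(deaths, mu, rows)) for j in range(p)]
        info = [[sum(m * r[j] * r[k] for m, r in zip(mu, rows)) for k in range(p)]
                for j in range(p)]
        return info, score

    beta = [0.0] * p
    for _ in range(max_iter):
        info, score = information_and_score(beta)
        inv = _invert(info)
        step = [sum(inv[j][k] * score[k] for k in range(p)) for j in range(p)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Covariance of the estimates: inverse Fisher information at the solution
    info, _ = information_and_score(beta)
    return beta, _invert(info)
```

With `use_log_t=False` (constant HR) the fit reduces to $\exp(\beta_0) = \sum D_{bc} / \sum D_{bc}^{expected}$, which is a convenient sanity check on real data.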
We checked for overdispersion by fitting negative binomial regression models and assessing the significance of the extra-variation (dispersion) parameter; the Poisson model was adequate in all cases. We also assessed the goodness of fit of each model using the deviance: all final models had non-significant p-values (> 0.05) for the deviance $\chi^2$ test, with degrees of freedom equal to the number of observations minus 1, or minus 2 when time was included as a covariate.
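The deviance check can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the $\chi^2$ upper-tail probability is computed from the series for the regularized incomplete gamma function (adequate for the moderate deviance values expected here) rather than from a statistics library. The degrees of freedom are the number of yearly observations minus the number of fitted coefficients, as in the text.

```python
import math


def poisson_deviance(observed, fitted):
    """Deviance 2 * sum[y*log(y/mu) - (y - mu)], with y*log(y/mu) = 0 when y = 0."""
    dev = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        dev += 2.0 * (term - (y - mu))
    return dev


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-square(df) variable via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (a + n)
        total += term
    # Regularized lower incomplete gamma P(a, z); the survival function is 1 - P
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def deviance_gof(observed, fitted, n_params):
    """Deviance test with df = number of observations - number of fitted coefficients."""
    dev = poisson_deviance(observed, fitted)
    df = len(observed) - n_params
    return dev, df, chi2_sf(dev, df)
```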
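The following equations reflect the Poisson model and the expression used to obtain the estimated HR:

$$D_{bc}(t) \sim \mathrm{Poisson}(\mu(t)) \tag{1}$$

$$\log(\mu(t)) = \log(D_{bc}^{expected}(t)) + \beta_0 + \beta_1 \log(t) \tag{2}$$

where $\beta_1 = 0$ when the HR is constant over time. Then, the stage-specific hazard ratios were obtained as:

$$HR(t) = \frac{\lambda_{GCR}(t)}{\lambda_{USA}(t)} = \frac{\mu(t)}{D_{bc}^{expected}(t)} = \exp(\beta_0 + \beta_1 \log(t)) \tag{3}$$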
The model was fitted for the first 14 years after breast cancer diagnosis, because after 14 years of follow-up the number of breast cancer deaths approaches 0. After year 14 we assumed that the HR was constant and equal to the year 14 estimated value.
Ninety-five percent confidence intervals (95% CI) for the HR were obtained using the expression:

$$\exp\left(\log(HR) \pm 1.96\,\mathrm{Var}(\log(HR))^{1/2}\right) \tag{4}$$
When the HR was constant over time ($\log(HR) = \beta_0$), the 95% CI for the HR was estimated as:

$$\exp\left(\beta_0 \pm 1.96\,\mathrm{Var}(\beta_0)^{1/2}\right) \tag{5}$$
When the HR was time-dependent, $HR(t)$, the variance of $\log(HR(t))$ was obtained using the expression:

$$\mathrm{Var}(\log(HR(t))) = \mathrm{Var}(\beta_0) + \log(t)^2\,\mathrm{Var}(\beta_1) + 2\log(t)\,\mathrm{Cov}(\beta_0, \beta_1) \tag{6}$$
Then, we used expression (4) to obtain the 95% CI of the HR.
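The variance and confidence-interval expressions, together with the rule that the HR is held at its year-14 value beyond 14 years, can be combined in one small function. It is a sketch only: the function and argument names are hypothetical, $t$ is assumed to be measured in years with $t \ge 1$, and the coefficients and their (co)variances are assumed to come from the Poisson fit. Setting `b1`, `var_b1` and `cov_b01` to zero gives the constant-HR case.

```python
import math


def hr_with_ci(t, b0, b1=0.0, var_b0=0.0, var_b1=0.0, cov_b01=0.0, t_max=14.0, z=1.96):
    """HR(t) = exp(b0 + b1*log(t)) and its 95% CI from Var(log HR(t)).

    Beyond t_max (14 years in the text) the HR is held at its t_max value.
    """
    log_t = math.log(min(t, t_max))
    log_hr = b0 + b1 * log_t
    # Var(b0) + log(t)^2 Var(b1) + 2 log(t) Cov(b0, b1)
    var_log_hr = var_b0 + log_t ** 2 * var_b1 + 2.0 * log_t * cov_b01
    half_width = z * math.sqrt(var_log_hr)
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)
```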
For the period before the dissemination of mammography (background), we used the 1975–79 US hazard rates and the 1980–89 GCR person-years at risk to estimate the expected number of GCR breast cancer deaths ($D_{bc}^{expected}$). We performed this analysis by historical stage of disease (localized, regional, distant). For the recent period, we used the 1990–2001 GCR data and the 1975–79 US data, with the five AJCC disease stages at diagnosis (I, II-, II+, III, IV).
We made the following comparisons:
- C1: Catalonia 1980–89 vs the USA 1975–79
- C2: Catalonia 1990–2001 vs the USA 1975–79
- C3: Catalonia 1990–2001 vs the USA 1990–2001
Comparisons C1 and C2 provide the HR needed to estimate the Catalan age- and stage-specific breast cancer $\lambda(t)$ functions for the background and recent periods, respectively, from the US $\lambda_{USA}(t)$ functions for 1975–79.
Additionally, the HR obtained in C1 and C3 allow us to assess the differences in breast cancer survival between the USA and Catalonia before the dissemination of mammography screening (C1) and in the recent period (C3).
Estimation of age- and stage-specific Catalan survival functions: λ, cumulative survival, and pdfs
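We used the age- and stage-specific US hazard functions $\lambda_{USA}(t)$ for 1975–79 and the stage-specific HR from C1 and C2 to estimate the age- and stage-specific Catalan hazard functions $\lambda_{Cat}(t)$ for 1980–89 and 1990–2001, respectively. We used the same HR for all age groups within each disease stage:

$$\lambda_{Cat}(t \mid a, s) = HR_s(t)\,\lambda_{USA}(t \mid a, s) \tag{7}$$

where $a$ denotes the age group and $s$ the disease stage.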
To obtain the AJCC stage-specific $\lambda_{Cat}(t)$ for 1980–89, we applied the HR for the historical stage localized to the US hazard rate functions for AJCC stages I and II-, the HR for the historical stage regional to AJCC stages II+ and III, and the HR for the historical stage distant (disseminated) to AJCC stage IV.
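Expression (7) and the stage mapping used for 1980–89 can be written as a lookup followed by an elementwise product. This is an illustrative sketch: the dictionary `AJCC_TO_HISTORICAL`, the function name and the data layout (one list of yearly US hazards per age group, with element `i` belonging to follow-up year `i + 1`) are hypothetical. For 1990–2001 (C2), the HR are already AJCC-specific, so no mapping step would be needed.

```python
# Mapping applied for 1980-89 (C1), where HR are available only by historical stage
AJCC_TO_HISTORICAL = {
    "I": "localized",
    "II-": "localized",
    "II+": "regional",
    "III": "regional",
    "IV": "distant",
}


def catalan_hazard(lambda_usa_by_year, ajcc_stage, hr_by_historical_stage):
    """lambda_Cat(t | a, s) = HR_s(t) * lambda_USA(t | a, s) for one age group.

    lambda_usa_by_year[i] is the US hazard in follow-up year t = i + 1;
    hr_by_historical_stage maps a historical stage to a function t -> HR(t).
    The same HR is used for every age group within a stage.
    """
    hr = hr_by_historical_stage[AJCC_TO_HISTORICAL[ajcc_stage]]
    return [hr(t) * lam for t, lam in enumerate(lambda_usa_by_year, start=1)]
```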
We also assessed whether the $\lambda_{Cat}(t)$ functions estimated with expression (7) fit the data well, using the goodness-of-fit deviance statistic. We computed the deviance between the estimated breast cancer deaths in the GCR and the number of deaths predicted by a Poisson model with rate $\lambda_{Cat}(t)$ and the (log) person-years at risk as the offset term. In all cases the p-values of the $\chi^2$ deviance tests were higher than 0.01, except for C1-Disseminated, C2-I, C2-II+, and C3-II+.
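To estimate confidence bands for $\lambda_{Cat}(t)$, we approximated its variance using the delta method [19]. First, we applied the logarithmic transformation (dropping the age and stage indices for brevity):

$$\log(\lambda_{Cat}(t)) = \log(HR(t)) + \log(\lambda_{USA}(t))$$

Then, assuming independence of the hazard ratios and the US hazard rates:

$$\mathrm{Var}(\log(\lambda_{Cat}(t))) = \mathrm{Var}(\log(HR(t))) + \mathrm{Var}(\log(\lambda_{USA}(t)))$$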
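The term $\mathrm{Var}(\log(HR(t)))$ was obtained from the Poisson regression models, and the term $\mathrm{Var}(\log(\lambda_{USA}(t)))$ was obtained using the delta method as follows. The delta method states that, to approximate the variance of $G(X)$, where $X$ is a random variable with mean $\mu$ and $G$ is differentiable, one uses the first-order expansion

$$G(X) \approx G(\mu) + (X - \mu)\,G'(\mu)$$

so that

$$\mathrm{Var}(G(X)) \approx \mathrm{Var}(X)\,[G'(\mu)]^2$$

Taking $G = \log$, so that $G'(x) = 1/x$, gives

$$\mathrm{Var}(\log(\lambda_{USA}(t))) \approx \frac{\mathrm{Var}(\lambda_{USA}(t))}{\lambda_{USA}(t)^2}$$

with $\mathrm{Var}(\lambda_{USA}(t))$ being the variance obtained by bootstrap, as described in the subsection "Hazard functions" of the Methods section.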
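Putting the pieces together, a pointwise band for $\lambda_{Cat}(t)$ can be built on the log scale and transformed back. This is a minimal sketch: the function name is hypothetical, `var_lam_usa` stands for the bootstrap variance of the US hazard and `var_log_hr` for the Poisson-model variance of $\log(HR(t))$, and the independence assumption from the text is what allows the two variances to be added.

```python
import math


def lambda_cat_band(lam_usa, var_lam_usa, hr, var_log_hr, z=1.96):
    """Pointwise band for lambda_Cat(t) = HR(t) * lambda_USA(t), built on the log scale."""
    # Delta method with G(x) = log(x), G'(x) = 1/x
    var_log_lam_usa = var_lam_usa / lam_usa ** 2
    # Independence of HR(t) and lambda_USA(t): variances of the logs add
    var_log_lam_cat = var_log_hr + var_log_lam_usa
    lam_cat = hr * lam_usa
    half_width = z * math.sqrt(var_log_lam_cat)
    return lam_cat, lam_cat * math.exp(-half_width), lam_cat * math.exp(half_width)
```

Working on the log scale keeps the lower limit of the band positive, which a symmetric interval on the hazard scale would not guarantee.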
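Cumulative survival is related to the hazard rate through $S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right)$. Thus, the cumulative survival function in Catalonia can be estimated as:

$$S_{Cat}(t \mid a, s) = \exp\left(-\int_0^t HR_s(u)\,\lambda_{USA}(u \mid a, s)\,du\right)$$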
Once the cumulative survival function $S(t)$ is estimated, the survival probability density function (pdf) can be obtained from the relation [17]:

$$\mathrm{pdf}(t) = -\frac{dS(t)}{dt}$$
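Numerically, if the hazard is assumed constant within each year of follow-up (an assumption made here for illustration, not stated in the text), the integral in $S_{Cat}(t)$ becomes a running sum, and the integral of $-dS/dt$ over each year is the drop in $S$ across that year. The sketch below uses a hypothetical function name and expects the yearly Catalan hazards, for example $HR_s(t)\,\lambda_{USA}(t \mid a, s)$, as input.

```python
import math


def survival_and_pdf(hazards):
    """Cumulative survival S(t) and yearly death probabilities from annual hazards.

    hazards[i] is lambda(t) on [i, i+1) years after diagnosis, assumed constant
    within each year. Returns S at t = 0, 1, ..., T and, for each year,
    S(i) - S(i+1), i.e. the integral of pdf(t) = -dS/dt over that year.
    """
    survival = [1.0]
    cumulative_hazard = 0.0
    for lam in hazards:
        cumulative_hazard += lam  # integral of a constant hazard over one year
        survival.append(math.exp(-cumulative_hazard))
    yearly_pdf = [survival[i] - survival[i + 1] for i in range(len(hazards))]
    return survival, yearly_pdf
```

A finer time grid (monthly hazards with the sum weighted by the interval length) would follow the same pattern if a smoother pdf were needed.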