Population for analysis
The population of women used in this analysis has been described in detail in previous publications [1, 15]. Briefly, we excluded women with unknown, inconsistent, or out-of-range reports for height, weight in 1976 or at age 18, age at menarche or menopause or each pregnancy, parity, and duration or type of postmenopausal hormone (PMH) use (n = 42,886). Additionally, women with a simple hysterectomy (and hence unknown age at menopause) (n = 10,301) were excluded. Participants who were ineligible for the study (for example, because of prevalent cancer in 1976) or who had no follow-up after 1978 (n = 2,360) were also excluded. In the current analysis, women who were premenopausal throughout follow-up were excluded (n = 6,342); women who became postmenopausal during follow-up contributed person-time from that point onward. Overall, 59,812 participants remained for this analysis. These women contributed 750,086 person-years from 1980 to 2000, during which 1,559 incident invasive ER+ breast cancer cases occurred.
Blood subcohort and nested case-control study
From 1989 to 1990, 32,826 cohort members provided blood samples. Informed consent was obtained from each participant; details about the blood collection methods have been published previously [16, 17]. Briefly, women arranged to have their blood drawn and shipped with an icepack via overnight courier to our laboratory, where it was processed and archived in liquid nitrogen freezers. Estradiol is stable in cooled whole blood for 24 to 48 hours [18]. At blood collection, women completed a short questionnaire that included questions on recent use of PMH (within the last 3 months). Follow-up of the blood study cohort was 99% in 2000.
In the current analyses, we used a previously described nested case-control study of sex steroids and breast cancer risk with cases diagnosed after blood collection through 31 May 1998 [9, 16]. In addition, cases diagnosed up through 31 May 2000 and their matched controls (that is, a 2-year extension of the published report [9]) are included. At blood collection, cases and controls were postmenopausal, were not recent users of PMH, and had no prior diagnosed cancer (except nonmelanoma skin cancer). Control subjects were matched by age, month/year and time of day of blood collection, and fasting status and had not been diagnosed with breast cancer before the diagnosis date of their matched case. To mimic the larger population used in the risk prediction modeling, only cases and controls meeting the inclusion criteria described above were included (for example, no prior simple hysterectomy). Women were considered postmenopausal if they reported having a natural menopause (for example, no menstrual cycles during the previous 12 months) or had a bilateral oophorectomy. In all, 164 ER+ cases and 346 controls were included.
Description of the risk prediction model
We fit the log-incidence model of breast cancer to incident ER+ cases, as previously described [1, 15]. We assume that incidence at time $t$ ($I_t$) is proportional to the number of cell divisions, $C_t$, accumulated throughout life up to age $t$; that is,
$$I_t = k C_t. \tag{1}$$
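The cumulative number of breast cell divisions is factored as follows:
$$C_t = C_0 \times \frac{C_1}{C_0} \times \frac{C_2}{C_1} \times \cdots \times \frac{C_t}{C_{t-1}}. \tag{2}$$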
Thus, $\lambda_i = C_{i+1}/C_i$ represents the rate of increase of breast cell divisions from age $i$ to age $i+1$. $\log(\lambda_i)$ is assumed to be a linear function of risk factors that are relevant at age $i$. The set of risk factors and their magnitude may vary according to the stage of reproductive life. Details of the representation of the $C_i$ are given in [1, 15]. The overall model is given by:
(3)
$t$ = age
$t_0$ = age at menarche
$t_m$ = age at menopause
$s_t$ = parity at age $t$
$t_i$ = age at $i$th birth, $i = 1, \ldots, s_t$
$b$ = birth index = […] for parous women, = 0 for nulliparous women
$b_{it}$ = 1 if parity is greater than or equal to $i$ at age $t$ (otherwise, $b_{it}$ = 0)
$m_A$ = 1 if natural menopause (otherwise, $m_A$ = 0)
$m_B$ = 1 if bilateral oophorectomy (otherwise, $m_B$ = 0)
bbd = 1 if breast disease is benign (otherwise, bbd = 0)
fhx = 1 if there is a family history of breast cancer in mother or sister (otherwise, fhx = 0)
$\mathrm{pmh}_A$ = number of years on oral estrogen
$\mathrm{pmh}_B$ = number of years on oral estrogen and progestin
$\mathrm{pmh}_C$ = number of years on other types of PMHs
$\mathrm{pmh}_{\mathrm{cur},t}$ = 1 if current user of postmenopausal hormones at age $t$ (otherwise, $\mathrm{pmh}_{\mathrm{cur},t}$ = 0)
$\mathrm{pmh}_{\mathrm{past},t}$ = 1 if past user of postmenopausal hormones at age $t$ (otherwise, $\mathrm{pmh}_{\mathrm{past},t}$ = 0)
$\mathrm{BMI}_j$ = body mass index (BMI) at age $j$ (kg/m²)
$\mathrm{alc}_j$ = alcohol consumption (grams) at age $j$
$h$ = height (inches)
The general rationale for a log-incidence model is that the number of precancerous cells increases multiplicatively with time but that historical exposures differentially affect the rate of increase. Specifically, for breast cancer, the number of precancerous cells is assumed to increase annually at the rate of $\exp(\beta_0)$ prior to menopause for nulliparous women, at the rate of $\exp(\beta_0 + \beta_1 s)$ prior to menopause for parous women with parity = $s$, and so forth. Finally, the number of precancerous cells increases immediately after the first birth by $\exp[\beta_2(t_1 - t_0)]$. The incidence rate of breast cancer is assumed to be approximately proportional to the number of precancerous cells.
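The growth assumptions in this paragraph can be illustrated numerically. The following is a minimal Python sketch, not the fitted model: it covers only the premenopausal terms named above (the annual rate exp(β0 + β1·s) and the first-birth term exp[β2(t1 − t0)]). The function name and all coefficient values are hypothetical, and the sketch assumes that parity at a given age counts births at or before that age.

```python
import math


def log_relative_cell_count(age, t0, birth_ages, beta0, beta1, beta2):
    """Log of the relative number of precancerous cells at a premenopausal age.

    Illustrative only: includes just the terms described in the text,
    not the full log-incidence model.
    """
    log_c = 0.0
    for year in range(t0, age):
        # Assumption: parity in a given year counts births at or before that age.
        parity = sum(1 for tb in birth_ages if tb <= year)
        # Annual growth on the log scale: beta0 + beta1 * parity.
        log_c += beta0 + beta1 * parity
    births_so_far = [tb for tb in birth_ages if tb <= age]
    if births_so_far:
        # One-time term at first birth: beta2 * (t1 - t0).
        log_c += beta2 * (min(births_so_far) - t0)
    return log_c


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
B0, B1, B2 = 0.10, -0.01, 0.005

nulliparous = log_relative_cell_count(40, 13, [], B0, B1, B2)
parous = log_relative_cell_count(40, 13, [25], B0, B1, B2)
print(round(nulliparous, 4), round(parous, 4))
print("incidence ratio, parous vs nulliparous:", math.exp(parous - nulliparous))
```

Working on the log scale keeps the multiplicative growth additive, so each risk factor contributes a simple sum over the years during which it applies.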
The log-incidence model was fit using iteratively reweighted least squares with PROC NLIN in SAS (SAS version 6.12, 1996; SAS Institute Inc., Cary, NC, USA). The parameters of the model are readily interpretable in a relative risk (RR) context. For example, $\exp(-\beta_0)$ = RR for a 1-year increase in age at menarche among nulliparous women, $\exp[-(\beta_0 + \beta_2)]$ = RR for a 1-year increase in age at menarche among parous women, and so forth. In this analysis, women were followed until they had an event (ER+ breast cancer) or were censored. Women were censored if they developed (a) ER− breast cancer, (b) breast cancer of unknown ER status, or (c) another type of cancer (except nonmelanoma skin cancer), or (d) if they died.
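The relative risk interpretation follows from the growth assumptions stated earlier. As a worked sketch (restricted to the premenopausal terms the text spells out, and assuming menarche precedes the first birth), delaying menarche from $t_0$ to $t_0 + 1$ removes one year of growth at rate $\exp(\beta_0)$; for parous women it also shortens the interval from menarche to first birth, $t_1 - t_0$, by one year:

```latex
\begin{align*}
\text{nulliparous:}\quad
\frac{I_t(t_0 + 1)}{I_t(t_0)}
  &= \frac{\exp[\beta_0 (t - t_0 - 1)]}{\exp[\beta_0 (t - t_0)]}
   = \exp(-\beta_0), \\
\text{parous:}\quad
\frac{I_t(t_0 + 1)}{I_t(t_0)}
  &= \exp(-\beta_0) \times
     \frac{\exp[\beta_2 (t_1 - t_0 - 1)]}{\exp[\beta_2 (t_1 - t_0)]}
   = \exp[-(\beta_0 + \beta_2)].
\end{align*}
```

All terms that do not involve $t_0$ cancel in the ratio, which is why only $\beta_0$ and $\beta_2$ appear.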
Imputation and inclusion of estradiol in the risk prediction model
Ideally, we would have estradiol levels measured on each main study participant at several points in time. However, since this was not possible, we used an indirect approach to impute estradiol. Let x = estradiol and z = other covariates in the risk prediction model.
From the main study, we can obtain $\Pr(D \mid z)$, given under the rare disease assumption by:
$$\Pr(D \mid z) \cong \exp(\alpha + \beta z). \tag{4}$$
We want to estimate $\Pr(D \mid x, z)$, where under the rare disease assumption
$$\Pr(D \mid x, z) \cong \exp(\alpha^* + \beta^* z + \delta^* x). \tag{5}$$
From the blood study, we can estimate $\delta^*$ based on conditional logistic regression. Indeed, in principle, we could also estimate $\beta^*$ from the blood study, but the estimates will be very imprecise due to the small sample size. Therefore, we used the main study population to estimate the parameters in Equation 5 by estimating $x$ for all subjects in the main study based on a linear regression derived from the blood study:
$$x = \alpha_0 + \gamma_0 y + \gamma z_{\mathrm{imp}} + e, \tag{6}$$
where $x$ = ln(estradiol) as a continuous variable, $y$ = 1 if case and 0 if control, and $z_{\mathrm{imp}}$ = a subset of the other covariates $z$ in the risk prediction model. $z_{\mathrm{imp}}$ was ascertained by first forcing in $y$ and then using stepwise-up regression to determine the subset of components of $z$ in the main study which were significantly associated with $x$ at the 5% level.
In the blood study, estradiol levels on average were higher for cases than controls. The rationale for including $y$ as a covariate in Equation 6 is to account for this relationship in the main study as well. In addition, because there is substantial overlap between the estradiol distribution of cases and controls, we used an imputation strategy to estimate $x$ by adding error to the prediction such that for each main study participant we obtain
$$\hat{x}_i = \hat{\alpha}_0 + \hat{\gamma}_0 y_i + \hat{\gamma} z_{\mathrm{imp},i} + e_i, \tag{7}$$
where (a) $e_i$ = an error term that is normally distributed with mean 0 and variance $\sigma^2$, (b) $y_i$ = 1 if a breast cancer case and = 0 otherwise, (c) $\sigma^2$ is estimated from Equation 6, and (d) $e_i$ is obtained by the RANNOR function of SAS so as to add error to the estimate of $x$ for individual women. We then fit the model in Equation 5 using $\hat{x}$ instead of $x$, thus obtaining the model
$$\Pr(D \mid \hat{x}, z) \cong \exp(\alpha^* + \beta^* z + \delta^* \hat{x}). \tag{8}$$
Since the parameter estimates in Equation 8 may be influenced by the random error introduced in Equation 7, we repeated this imputation approach four additional times and used multiple imputation [19] to combine estimates from the separate imputations to obtain an overall estimate.
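The imputation and pooling steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' SAS code: the function names, coefficient values, and the single covariate are hypothetical; `random.Random.gauss` stands in for RANNOR; and the pooling function assumes the standard multiple-imputation combining rules (Rubin's rules), since the source only cites [19] without spelling out the formula.

```python
import math
import random
import statistics


def impute_log_estradiol(y, z_imp, alpha0, gamma0, gamma, sigma, rng):
    """Regression prediction of ln(estradiol) plus a normal error draw."""
    linear = alpha0 + gamma0 * y + sum(g * z for g, z in zip(gamma, z_imp))
    return linear + rng.gauss(0.0, sigma)


def combine_imputations(estimates, variances):
    """Pool one coefficient across m imputed data sets (assumed: Rubin's rules)."""
    m = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # sample variance, divisor m - 1
    total_var = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total_var)


rng = random.Random(0)  # seeded so the illustration is reproducible

# Hypothetical coefficients and one hypothetical participant (a case, with
# a single covariate value); five draws = one imputation plus four repeats.
draws = [impute_log_estradiol(1, [27.5], 1.5, 0.2, [0.03], 0.6, rng) for _ in range(5)]

# Hypothetical estradiol coefficients and variances from five imputed data sets.
estimate, std_error = combine_imputations([0.42, 0.38, 0.45, 0.40, 0.41], [0.01] * 5)
print(draws)
print(estimate, std_error)
```

The between-imputation term inflates the standard error to reflect the extra uncertainty introduced by the random error added to each imputed value.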
To assess the additional predictive power of serum estradiol, we computed age-specific (5-year age groups) deciles of the risk function without estradiol (model A) as well as including imputed estradiol (model B). From the cross-classification of risk decile model A × risk decile model B, we then compared the observed number of cases in specific risk deciles of model B with the expected number of cases within strata defined by model A risk decile. Specifically, let $X_{ij}$ = the number of breast cancer cases, $N_{ij}$ = the number of person-years, and $p_{ij} = X_{ij}/N_{ij}$, which is the estimated incidence rate within the $i$th age-specific risk decile for model A and the $j$th age-specific risk decile for model B, and let $\ln(p_{ij}) = \alpha_i + \beta(j - 1)$. $100\% \times [\exp(\hat{\beta}) - 1]$ is an estimate of the percentage increase in breast cancer incidence for an increase of one model B risk decile, holding the model A risk decile constant [20]. We wish to test the hypothesis $H_0$: $\beta = 0$ versus $H_1$: $\beta \neq 0$. This approach of cross-classifying individuals by two different risk prediction rules is similar to the reclassification table approach used to compare risk prediction rules in the Framingham Heart Study [21].
In addition, to assess the predictive ability of our risk prediction models, we used the area under the receiver operating characteristic curve (that is, the concordance or C statistic). This statistic ranges from 0.5 to 1.0 and represents the probability that, for a randomly selected pair of women, one with ER+ breast cancer and one without breast cancer, the woman with ER+ breast cancer has the higher estimated disease probability. Also, we compared the C statistic for different risk prediction rules [22]. In our primary analysis, we evaluated the addition of imputed estradiol levels to risk prediction models in the entire cohort. As a secondary approach, we calculated Rosner and Colditz model risk scores in the entire cohort and then, in the nested case-control data set, assessed the impact of adding this score to the plasma estradiol and breast cancer model.