Published in: Population Health Metrics 1/2004

Open Access 01.12.2004 | Research

Estimating age conditional probability of developing disease from surveillance data

Written by: Michael P Fay


Abstract

Fay, Pfeiffer, Cronin, Le, and Feuer (Statistics in Medicine 2003; 22; 1837–1848) developed a formula to calculate the age-conditional probability of developing a disease for the first time (ACPDvD) for a hypothetical cohort. The novelty of the formula of Fay et al (2003) is that one need not know the rates of first incidence of disease per person-years alive and disease-free, but may input the rates of first incidence per person-years alive only. Similarly the formula uses rates of death from disease and death from other causes per person-years alive. The rates per person-years alive are much easier to estimate than per person-years alive and disease-free. Fay et al (2003) used simple piecewise constant models for all three rate functions which have constant rates within each age group. In this paper, we detail a method for estimating rate functions which does not have jumps at the beginning of age groupings, and need not be constant within age groupings. We call this method the mid-age group joinpoint (MAJ) model for the rates. The drawback of the MAJ model is that numerical integration must be used to estimate the resulting ACPDvD. To increase computational speed, we offer a piecewise approximation to the MAJ model, which we call the piecewise mid-age group joinpoint (PMAJ) model. The PMAJ model for the rates input into the formula for ACPDvD described in Fay et al (2003) is the current method used in the freely available DevCan software made available by the National Cancer Institute.

Background

Fay, Pfeiffer, Cronin, Le, and Feuer [1] showed how to calculate the age-conditional probabilities of developing a disease (ACPDvD) from registry data. Throughout this paper we use "cancer" as our disease of interest, but the method applies to specific types of cancer as well as other diseases where information is collected by population-based surveillance methods. Fay et al [1] provided a formula (see equation 1 below) to calculate ACPDvD after inputting the rate function by age of (1) first incidence of cancer per person-years alive, (2) death from cancer per person-years alive, and (3) death from other causes per person-years alive. Fay et al [1] used a simple piecewise constant model for the three rate functions, which have constant rates within each age group.
Here we detail two more complicated models for the rates. The first model is a segmented regression model or joinpoint model for the rates, where the rate function is a series of linear functions that join at the mid-points of the age groups, and the rate function is constant before the first mid-point and after the last "mid-point" (because the last interval goes to infinity, the last "mid-point" is not really a mid-point at all, see below). We will call this model the MAJ (mid-age group joinpoint) model for the rates. In Figure 1 we show how both the piecewise constant model and the mid-age group joinpoint model apply to all invasive cancer incidence from the Surveillance Epidemiology and End Results (SEER) program of the U.S. National Cancer Institute in 1998–2000. Figure 1 uses the SEER 12 registries which cover about 14 percent of the U.S. population, covering 5 states (Connecticut, Hawaii, Iowa, New Mexico, Utah), 6 metropolitan areas (Atlanta, Detroit, Los Angeles, San Francisco-Oakland, San Jose-Monterey, Seattle-Puget Sound) and the Alaska Native Registry (see [2]). Similar graphs showing the MAJ model can be made for the other rates required in the calculations, death from cancer and death from other causes per person-years alive.
Notice that the MAJ model gives a more smoothly changing and probably a better modeled rate. The only place where the MAJ model may not perform better than the piecewise constant model is at peaks or valleys, where there may be some bias. In Figure 1 we see that the smoothness of the MAJ appears to produce more plausible estimates for ages 0 through 85 and from ages 90 and above, and the only age group with a noteworthy bias problem is 85 to 90. Thus, for almost all of the age range the MAJ model is more plausible.
A problem with the mid-age group joinpoint model is that it requires numeric integration for its calculation. The second model uses a series of piecewise constant values to approximate the mid-age group joinpoint model. We call this second model the PMAJ (piecewise mid-age group joinpoint) model. The PMAJ does not require numeric integration, so it is much faster than the MAJ model. The PMAJ model is a piecewise constant model that only differs from the piecewise constant model of Fay et al [1] in that the pieces are smaller and the corresponding values of the rates are motivated by the MAJ model. Starting with version 5.0, the freely available DevCan software [3] uses the PMAJ method. (There was a small calculation error in versions 5.0 and 5.1 that has been corrected in version 5.2). DevCan calculates ACPDvD or age conditional probability of dying from a disease for U.S. cancer data or for user supplied data.
The outline of this paper is as follows. The review and overview section reviews the issues in estimating the age conditional probability of developing disease from surveillance data. This section includes a motivation for using this type of statistic to describe population data. The review and overview section additionally gives graphical descriptions of the MAJ and PMAJ methods. The paper is structured so that readers not interested in the details may skip the next two sections and the appendix, which give precise and notationally involved definitions of the MAJ estimators. The examples and discussion section gives examples of the estimator of ACPDvD using three different methods for estimating the rates: the simple piecewise constant method proposed in Fay et al [1], the MAJ method, and the PMAJ method. In supplemental material [see Additional file 1] we compare the PMAJ method with the method of Wun, et al [4], since the latter method was the method used in versions of the DevCan software before version 5.0.

Review and overview

Consider a surveillance program like the SEER program of the U.S. National Cancer Institute. This program attempts to count every incidence of cancer within the catchment area of the program. Because cancer is a disease in which the rates of the disease are highly dependent on age, in order to give interpretability to the counts within the SEER registries, we must somehow account for the age distribution in the population.
One simple and popular statistic is the age adjusted rate or directly standardized rate (DSR). In the SEER Cancer Statistics Review [2] DSRs are used to compare different cancer sites, trends on specific cancer sites over time, and rates by sex and race. The DSR is calculated by a simple weighted sum of the age specific rates for each 5 year age group, where the weights are proportional to the U.S. 2000 population. Thus, the DSR may be interpreted as the rates adjusted as if all the populations being compared had age distributions similar to the U.S. 2000 population. The DSRs are useful for gaining an overall picture of how the incidence and mortality of each cancer affects different populations (e.g., different races, SEER population at different times), while controlling for the effect of differing age distributions between populations being compared. A disadvantage of the DSR is that it is hard to relate to an individual's risk. For example, Table I-4 of the SEER Cancer Statistics Review, 1975–2000 [2] states that the DSR for breast cancer for females for the years 1996–2000 is 135 per 100,000 person-years. The average American woman may wonder, how does that relate to my risk? Will I be likely to get breast cancer in my lifetime? If I am 40 years old now, what is my risk of getting breast cancer in the next 10 years given that I have survived to this age without getting it? These questions are the motivation for using the age conditional probability of developing disease (ACPDvD), and in order to estimate the ACPDvD for female breast cancer, we require information not only about the rate of female breast cancer but also about the rates of dying from female breast cancer and dying from other causes.
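The weighted sum behind the DSR can be sketched in a few lines. This is a minimal illustration only: the function name, the three broad age groups, the rates, and the standard population counts are all hypothetical, not SEER values.

```python
def directly_standardized_rate(age_rates, standard_pop):
    """Weighted sum of age-specific rates; weights are proportional to a standard population."""
    if len(age_rates) != len(standard_pop):
        raise ValueError("need one standard population count per age group")
    total = sum(standard_pop)
    return sum(rate * pop / total for rate, pop in zip(age_rates, standard_pop))


# Hypothetical rates per 100,000 person-years for three age groups,
# and hypothetical standard population counts for the same groups.
rates = [10.0, 100.0, 400.0]
std_pop = [50.0, 30.0, 20.0]
dsr = directly_standardized_rate(rates, std_pop)
```

With these made-up inputs the weights are 0.5, 0.3, and 0.2, so the DSR is a single population-level number that says nothing directly about one person's risk at a given age.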
The ACPDvD uses cross-sectional incidence and mortality rates to estimate the age-conditional probabilities of developing disease in a hypothetical cohort in which we assume the age specific rates do not change over time. This gives a personal interpretation to the cross-sectional data, allowing statements like the following: if the incidence and mortality rates remain at their present values (as observed in SEER 12, 1998–2000), then a female born today would have a 13.5% chance of developing breast cancer over her lifetime (see Table 2). We can also calculate ACPDvD over intervals. For example, a female who has reached 40 years old without developing breast cancer has a 1.5% chance of developing breast cancer by the time she is 50.
Table 2
Age Conditional Probability of Developing Different Types of Invasive Cancers (in Percent) from SEER 12, 1998–2000

| Start Age | End Age | Model | All Invasive (Both Sexes) | Prostate (Male) | Breast (Female) | Acute Lymphocytic Leukemia (Both Sexes) |
|---|---|---|---|---|---|---|
| 0 | 20 | Piecewise const | 0.3158 | 0.0009 | 0.0015 | 0.0669 |
| 0 | 20 | PMAJ, interval = 0.5 | 0.3260 | 0.0011 | 0.0021 | 0.0633 |
| 0 | 20 | MAJ | 0.3260 | 0.0011 | 0.0021 | 0.0633 |
| 0 | 50 | Piecewise const | 4.0690 | 0.2002 | 1.9188 | 0.0837 |
| 0 | 50 | PMAJ, interval = 0.5 | 4.1657 | 0.2550 | 1.9492 | 0.0808 |
| 0 | 50 | MAJ | 4.1657 | 0.2550 | 1.9492 | 0.0808 |
| 40 | 50 | Piecewise const | 2.5260 | 0.2032 | 1.5131 | 0.0053 |
| 40 | 50 | PMAJ, interval = 0.5 | 2.5976 | 0.2579 | 1.5169 | 0.0055 |
| 40 | 50 | MAJ | 2.5975 | 0.2579 | 1.5169 | 0.0055 |
| 0 | Inf | Piecewise const | 42.0876 | 17.4952 | 13.6471 | 0.1154 |
| 0 | Inf | PMAJ, interval = 0.5 | 41.7547 | 17.3375 | 13.5477 | 0.1121 |
| 0 | Inf | MAJ | 41.7574 | 17.3389 | 13.5485 | 0.1121 |
| 60 | 61 | Piecewise const | 1.2340 | 0.5989 | 0.3822 | 0.0009 |
| 60 | 61 | PMAJ, interval = 0.5 | 1.0852 | 0.4946 | 0.3627 | 0.0009 |
| 60 | 61 | MAJ | 1.0852 | 0.4946 | 0.3627 | 0.0009 |
| 64 | 65 | Piecewise const | 1.2758 | 0.6131 | 0.3872 | 0.0009 |
| 64 | 65 | PMAJ, interval = 0.5 | 1.4453 | 0.7440 | 0.4045 | 0.0010 |
| 64 | 65 | MAJ | 1.4453 | 0.7440 | 0.4045 | 0.0010 |
| 60 | 65 | Piecewise const | 6.0331 | 2.9128 | 1.8777 | 0.0042 |
| 60 | 65 | PMAJ, interval = 0.5 | 6.0622 | 2.9492 | 1.8758 | 0.0044 |
| 60 | 65 | MAJ | 6.0622 | 2.9492 | 1.8759 | 0.0044 |

Calculation of the ACPDvD is somewhat complicated, and we describe the complications in relation to the simple DSRs. Consider first the age specific incidence rates which are used to calculate the DSRs. These rates simply count the number of incident cases of a particular disease (e.g., female breast cancer) within each age group and divide by the total number of person-years estimated by the population. For counts of a single year, the person-years are estimated by the mid-year population of the catchment area (for sex-specific cancers like prostate cancer or female breast cancer, we only use the population of the appropriate sex). Note that the incident cases may include individuals who have previously been diagnosed with the cancer and have developed a new primary cancer.
For the ACPDvD for any specific disease we would like the rate of first incidence per person-years alive and disease-free. Thus, there are two difficulties, (1) the usual age specific incidence rates include persons with multiple primary cancers, and (2) the denominators include persons who have previously been diagnosed. Merrill and Feuer [5] discuss both difficulties and adjust for them creating risk-adjusted cancer incidence rates. Merrill and Feuer [5] study the effect of these adjustments for several cancer sites. To handle the first difficulty, (similar to [5]) we can remove cases where we have a record of a previous diagnosis of that particular type of cancer. Because the registries in SEER were not all begun at the same time, to avoid bias the DevCan program only searches the records for previous cancers back until the year when the last registry was added. This year is denoted the follow-back year. (If the disease of interest is any malignant cancer, then the difficulty is handled differently. Although at each cancer record we do not record what specific types of cancers were previously diagnosed for the person, we do know whether any tumors were previously diagnosed. Thus, if the disease of interest is any malignant cancer and if the record states there was a previously diagnosed tumor, then we assume that the previously diagnosed tumor was malignant, and do not count that case as a first incidence.) To handle the second difficulty, the additional person-years in the denominator, Merrill and Feuer [5] adjust the denominator by multiplying the age-specific population by 1 minus an estimate of the prevalence of the disease in the population. Merrill and Feuer [5] also estimate the prevalence of medical procedures which remove individuals from the at-risk population, such as hysterectomy which removes the risk of uterine cancers.
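The denominator adjustment attributed to Merrill and Feuer [5] in this paragraph amounts to shrinking the population by the prevalent fraction. The sketch below only illustrates that arithmetic; the function name and all numbers are hypothetical, and it is not the ACPDvD correction, which is handled differently as described next.

```python
def prevalence_adjusted_rate(first_cases, population, prevalence):
    """First-incidence rate per person-year, with prevalent cases removed from the denominator."""
    if not 0.0 <= prevalence < 1.0:
        raise ValueError("prevalence must lie in [0, 1)")
    return first_cases / (population * (1.0 - prevalence))


# Hypothetical age group: 150 first cases, 100,000 people, 2% already diagnosed.
crude = 150 / 100_000
adjusted = prevalence_adjusted_rate(150, 100_000, 0.02)
```

Because the adjusted denominator is smaller, the adjusted rate is always at least as large as the crude rate.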
In calculating the ACPDvD we use only the first incidence of the disease of interest as in [5], but we correct for the denominators in a different way using an assumption and some mathematics from the theory of competing risks. This second correction is detailed with precise mathematical notation in Fay et al [1]; here we give more heuristic arguments.
In the following let the disease of interest be "cancer". The ACPDvD between ages x and y, given alive and cancer-free at age x, may be written as the fraction,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equa_HTML.gif
To calculate the numerator, we integrate over the probability that the first cancer occurred at exactly age a. In math notation this probability is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equb_HTML.gif
where f c (a) is a probability function representing the probability that the first cancer occurred at exactly age a. One key result described in Fay et al [1] is that f c (a) can be written as the product of two functions,
λ c (a) = the probability that the first cancer occurred at exactly age a, given the individual is alive just before age a, and
S a (a-) = the probability that the individual is alive just before age a.
The function λ c (a) is known as a cause-specific hazard function, and it is estimated by some function of the age-specific rates, such as the piecewise constant model of Fay et al [1] or the MAJ model introduced in this paper (see Figure 1). Using standard results for continuous survival data, we can write S a (a-) as
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equc_HTML.gif
where λ a (u) ( = the probability that the individual died at age u, given the individual is alive just before age u) is the usual hazard function. We estimate λ a (u) using some function of the age-specific rates. Thus, the numerator can be written as
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equd_HTML.gif
If we use the MAJ for both hazard functions, then there is no closed form solution. To see why this is so, note that within the exponential, the integral of a piecewise linear function is the sum of a series of quadratic functions, and the overall integral has no closed form solution. This problem motivates the piecewise mid-age joinpoint (PMAJ) model, where we use a series of piecewise constant functions to approximate the MAJ model. Figure 2 gives the PMAJ model together with the piecewise constant model used by Fay et al [1] for 70 to 90 year olds from the SEER 12, 1998–2000 rates for all invasive (first) cancer incidence rates per person-years alive. Remember, although both Figure 1 and Figure 2 plot incidence rates, we additionally need similar rate functions for mortality rates to calculate the ACPDvD.
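To make the obstacle concrete, consider a single segment [u, v) on which both hazards are linear. The display below is a sketch rather than a transcription of the paper's equations: the intercepts α and slopes β follow the notation used later for the MAJ model, and the survival factor is taken relative to age u.

```latex
\int_{u}^{v} (\alpha_c + \beta_c a)\,
  \exp\!\left\{ -\left[ \alpha_a (a - u)
      + \tfrac{1}{2}\,\beta_a \left( a^{2} - u^{2} \right) \right] \right\} \, da
```

When β_a ≠ 0 the exponent is quadratic in a, so the antiderivative generally involves the Gaussian error function and cannot be written with elementary functions. When both slopes are zero the integrand reduces to a constant times a simple exponential, which is the case a piecewise constant approximation exploits.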
Now consider the denominator of the ACPDvD, the probability of being alive and cancer-free at age x, denoted https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq1_HTML.gif . For reference, in Table 1 we give the notation. The only change from the notation in Fay et al [1] is that we use the subscript a to represent all causes of events instead of a blank subscript. For example, we let S*(u) = https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq2_HTML.gif . Other notation in this paper is defined as it is introduced. Fay et al [1] assumed that the risk of death from other causes does not change if you have previously been diagnosed with cancer, then used the key result mentioned above together with some algebra and calculus to derive the denominator. Then the ACPDvD between the ages of x and y given alive and cancer-free just before age x is
Table 1
Notation
Random Variables and Parameters
T = age at death
T* = age at first cancer or death before cancer
J = type of death
J* = type of event
   (J = d) = death from cancer
   (J* = c) = first cancer
   (J = o) = death from other causes
   (J* = o) = death before first cancer
λ c (t) = rate at t for first cancer given alive
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq3_HTML.gif = rate at t for first cancer given alive and cancer-free
λ o (t) = rate at t for death before cancer given alive
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq4_HTML.gif = rate at t for death before cancer given alive and cancer-free
λ d (t) = rate at t for death from cancer given alive
 
λ a (t) = rate at t for death given alive
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq5_HTML.gif = rate at t for first cancer or death before first cancer given alive and cancer-free
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq6_HTML.gif
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq7_HTML.gif
Observations
Within the age interval, [a i , a i +1), and within the calendar interval of interest we observe...
c i = number of first cancer incident cases
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq8_HTML.gif = estimate of person-years alive associated with j = c, d, o (DevCan uses the sum of mid-year populations during the calendar interval of interest)
d i = number of cancer deaths
 
o i = number of other deaths
 
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Eque_HTML.gif
The details of the MAJ and the PMAJ models are given in the next two sections.
Readers only interested in the practical ramifications of the choice in models may skip to the examples and discussion section.
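The heuristic argument above can be turned into a small numerical sketch. It assumes the numerator is the integral of λ c (a) S a (a) over [x, y], as stated in the text, and it uses a denominator derived here from the stated assumption that the other-cause death rate is unchanged by a cancer diagnosis: exp(-Λ o (x)) · [1 - ∫ from 0 to x of λ c (a) exp(-Λ d (a)) da]. That denominator is a reconstruction and should be checked against equation 1. The function name, grid step, and constant rates are hypothetical.

```python
import math


def acpdvd(lam_c, lam_d, lam_o, x, y, step=0.01):
    """Approximate A(x, y) with trapezoid sums on an age grid from 0 to y.

    lam_c, lam_d, lam_o are rate functions per person-year alive (first cancer,
    cancer death, other-cause death). x and y must be finite multiples of step.
    """
    n = int(round(y / step))
    ix = int(round(x / step))
    ages = [i * step for i in range(n + 1)]

    # Cumulative hazards for cancer death and other-cause death on the grid.
    cum_d, cum_o = [0.0], [0.0]
    for i in range(1, n + 1):
        cum_d.append(cum_d[-1] + 0.5 * step * (lam_d(ages[i - 1]) + lam_d(ages[i])))
        cum_o.append(cum_o[-1] + 0.5 * step * (lam_o(ages[i - 1]) + lam_o(ages[i])))

    def trapezoid(values):
        if len(values) < 2:
            return 0.0
        return step * (sum(values) - 0.5 * (values[0] + values[-1]))

    # Numerator: integral over [x, y] of lam_c(a) * S_a(a).
    num_vals = [lam_c(ages[i]) * math.exp(-(cum_d[i] + cum_o[i])) for i in range(ix, n + 1)]
    # Denominator (derived form, an assumption): alive and cancer-free at age x.
    den_vals = [lam_c(ages[i]) * math.exp(-cum_d[i]) for i in range(ix + 1)]
    alive_cancer_free = math.exp(-cum_o[ix]) * (1.0 - trapezoid(den_vals))
    return trapezoid(num_vals) / alive_cancer_free


# Hypothetical constant rates per person-year alive.
C, D, O = 0.004, 0.002, 0.01
prob_40_to_50 = acpdvd(lambda a: C, lambda a: D, lambda a: O, 40.0, 50.0)
```

Constant rates are used only because they give a closed form to check against; any callable rate function, including a joinpoint model, can be passed in the same way.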

Mid-age group joinpoint estimator

In Fay et al [1], the rates were estimated by a piecewise constant model. Here we use a mid-age group joinpoint (MAJ) model, where we draw lines connecting the midpoints of the intervals except the first and last interval. The first interval is constant until the midpoint, and the last interval is constant after a nominal "midpoint". This nominal "midpoint" is half the length of the previous age interval from the beginning of the last interval, and would be the midpoint if the last age interval was the same length as the previous interval.
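The verbal description above maps onto linear interpolation with flat extrapolation. The sketch below is one way to express it, assuming that the rate at each joinpoint is the crude rate (count divided by person-years) for that age group; the function names and the toy data are hypothetical. `numpy.interp` holds the first and last values constant outside the joinpoints, which matches the constant first and last pieces.

```python
import numpy as np


def maj_joinpoints(age_starts):
    """Midpoints of the closed age groups plus the nominal midpoint of the open-ended last group."""
    a = np.asarray(age_starts, dtype=float)  # a_0 < a_1 < ... < a_k; last group is [a_k, inf)
    mids = (a[:-1] + a[1:]) / 2.0
    nominal = a[-1] + (a[-1] - a[-2]) / 2.0  # half the previous group's length past a_k
    return np.append(mids, nominal)


def maj_rate(t, age_starts, counts, person_years):
    """Rate at age t: lines joining the joinpoints, constant before the first and after the last."""
    rates = np.asarray(counts, dtype=float) / np.asarray(person_years, dtype=float)
    return np.interp(t, maj_joinpoints(age_starts), rates)


# Hypothetical data: age groups [0,5), [5,10), [10,15), [15,inf).
age_starts = [0, 5, 10, 15]
counts = [10, 20, 40, 80]
person_years = [1000, 1000, 1000, 1000]
rate_at_5 = maj_rate(5.0, age_starts, counts, person_years)
```

At age 5, the boundary where the piecewise constant model jumps from one crude rate to the next, this sketch returns the average of the two neighbouring crude rates instead.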
We introduce new notation for breaking up the ages. Fay et al [1] used 0 = a 0 <a 1 < ··· <a k <a k+1= ∞. Here we use a joinpoint model with joins at the midpoints (and nominal midpoint),
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equf_HTML.gif
Let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equg_HTML.gif
(The indices start at -1 so that the index values for the rate estimators,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equh_HTML.gif
, match up with the count notation of [1].) The MAJ estimator for the rate of event j (for j = c, d, or o) at t i (for i = 0,1,..., k) is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equi_HTML.gif
where j i is either c i , d i , or o i as defined in Table 1. (Note that https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq9_HTML.gif , where https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq10_HTML.gif is the piecewise constant function used in [1]). We define https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq11_HTML.gif and https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq12_HTML.gif . For j = a, the MAJ estimator for the rate at t i is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equj_HTML.gif
Then for t ∈ [t i , t i+1) for i = 1,..., k, we define https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq13_HTML.gif as the point on the line defined by connecting the points (t i , https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equh_HTML.gif ) and (t i+1, https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq14_HTML.gif ). In other words,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equk_HTML.gif
where
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equl_HTML.gif
and
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equm_HTML.gif
Thus, α j,-1= https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq15_HTML.gif and β j,-1= 0, and similarly by taking limits as t k+1→ ∞ then α j,k = https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq16_HTML.gif and β j,k = 0.
Now
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equn_HTML.gif
for u ∈ [t i , t i+1) is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equo_HTML.gif
Note that (for ℓ = 0,1,..., k)
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equp_HTML.gif
so that for i = 0,1,...,k,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equq_HTML.gif
Also notice that (when u < ∞)
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equr_HTML.gif
Therefore when u ∈ [t i , t i+1),
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equs_HTML.gif
Let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equt_HTML.gif
(x, y) be the estimator of A(x, y) using the MAJ model. The two integrals we need to estimate for https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equt_HTML.gif (x, y) are of the type,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equu_HTML.gif
where in the numerator of
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equt_HTML.gif
(x, y) we need https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq17_HTML.gif (i.e., j = c and h = a in equation 7), and in the denominator of https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equt_HTML.gif (x, y) we need https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq18_HTML.gif . Suppose, without loss of generality, that t ∈ [t i ,t i+1), then
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equv_HTML.gif
where R j,h (t ℓ , v) (for ℓ = -1,0,1,2,..., i and v ≤ t ℓ+1 ) is defined implicitly (see the Appendix). Then,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equw_HTML.gif

Piecewise mid-age group joinpoint estimator

In the MAJ model we divided up the age line into k + 2 intervals. Here we define those intervals in both the t i notation and the a i notation.
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equx_HTML.gif
In the MAJ model the rates for the first and the last intervals are represented by lines with zero slope, and the rates for the i th interval (i = 1,...,k) for the j th rate type (j = a, c, d, o) is a line defined by connecting the points (t i-1, https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq19_HTML.gif ) and (t i , https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equh_HTML.gif ) (see equations 2 and 3 for definition of https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equh_HTML.gif ). In the PMAJ model we divide the i th interval into m i equal sized intervals, and use a piecewise constant estimate on each of those m i intervals. One way to define m i is to choose m i so that each equal sized interval is 1/2 year long. In other words, m i = 2(t i - t i-1). This is the definition of m i that we use for the DevCan software (starting with version 5.0, see [3]), but all the following holds for arbitrary m i . In Figure 2 we show the PMAJ model with half-year intervals and the piecewise constant model for the US all invasive cancer mortality rates for ages 70 through 90 years.
Here are the details. Consider the h th (for h = 1,..., m i ) of the m i intervals within interval i (for i = 1,...,k) for rate type j (for j = a, c, d, o). This interval is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equy_HTML.gif
For convenience we introduce new notation for the ends of this interval, let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equz_HTML.gif
so that t i-1,0= t i-1and https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq20_HTML.gif = t i . At the beginning of this interval the value of the rate is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equaa_HTML.gif
(see equations 4 and 6 for definitions of α j,i-1and β j,i-1). Similarly at the end of this interval the rate is
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equab_HTML.gif
For the PMAJ model we simply assume a constant rate equal to the average of the beginning and the end values of the rate over this interval. In other words, under the PMAJ model for any t ∈ [t i-1,h-1 ,t i-1,h ) we estimate the rate with
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equac_HTML.gif
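The averaging rule just described can be sketched for a single MAJ segment as follows. The function name, the segment endpoints, and the two rates are hypothetical; the half-year width matches the choice m i = 2(t i - t i-1) mentioned above.

```python
import numpy as np


def pmaj_pieces(t_left, t_right, rate_left, rate_right, width=0.5):
    """Split one linear MAJ segment into equal sub-intervals with constant rates.

    Each constant is the average of the line's values at the two ends of its
    sub-interval, as described in the text.
    """
    m = int(round((t_right - t_left) / width))   # number of sub-intervals, m_i
    edges = np.linspace(t_left, t_right, m + 1)  # sub-interval end points
    line = rate_left + (rate_right - rate_left) * (edges - t_left) / (t_right - t_left)
    constants = (line[:-1] + line[1:]) / 2.0
    return edges, constants


# Hypothetical segment between joinpoints at ages 72.5 and 77.5.
edges, constants = pmaj_pieces(72.5, 77.5, 0.020, 0.025)
```

Because the average of the end values equals the mean of a straight line over the sub-interval, the piecewise constant pieces reproduce the integral of the rate over each sub-interval exactly; the approximation enters only through the exponential survival factor.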
Since the PMAJ model is a piecewise model, we can use Appendix A of [1] to express the estimator of age conditional probability of developing cancer. The only hard part is correctly defining the starting and ending of each piecewise interval. The ends of these intervals are
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equad_HTML.gif
For convenience write these interval ends with only a single index as
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equae_HTML.gif
where
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equaf_HTML.gif
and m 0 = 1. In other words, t -1 = τ 0 and for i = 0,1,..., k, then t i = τ g (i) and t i,h = τ g (i)+h, where https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq21_HTML.gif .
Now we can follow very similar notation to Appendix A of [1]. We now repeat that Appendix with the modifications to notation required for the PMAJ model. Let the estimator of A(x,y) under the PMAJ model be denoted https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq22_HTML.gif (x,y). Let τ i ≤ x < τ i+1 and τ j < y ≤ τ j+1 for x < y, i ≤ j, and j ≤ M + 2. For convenience we regroup the ages after inserting group delimiters at x and y. Let the new delimiters be 0 = b 0 ≤ b 1 ≤ b 2 ≤ ··· ≤ b M+3 = ∞ where b 0 = τ 0 ,..., b i = τ i , b i+1 = x, b i+2 = τ i+1 ,..., b j+1 = τ j , b j+2 = y, b j+3 = τ j+1 ,..., b M+3 = τ M+1 = ∞. We let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equag_HTML.gif
and similarly
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equah_HTML.gif
and https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq23_HTML.gif . In this notation, the probability of developing cancer by age y given survival until age x is A(x, y) = A(b i+1, b j+2), and under the PMAJ model we estimate it with
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equai_HTML.gif
Because
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equaj_HTML.gif
or https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq24_HTML.gif may equal zero and b ℓ+1 may equal infinity, we let https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq25_HTML.gif . These integrals are
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equak_HTML.gif
where the case λ = 0 and b ℓ+1 = ∞ is one of the "impossible" hypothetical cohorts (see Section 3.1 of [1]). Thus, we obtain,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equal_HTML.gif
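The closed-form pieces that make a piecewise constant model fast can be sketched as below. The exact integrals in the equations above are not reproduced here; this sketch assumes each piece contributes an integral of exp(-rate · s) over its width, and it handles the zero-rate and infinite-width cases separately in the spirit of the text. The function names and rates are hypothetical, and `piecewise_numerator` conditions only on being alive at the first edge, so it is a building block rather than the full estimator.

```python
import math


def exp_integral(rate, width):
    """Integral of exp(-rate * s) for s from 0 to width; width may be math.inf."""
    if rate < 0.0:
        raise ValueError("rates must be non-negative")
    if rate == 0.0:
        if math.isinf(width):
            raise ValueError("zero rate on an unbounded interval (an 'impossible' cohort)")
        return width
    if math.isinf(width):
        return 1.0 / rate
    return -math.expm1(-rate * width) / rate


def piecewise_numerator(edges, lam_c, lam_a):
    """Sum over pieces of lam_c * (survival to the piece's start) * exp_integral."""
    total, cum = 0.0, 0.0  # cum = cumulative all-cause hazard since edges[0]
    for left, right, c, a in zip(edges[:-1], edges[1:], lam_c, lam_a):
        width = right - left
        total += c * math.exp(-cum) * exp_integral(a, width)
        if not math.isinf(width):
            cum += a * width
    return total


# Hypothetical constant rates split across two pieces, the second unbounded.
value = piecewise_numerator([0.0, 1.0, math.inf], [0.01, 0.01], [0.05, 0.05])
```

Each piece needs only one exponential, so no numerical integration is required however many half-year pieces are used.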

Examples and discussion

In this section we explore several different methods for estimating the rate functions, all using the formula of Fay et al [1] (i.e., all using equation 1). This comparison explores the differences between the piecewise constant method proposed in Fay et al [1], the PMAJ method, and the MAJ method. A different comparison emphasizing differences between versions of the DevCan software is described in the supplemental material [see Additional file 1].
For all of the examples we use data from 1998–2000 [6]. The incidence data come from the Surveillance, Epidemiology, and End Results (SEER) program of the (U.S.) National Cancer Institute, and mortality data from the (U.S.) National Center for Health Statistics. We use the SEER 12 registries which cover about 14 percent of the U.S. population. We only use the mortality data covering the same area as the SEER 12 registries cover. Because the SEER 12 registries have complete coverage only back through 1992, we only look back in the database until 1992 to delete any incident case that had previously been diagnosed with the cancer of interest. These incident cases are deleted so that they are not counted when estimating the counts of first cancer incidence (the c i values). The mid-year population estimates (the n i values) come from the sum of U.S. Census estimates of mid-year populations from 1998, 1999, and 2000 for the SEER 12 catchment areas for the appropriate sex group (e.g., males for prostate cancer).
In Table 2 we show the results for all invasive cancers and acute lymphocytic leukemia for both sexes, prostate cancer for males, and breast cancer for females. We see the PMAJ values approximate the MAJ values very well.
In conclusion, we have described several methods for estimating rates for input into a formula to calculate ACPDvD, and we have shown that the PMAJ method provides fast and reasonable estimators for the rates.

Appendix: Calculation of R function

Recall that R j,h (t , v) represents an integral with 4 parameters. We can write it as
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equam_HTML.gif
To simplify notation, let t = u and α j = a j , β j = b j , α h = a h , and β h = b h .
Thus,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equan_HTML.gif

Case 1: b j = 0 and b h = 0

For our application, whenever v → ∞ then b j = 0 and b h = 0, so this is an important special case.
When b j = 0, b h = 0, and a h = 0, we obtain
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equao_HTML.gif
which goes to ∞ when v → ∞.
When b j = 0, b h = 0, and a h ≠ 0, we obtain
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equap_HTML.gif
which goes to a j /a h when v → ∞.
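As a sketch of where these two limits come from, suppose that with both slopes equal to zero the integrand reduces to the constant rate a j multiplied by the exponential term exp{-a h (x - u)}. This reduced form is an assumption chosen to agree with the limits stated above; it is not a transcription of the displayed equations.

```latex
% Assumption: with b_j = b_h = 0 the integrand is a_j * exp(-a_h (x - u)).
\begin{align*}
R(u, v, a_j, 0, a_h, 0)
  &= \int_u^v a_j \, e^{-a_h (x-u)} \, dx \\
  &= \begin{cases}
       a_j (v-u), & a_h = 0, \\[4pt]
       \dfrac{a_j}{a_h}\left\{1 - e^{-a_h (v-u)}\right\}, & a_h \neq 0 .
     \end{cases}
\end{align*}
% As v -> infinity, the first case diverges (for a_j > 0) and the
% second case tends to a_j / a_h (for a_h > 0).
```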

Case 2: General Case with v< ∞

To calculate the integral R(u, v, a j , b j , a h , b h ) for finite v, we can use an adaptive version of Romberg's algorithm for numerical integration (we follow closely Lange [7], pp. 210–211).
Let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equaq_HTML.gif
Divide the interval [u, v] into n equal subintervals of length (v - u)/n, and let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equar_HTML.gif
Then lim n→∞ T n = R(u, v, a j , b j , a h , b h ).
A more accurate approximation uses Romberg's algorithm,
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equas_HTML.gif
Let
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equat_HTML.gif
be our estimate of R. The algorithm we use to calculate https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equat_HTML.gif is as follows:
1. Choose n.
2. Calculate T n .
3. Calculate T 2n .
4. For i = 1 to I max do:
• If
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equau_HTML.gif
then let https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_IEq26_HTML.gif and stop.
• Otherwise calculate
https://static-content.springer.com/image/art%3A10.1186%2F1478-7954-2-6/MediaObjects/12963_2003_Article_18_Equav_HTML.gif
, and continue.
For example, one could use n = 100, δ = 10^-5, and I max = 100.
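The steps above can be sketched in Python as follows. This is a minimal illustration, not the DevCan implementation. The function names `trapezoid` and `romberg_adaptive` are hypothetical. The stopping rule (absolute difference between T n and T 2n below δ) and the update step (doubling n) are assumptions, since the exact criterion and update appear only in the displayed equations. The Romberg step used is the standard one, (4 T 2n - T n )/3.

```python
import math


def trapezoid(f, u, v, n):
    """Trapezoidal rule T_n on [u, v] using n equal subintervals."""
    h = (v - u) / n
    total = 0.5 * (f(u) + f(v))
    for k in range(1, n):
        total += f(u + k * h)
    return h * total


def romberg_adaptive(f, u, v, n=100, delta=1e-5, i_max=100):
    """Adaptive one-step Romberg estimate of the integral of f over [u, v].

    Assumptions (not taken from the source): the loop stops when
    |T_2n - T_n| < delta, and otherwise n is doubled.
    """
    t_n = trapezoid(f, u, v, n)
    t_2n = trapezoid(f, u, v, 2 * n)
    for _ in range(i_max):
        if abs(t_2n - t_n) < delta:  # assumed stopping rule
            break
        n *= 2  # assumed update: refine the grid and reuse T_2n as T_n
        t_n, t_2n = t_2n, trapezoid(f, u, v, 2 * n)
    # Standard Romberg (Richardson) combination of the two trapezoid sums.
    return (4.0 * t_2n - t_n) / 3.0


if __name__ == "__main__":
    # Hypothetical integrand with a constant rate 0.02 and constant hazard 0.05.
    estimate = romberg_adaptive(lambda x: 0.02 * math.exp(-0.05 * x), 0.0, 10.0)
    print(estimate)
```

Any smooth integrand can be passed as `f`; in practice the loop would also be capped well below I max = 100 doublings, because the grid size grows geometrically.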

Acknowledgements

I would like to thank Kathy Cronin for suggesting the PMAJ method and thank her and Ram Tiwari for reading and commenting on drafts of this article.
References
1. Fay MP, Pfeiffer R, Cronin KA, Le C, Feuer EJ: Age-conditional probabilities of developing cancer. Statistics in Medicine 2003, 22(11):1837-1848. 10.1002/sim.1428
4. Wun L-M, Merrill RM, Feuer EJ: Estimating lifetime and age-conditional probabilities of developing cancer. Lifetime Data Analysis 1998, 4:169-186. 10.1023/A:1009685507602
5. Merrill RM, Feuer EJ: Risk-adjusted cancer-incidence rates (United States). Cancer Causes and Control 1996, 7:544-552.
6. Surveillance, Epidemiology, and End Results (SEER) Program http://www.seer.cancer.gov DevCan database: SEER 12 Incidence and Mortality, 1993–2000, Follow-back year = 1992. National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2003, based on the November 2002 submission. Underlying mortality data provided by NCHS http://www.cdc.gov/nchs.
7. Lange K: Numerical Analysis for Statisticians. Springer: New York; 1999.
Metadata
Title
Estimating age conditional probability of developing disease from surveillance data
Author
Michael P Fay
Publication date
01.12.2004
Publisher
BioMed Central
Published in
Population Health Metrics / Issue 1/2004
Electronic ISSN: 1478-7954
DOI
https://doi.org/10.1186/1478-7954-2-6
