Background
In epidemiological studies two main measures of interest are the risk of an event occurring (probability) and the rate at which it occurs (hazard) [1]. Patients will often be at risk from more than one mutually exclusive event, and the occurrence of one of these may alter the probability of, or entirely prevent, any other event occurring [2]. In this paper we focus on situations where the events are deaths from different causes, so any one event prevents the others from occurring. In this competing risks scenario, the cause-specific hazard gives the cause-specific mortality rate and the cumulative incidence function gives the proportion of patients who have died from a particular cause by a given time [3].
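In standard competing risks notation these two quantities can be written as follows. The symbols are introduced here for illustration and are not taken from the text: T is the time to death, D the cause of death, K the number of causes, h_k the cause-specific hazard and C_k the cumulative incidence function for cause k.

```latex
\begin{align}
h_k(t) &= \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}, \\
S(t)   &= \exp\!\left( -\sum_{j=1}^{K} \int_0^t h_j(u)\, du \right), \\
C_k(t) &= P(T \le t,\ D = k) = \int_0^t S(u)\, h_k(u)\, du .
\end{align}
```

The cumulative incidence for one cause therefore depends on the hazards of all causes through the overall survival function S(t).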
There are two main approaches to modelling competing risks [4]. The first is to model the cause-specific hazards and transform these to obtain the cumulative incidence function. The second is to model the cumulative incidence function directly [5]. We advocate the first approach, as together the cause-specific hazards and the cumulative incidence function can provide a better understanding of risk factors and their effect on the population as a whole [1]. Cause-specific hazards can inform us about the impact of risk factors on rates of disease or mortality, while the cumulative incidence functions provide an absolute measure on which to base prognosis and clinical decisions [6].
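The transformation used in the first approach can be sketched numerically. The snippet below is a minimal illustration in plain Python, not the implementation used in this paper: the function names, the trapezoidal integration and the constant example hazards are all assumptions chosen for clarity.

```python
import math

def cumulative_trapezoid(values, times):
    """Running trapezoidal integral of `values` over `times`, starting at 0."""
    total, out = 0.0, [0.0]
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
        out.append(total)
    return out

def cif_from_cause_specific_hazards(times, hazards):
    """Transform cause-specific hazards into cumulative incidence functions.

    times   : increasing list of time points starting at 0
    hazards : dict mapping cause name -> list of hazard values at `times`
    Returns (cif, survival), where cif maps cause name -> list of CIF values.
    """
    n = len(times)
    # Overall hazard is the sum of the cause-specific hazards.
    total_hazard = [sum(h[i] for h in hazards.values()) for i in range(n)]
    # Overall survival S(t) = exp(-cumulative overall hazard).
    survival = [math.exp(-H) for H in cumulative_trapezoid(total_hazard, times)]
    # CIF_k(t) = integral from 0 to t of S(u) * h_k(u) du.
    cif = {
        cause: cumulative_trapezoid([survival[i] * h[i] for i in range(n)], times)
        for cause, h in hazards.items()
    }
    return cif, survival

# Hypothetical example: two causes with constant hazards (per year) over 10 years.
times = [i * 0.01 for i in range(1001)]
hazards = {"cancer": [0.10] * len(times), "other": [0.05] * len(times)}
cif, survival = cif_from_cause_specific_hazards(times, hazards)
print(round(cif["cancer"][-1], 3), round(cif["other"][-1], 3), round(survival[-1], 3))
```

With constant hazards the result can be checked against the closed form: the cumulative incidence for a cause equals its share of the overall hazard multiplied by one minus the overall survival.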
Competing risks analyses are being increasingly carried out in epidemiological studies. However, the methodology applied varies and is not always optimal. Often, separate analyses will be carried out for each competing event and only the cause-specific hazard ratios will be reported for each [7-9]. This method is not wrong if the researchers are only interested in the rate of disease or mortality. However, without estimating an absolute measure such as the cumulative incidence function, it is difficult to communicate these results in terms of the impact that risk factors have at a population level. In comparison, other researchers choose to model on the cumulative incidence scale using the Fine and Gray method and, therefore, provide no information on the cause-specific hazards [10,11].
In many research papers, the model used to estimate the cause-specific hazards will be different from the model used to estimate the cumulative incidence functions. For example, the cause-specific hazard ratios are reported from a Cox proportional hazards regression model but the cumulative incidence functions are estimated non-parametrically and separately for different subgroups of patients [12-14]. Whilst non-parametric approaches are good for describing the data, there are many advantages to using modelling techniques in observational studies when there are a number of covariates that need to be adjusted for.
Many regression models used to estimate cumulative incidence functions will assume proportional hazards. In large epidemiological studies the assumption of proportional hazards is often unreasonable. Therefore, a model that can easily incorporate time-dependent effects is desirable.
In summary, we would like to be able to model competing risks scenarios using the approach that estimates both the cause-specific hazards and the cumulative incidence functions as we believe both to be useful measures. We would like to obtain smooth estimates for both of these measures rather than considering a step function. Finally, we want to be able to incorporate time-dependent effects for one or all of the competing events. Whilst the majority of the above can be addressed within a Cox modelling framework, we feel that parametric models have the advantage of directly estimating cause-specific hazard rates in the model as well as handling non-proportional hazards with ease. For these reasons, we advocate the use of the flexible parametric survival model to obtain both the cause-specific hazards and the cumulative incidence function in a competing risks framework.
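For orientation, the flexible parametric model is usually written on the log cumulative hazard scale. The following is a sketch in our own notation, not taken from the text: s(.) denotes a restricted cubic spline function of log time, x the covariate vector, and the final sum runs over those covariates given time-dependent effects for cause k.

```latex
\begin{align}
\ln H_k(t \mid x) &= s_k(\ln t;\, \gamma_k) + x^{\top}\beta_k
                     + \sum_{j} s_{kj}(\ln t;\, \delta_{kj})\, x_j, \\
h_k(t \mid x)     &= \frac{d H_k(t \mid x)}{dt}
                   = \frac{1}{t}\, \frac{\partial \ln H_k(t \mid x)}{\partial \ln t}\, H_k(t \mid x).
\end{align}
```

Because the hazard follows analytically from the fitted spline, smooth cause-specific hazard estimates are available directly from the model, and dropping the final sum recovers the proportional hazards case.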
Conclusions
We have shown how to estimate both the cause-specific hazards and the cumulative incidence functions using a flexible parametric survival model. This approach provides smooth estimates of the cause-specific hazard and the cumulative incidence function, both of which we consider to be measures of interest. The flexible parametric model can easily incorporate time-dependent effects for one or more of the competing events. We have also illustrated two other useful measures that can be obtained with some simple manipulation of the cause-specific hazard and cumulative incidence estimates.
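Assuming the two additional measures are the relative contribution to the overall hazard and the relative contribution to the total mortality (the quantities requested through the conthaz and contmort options in the appendix code), they can be written as simple ratios. The notation is ours: h_k is the cause-specific hazard and C_k the cumulative incidence function for cause k out of K causes.

```latex
\begin{align}
\text{contribution to the overall hazard:}  &\quad \frac{h_k(t)}{\sum_{j=1}^{K} h_j(t)}, \\
\text{contribution to the total mortality:} &\quad \frac{C_k(t)}{\sum_{j=1}^{K} C_j(t)} .
\end{align}
```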
The flexible parametric proportional hazards model produces very similar estimates to the Cox proportional hazards model in terms of both the cause-specific hazard ratios and the cumulative incidence functions. A further alternative is to use a mixture model for competing risks data as proposed by Larson and Dinse [4,33]. However, this approach has two main disadvantages: it is time consuming and the estimated distribution will depend on the length of follow-up [34].
The confidence intervals obtained through the delta method have been shown to be very similar to those obtained through bootstrapping but have the added advantage of taking considerably less time to compute.
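For reference, the delta method approximates the variance of a smooth function g of the estimated parameters by a first-order Taylor expansion. The symbols below are generic and not taken from the text: the estimated parameter vector is written as theta-hat and its estimated covariance matrix as V-hat.

```latex
\widehat{\operatorname{Var}}\{ g(\hat{\theta}) \}
  \approx \nabla g(\hat{\theta})^{\top}\, \hat{V}\, \nabla g(\hat{\theta})
```

This needs only one model fit and the derivatives of g, whereas bootstrapping refits the model on every resample, which is consistent with the difference in computing time noted above.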
The assumption of proportional hazards is often unreasonable in epidemiological studies. It is important to understand the changing effect of a covariate over the time period rather than just assuming a constant hazard ratio. For example, a treatment may have a large impact on mortality early on in the follow-up period but this effect could diminish as time goes on [35]. It is, therefore, important to consider methods, such as those described in this paper, that can account for time-dependent effects. The flexible parametric model may be criticized because the number and location of the knots are subjective. However, the sensitivity analysis demonstrates that the knot location has very little impact on the cumulative incidence function. Similar results on sensitivity to the choice of knots have been reported elsewhere [15,18,20,36].
In this paper we have grouped age into four categories for simplicity whilst illustrating the method. However, it may be preferable to model age continuously using regression splines as has been done in previous papers [37,38].
The main advantages of the flexible parametric model are in large studies where time-dependent effects will often play a prominent role. In much smaller studies where there are fewer events there may not always be sufficient information to adequately estimate the underlying hazard using this model.
This paper describes modelling cause-specific hazards and using these to obtain the cumulative incidence function. Alternatively, the cumulative incidence function can be modelled directly using, for example, Fine and Gray's subdistribution approach [5]. This may be useful when interest only lies in obtaining estimates of the cumulative incidence function for one of the competing events. However, if interest lies in visualising the overall probability broken down by specific events, such as those shown in Figure 2, then it should be noted that the direct regression approach does not have a boundary condition, that is, the separately modelled cumulative incidence functions are not constrained to sum to at most one, and so in some cases the overall probability may exceed one. We believe that the cause-specific approach, as described here, is advantageous for a full understanding of risk factors and real world implications.
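The reason the cause-specific approach cannot produce an overall probability above one follows from a short identity. The notation is ours: h_k is the cause-specific hazard and C_k the cumulative incidence function for cause k, and S is the overall survival function with S(0) = 1.

```latex
\begin{align}
\sum_{k=1}^{K} C_k(t)
  &= \int_0^t S(u) \sum_{k=1}^{K} h_k(u)\, du
   = \int_0^t -\frac{dS(u)}{du}\, du
   = 1 - S(t) \le 1 .
\end{align}
```

When each cumulative incidence function is instead modelled by its own subdistribution regression, nothing ties the fitted curves to a common S(t), so the sum is not bounded in this way.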
Unlike measures of net survival, the cumulative incidence function allows us to present “real world” probabilities where a patient is not only at risk of dying from their cancer but also from any other cause of death. We can also estimate these “real world” probabilities using relative survival [15]. The advantage of the cause-specific approach is that we can examine more causes of death but this is at the expense of having to rely on cause of death information.
Finally, a user-friendly program has been written in Stata to enable users to implement the methodology described in this paper. This command is called stpm2cif and is available from the Statistical Software Components (SSC) archive [25,39].
Appendix 2–Stata analysis code for flexible parametric model section of illustrative example. For more information see the Stata help file [38] or the Stata Journal article [30]
***Expand the data so that each patient has 4 rows – one for each cause of death***
expand 4
bysort id: gen cause = _n
***Generate indicator variables for each cause of death along with an overall indicator ***
gen breast = cause==1
gen cancer = cause==2
gen heart = cause==3
gen other = cause==4
gen event = (cause==cod)
***Create interactions between age group and causes***
gen agebreast = agegrp*breast
gen agecancer = agegrp*cancer
gen ageheart = agegrp*heart
gen ageother = agegrp*other
***Create dummy variables for each age cause interaction***
tab agebreast, gen(agebreast)
tab agecancer, gen(agecancer)
tab ageheart, gen(ageheart)
tab ageother, gen(ageother)
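***Re-name age cause dummy variables ***
foreach var in breast cancer heart other {
***Drop the first dummy (value 0, i.e. rows for the other causes) so the renames below do not clash with an existing variable***
drop age`var'1
rename age`var'2 age`var'1
rename age`var'3 age`var'2
rename age`var'4 age`var'3
rename age`var'5 age`var'4
}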
*** Create interactions between stage and causes***
gen stagebreast = seerhistoricstage*breast
gen stagecancer = seerhistoricstage*cancer
gen stageheart = seerhistoricstage*heart
gen stageother = seerhistoricstage*other
***Create dummy variables for each stage cause interaction***
tab stagebreast, gen(stagebreast)
tab stagecancer, gen(stagecancer)
tab stageheart, gen(stageheart)
tab stageother, gen(stageother)
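*** Re-name stage cause dummy variables ***
foreach var in breast cancer heart other {
***Drop the first dummy (value 0, i.e. rows for the other causes) so the renames below do not clash with an existing variable***
drop stage`var'1
rename stage`var'2 stage`var'1
rename stage`var'3 stage`var'2
rename stage`var'4 stage`var'3
}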
***stset the data to tell Stata we are dealing with survival data***
stset exit, origin(dx) failure(event) scale(365.24) exit(time dx + (10*365.24))
*** Fit a flexible parametric proportional hazards model using stpm2 command***
stpm2 breast cancer heart other agebreast? agecancer? ageheart? ageother? ///
stagebreast? stagecancer? stageheart? stageother?, ///
scale(hazard) rcsbaseoff nocons ///
tvc(breast cancer heart other) initstrata(cause) ///
knotstvc(breast 1.37 2.62 4.70 ///
cancer 1.00 2.95 5.87 ///
heart 1.79 3.87 6.37 ///
other 1.95 3.95 6.46) ///
bknotstvc(breast 0.038 9.96 ///
cancer 0.04 9.96 ///
heart 0.04 9.96 ///
other 0.04 9.96)
***Predict the cumulative incidence functions, the cause-specific hazard rates, the contribution to the total mortality and the contribution to the overall hazard for each covariate pattern using stpm2cif command***
forvalues j = 1/4 {
    forvalues l = 1/3 {
        ***Non-reference age groups: include the age cause terms***
        if `j' != 1 {
            if `l' == 1 {
                stpm2cif breast`j'`l' cancer`j'`l' heart`j'`l' other`j'`l', ///
                    cause1(breast 1 agebreast`j' 1) ///
                    cause2(cancer 1 agecancer`j' 1) ///
                    cause3(heart 1 ageheart`j' 1) ///
                    cause4(other 1 ageother`j' 1) haz conthaz contmort
            }
            if `l' != 1 {
                stpm2cif breast`j'`l' cancer`j'`l' heart`j'`l' other`j'`l', ///
                    cause1(breast 1 agebreast`j' 1 stagebreast`l' 1) ///
                    cause2(cancer 1 agecancer`j' 1 stagecancer`l' 1) ///
                    cause3(heart 1 ageheart`j' 1 stageheart`l' 1) ///
                    cause4(other 1 ageother`j' 1 stageother`l' 1) haz conthaz contmort
            }
        }
        ***Reference age group: no age cause terms***
        if `j' == 1 {
            if `l' == 1 {
                stpm2cif breast`j'`l' cancer`j'`l' heart`j'`l' other`j'`l', ///
                    cause1(breast 1) ///
                    cause2(cancer 1) ///
                    cause3(heart 1) ///
                    cause4(other 1) haz conthaz contmort
            }
            if `l' != 1 {
                stpm2cif breast`j'`l' cancer`j'`l' heart`j'`l' other`j'`l', ///
                    cause1(breast 1 stagebreast`l' 1) ///
                    cause2(cancer 1 stagecancer`l' 1) ///
                    cause3(heart 1 stageheart`l' 1) ///
                    cause4(other 1 stageother`l' 1) haz conthaz contmort
            }
        }
    }
}
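The data expansion at the start of the appendix code, in which each patient is given one row per cause of death plus an event indicator, can be pictured with a small sketch. It is written in plain Python purely to show the resulting layout and is not part of the Stata analysis; the function name, the two example patients and the coding of cod as 0 for patients who are alive or censored are assumptions made for illustration.

```python
def expand_by_cause(patients, n_causes=4):
    """Return one row per patient per cause, mirroring `expand 4` in the Stata code."""
    rows = []
    for patient in patients:
        for cause in range(1, n_causes + 1):
            row = dict(patient)
            row["cause"] = cause
            # The event indicator is 1 only on the row matching the observed cause of death.
            row["event"] = int(cause == patient["cod"])
            rows.append(row)
    return rows

# Hypothetical patients: cod = 2 is a death from cause 2, cod = 0 is assumed to mean censored.
patients = [
    {"id": 1, "cod": 2, "agegrp": 3},
    {"id": 2, "cod": 0, "agegrp": 1},
]
stacked = expand_by_cause(patients)
for row in stacked:
    print(row["id"], row["cause"], row["event"])
```

Each patient contributes at most one event across their four rows, which is what allows a single model fitted to the stacked data to estimate all of the cause-specific hazards.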
Authors’ contributions
SRH and PCL conceived the project. SRH carried out the analysis and extended the software to enable use of the method. Both authors participated in the interpretation of the results. SRH drafted the paper, which was later revised by both authors. Both authors read and approved the final manuscript.