Background
Prognostic and treatment predictive factors in breast cancer (e.g. number of positive lymph nodes, age at diagnosis, tumor size, estrogen receptor (ER) and progesterone receptor (PgR), histological grade, and human epidermal growth factor receptor type 2 (HER2)) can predict clinical outcome and hence facilitate treatment choice [
1,
2]. These factors can be used individually or combined in indices such as the Nottingham Prognostic Index [
3],
CancerMath.net, Adjuvant! Online (
http://cancer.lifemath.net) [
4] or the St Gallen subtypes [
5]. Prognostic factors are often continuous or measured on an integer-valued scale, but are categorized for clinical decision-making. This application of prognostic factors in breast cancer has a long history dating back to the introduction of the TNM classification system. Categorization of prognostic factors is intuitively appealing, since the clinically relevant question is often to choose between a limited number of treatment modalities, but categorization of individual factors is not necessary for the construction of useful prediction models [
6]. On the contrary, numerous authors have discussed its negative consequences [
7‐
9]. Categorization will in general lead to loss of information and hence lower power to detect true associations with prognosis and/or treatment response. Using dichotomized factors in prognostic models corresponds to assuming threshold effects, and such effects are often biologically implausible. The use of multiple cut-points per factor, e.g. T0 to T3 for tumor size in the TNM system, is a step in the right direction, but how should cut-points be chosen for new prognostic factors? Optimal cut-offs, maximizing the prognostic value of a new factor in a specific dataset, will in general lead to biased effect estimates, even though methods have been designed to deal with this problem [
10]. To avoid bias, pre-defined percentile-based cut-offs can be used, but different percentiles might be prognostically useful for different factors.
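As a minimal illustration of pre-defined percentile-based cut-offs, the following Python sketch splits a continuous factor at quantiles that are fixed before the outcome is inspected. The function name `percentile_groups`, the choice of quartiles, and the synthetic tumor sizes are assumptions made for illustration only and are not taken from the study.

```python
import numpy as np

def percentile_groups(values, probs=(0.25, 0.50, 0.75)):
    """Assign each value to a group defined by pre-specified percentiles.

    The cut-offs depend only on the distribution of the factor itself,
    not on the outcome, which is what avoids the optimism of
    data-driven "optimal" cut-offs.
    """
    values = np.asarray(values, dtype=float)
    cuts = np.quantile(values, probs)
    # right=True puts a value equal to a cut-off into the lower group
    groups = np.digitize(values, cuts, right=True)
    return groups, cuts

# Hypothetical tumor sizes in mm (synthetic, for illustration only)
tumor_size_mm = np.arange(1, 101)
groups, cuts = percentile_groups(tumor_size_mm)
print("cut-offs:", cuts)
print("patients per group:", np.bincount(groups))
```

Which percentiles are prognostically useful may differ between factors, so the `probs` argument would have to be chosen per factor.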
In survival analysis, the most commonly used model for analysis of multiple prognostic markers is the Cox proportional hazards regression model. In its simplest form, this model assumes constant, i.e. time-independent, linear covariate effects on the log hazard scale or, equivalently, multiplicative effects on the hazard scale (proportional hazards). The log hazard is hence assumed to increase or decrease by the same constant amount for each step on the scale of the covariate, e.g. for each year of age at diagnosis of breast cancer. One way of relaxing this strong and often biologically unrealistic assumption of linear covariate effects is to use fractional polynomial (FP) transformations [
11‐
14]. Transformations of this kind are useful when one wishes to preserve the continuous nature of the covariates in a regression model, but suspects that some of the effects may be non-linear. By taking non-linearity into account, more prognostic information will be extracted, which might have important clinical implications. A limited number of studies have addressed this question. Sauerbrei et al. have evaluated the use of FP transformations in Cox modelling of recurrence-free survival in a lymph node-positive breast cancer data set from the German Breast Cancer Study Group [
15]. They conclude that analysis using FP transformations can extract important prognostic information which the traditional approaches may miss. More recently, Ejlertsen and co-workers have used FP transformations of age, tumor size, number of positive lymph nodes, and percentage of ER-positive nuclei, when developing a model for prediction of excess mortality after adjuvant endocrine therapy [
16]. Relative to models with categorized predictors, models with FP transformations could better identify patients without excess mortality compared to the general population [
16]. Another frequently used option is to model potential non-linear covariate effects on outcome using restricted cubic splines (RCS) [
17,
18].
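To make the idea of an FP transformation concrete, the sketch below builds first- and second-degree FP terms from the conventional power set (-2, -1, -0.5, 0, 0.5, 1, 2, 3), where power 0 denotes the natural logarithm and a repeated power adds a term multiplied by log(x). The function names and the example values are hypothetical. The sketch only constructs the transformed covariates; it does not perform the selection of powers that a multivariable FP procedure carries out.

```python
import numpy as np

# Conventional FP power set; power 0 is interpreted as log(x)
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_term(x, power):
    """Single FP term: x**power, or log(x) when power == 0."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("FP transformations require strictly positive x")
    return np.log(x) if power == 0 else x ** power

def fp2_terms(x, p1, p2):
    """Two columns of a second-degree FP.

    For p1 != p2 the columns are x**p1 and x**p2; for a repeated
    power the second column is x**p1 * log(x).
    """
    x = np.asarray(x, dtype=float)
    first = fp_term(x, p1)
    second = first * np.log(x) if p1 == p2 else fp_term(x, p2)
    return np.column_stack([first, second])

# Hypothetical example: node counts shifted by +1 so that zero counts
# become positive (the shift is an assumption of this sketch)
nodes_plus_one = np.array([1.0, 2.0, 5.0, 11.0])
design = fp2_terms(nodes_plus_one, 0, 0)  # columns: log(x), log(x)**2
print(design)
```

The resulting columns would enter the Cox model in place of the untransformed covariate, so the factor stays continuous while its effect on the log hazard is allowed to be non-linear.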
The primary aim of this study was proof of principle, i.e. to evaluate whether accounting for non-linear effects of the three factors age at diagnosis, tumor size, and number of positive lymph nodes improves prognostication. These factors are expected to remain in use after an anticipated implementation of gene expression profiling in clinical routine. Our hypothesis was that by keeping the predictors continuous as long as possible during the modeling process, prognostication would be improved.
Discussion
Using a large cohort comprising 5609 patients with D-RFi as primary endpoint, we detected non-linear relationships to the relative hazard of distant recurrences for tumor size and number of positive lymph nodes, but not for age at diagnosis. These findings were, however, of minor importance for prognostication of 10-year D-RFi in the multivariable modeling with FP transformations: contrary to our expectation, only a modest increase in C-index was obtained for the model based on continuous variables compared to the model with categorized predictors. In the derivation set, a model with age and tumor size in three categories and number of positive lymph nodes in four categories was considerably better than the corresponding model applying dichotomized variables (C-index: 0.674 vs. 0.628). These findings support the way tumor size and number of positive lymph nodes are used in clinical decision-making today. The putative non-linear effects of these variables seem to be sufficiently captured by increasing the number of cut-offs from one to two or three. The drawback is information loss, and categorization might lead to tied predictions for large groups of patients, making it impossible to create risk groups of any desired size. Similar results were obtained in the validation set. Furthermore, the HRs comparing the prognosis in the four groups, based on the 16th, 50th, and 84th percentiles of the prognostic index derived from the final MFP model, were similar in the derivation and validation sets. The relative effect estimates were smaller when the models fitted in the derivation set were applied to the validation set. This could be explained by over-fitting to the derivation set.
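The role of the C-index in this comparison, and the effect of tied predictions, can be illustrated with a simplified version of Harrell's concordance index. This is one common definition and is an assumption here, since this section does not state which estimator was used. The function name and the toy data are hypothetical, and pairs with tied observed times are simply skipped.

```python
def harrell_c_index(time, event, risk):
    """Simplified Harrell's C for right-censored data.

    A pair (i, j) is usable when subject i had an event and subject j
    was observed for longer. The pair is concordant when i also has
    the higher predicted risk; tied risks count as one half.
    """
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier event
        for j in range(n):
            if time[j] > time[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

# Toy data: five patients, all with events, ordered by event time
time = [1, 2, 3, 4, 5]
event = [1, 1, 1, 1, 1]
continuous_risk = [5, 4, 3, 2, 1]    # perfectly ordered predictions
dichotomized_risk = [1, 1, 1, 0, 0]  # same information, one cut-off

print(harrell_c_index(time, event, continuous_risk))
print(harrell_c_index(time, event, dichotomized_risk))
```

In this toy example the dichotomized predictor scores lower purely because patients within the same category receive tied predictions, which is the mechanism referred to above.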
In contrast to previous studies [
33,
34], we found no effect of age on D-RFi. This could be explained by the fact that 33% (23/69) of the patients below the age of 35 were treated with adjuvant chemotherapy, compared to only 12% (511/4406) of the patients above 35 years. The importance of chemotherapy for the association between age and prognosis has been demonstrated by others [
33,
34]. Another possible explanation for the absence of an age trend in the present study is that the fraction of patients below 35 years is lower in this study than previously reported for population-based breast cancer series, reducing the power to detect a trend.
In contrast to our results, Ejlertsen and co-workers have shown that FP transformation outperformed the predictions based on categorized variables [
16]. This may be explained by the fact that they also included the percentage of ER-positive nuclei in their algorithm and furthermore used a population-based and more homogeneous derivation cohort of 6529 postmenopausal high-risk patients, all receiving five years of adjuvant endocrine therapy. Also, the study by Sauerbrei and colleagues concluded that FP extracted more prognostic information in a study only including patients with lymph node-positive breast cancer (
N = 686; [
15]).
We have used both MFP and RCS transformations to model potentially non-linear relationships to prognosis for the factors age at diagnosis, tumor size, and number of positive lymph nodes. The results, as measured by the C-indices and the functional form of the relationships, were strikingly similar. These transformation methods have advantages and disadvantages, as discussed by Royston and Sauerbrei [
13]. FPs are more sensitive to outliers, but this can be handled, for example, by restricting the degrees of freedom for each factor. A single patient with 47 positive lymph nodes, which was the most extreme value observed in the derivation set, altered the shape of the estimated relationship. A sensitivity analysis revealed that the final prognostic model suggested by the MFP procedure had fewer degrees of freedom when this patient was excluded. RCS, on the other hand, can lead to over-fitting [
13], especially if many knots are used. The integrated automatic selection of variables and of their functional forms, implemented in the MFP procedure, gives some protection against over-fitting, but to avoid capturing too many of the nuances in the data set used for estimation, incorporation of prior knowledge should also be considered during the statistical modeling. Another modeling strategy is artificial neural networks (ANN), which were recently applied to a dataset that largely overlaps with the derivation set in the present paper [
35]. The performance of the ANN and Cox models was almost identical.
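For comparison with the FP approach, a restricted cubic spline basis can be sketched as below, assuming the truncated-power parameterization commonly used for RCS, in which the curve is constrained to be linear beyond the outermost knots. The function name and the knot positions are hypothetical and are not those used in the study. With k knots the basis has k - 1 columns, which shows directly how adding knots adds parameters and hence risk of over-fitting.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis with k knots and k - 1 columns.

    Column 0 is x itself; the remaining k - 2 columns are cubic
    terms constructed so that the fitted curve is linear below the
    first knot and above the last knot.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3 or np.any(np.diff(t) <= 0):
        raise ValueError("need at least three distinct knots")

    def cube_plus(u):
        # truncated cubic: u**3 for u > 0, otherwise 0
        return np.maximum(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2   # keeps columns on the scale of x
    span = t[-1] - t[-2]
    columns = [x]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / span
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / span)
        columns.append(term / scale)
    return np.column_stack(columns)

# Hypothetical knots for tumor size in mm (illustration only)
example_knots = [5.0, 20.0, 35.0, 50.0]
basis = rcs_basis(np.linspace(1.0, 80.0, 9), example_knots)
print(basis.shape)
```

Because the tails are forced to be linear, a single extreme observation has less leverage on the curve shape than with an unrestricted cubic, while the flexibility between the knots grows with the number of knots.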
In an initial FP-modeling step, we revealed a non-monotonic relationship between the relative hazard of distant recurrences and tumor size. This was caused by incorrect values of tumor size for four patients in the very small subset of patients with tumors less than or equal to 2 mm. This finding highlights the importance of data quality. We have not been able to perform a complete examination of all figures in the database, but the study by Rydén et al. demonstrates good agreement for parts of the patient material included in the present study [
36].
One limitation of the present study is that only the three factors age at diagnosis, tumor size, and number of positive lymph nodes are included. A clinically applicable model should include all prognostic factors in use, i.e. according to current guidelines also ER, PgR, HER2, and histological grade [
2,
37]. Unfortunately, we did not have complete and standardized information for these additional factors. However, in an expected future situation, when different gene profiles have replaced single biomarker analyses, age, tumor size, and number of positive lymph nodes will most likely still be included in clinical routine management of breast cancer patients, and therefore the results obtained with these three factors in the present work should retain their value. Another limitation is that the derivation dataset is not population-based, but rather consists of patients included in randomized controlled trials and well-defined cohorts from different geographical areas and time periods. A strength, on the other hand, is that the models fitted in the derivation set were successfully validated in an independent dataset, even though the validation set had a higher proportion of N0 compared to the derivation set. This suggests that the results are generalizable and robust. The discrimination was found to be better for high-risk patients than for patients whose prognostic factors indicated lower risk. Differences between the derivation set and the validation set in this study can explain the sub-optimal performance of the prediction models, but perfectly matching datasets are hard to find, and it is desirable that the performance of prognostic models is good also in datasets with slightly different characteristics. Future studies aiming at clinically useful models should thoroughly assess both discrimination and calibration in external datasets; see [
30] for details.
In conclusion, categorization of age at diagnosis, tumor size, and number of positive lymph nodes into three to four groups was found to improve prognostication compared to dichotomization. The additional gain by allowing continuous non-linear effects modeled by FPs or RCS was modest, a finding in line with the famous statistician John Tukey’s advice on parsimony [
38].
Acknowledgements
We are indebted to participating departments of the South Sweden Breast Cancer Group and South East Sweden Breast Cancer Group for providing samples and clinical follow-up.