The online version of this article (doi:10.1186/1471-2288-13-83) contains supplementary material, which is available to authorized users.
The authors declare that they have no competing interests.
Study design: IB, IA and JMQ; proposal of the analyses to be performed: IB and IA; statistical analysis and outcomes: IB and IA; interpretation of results: IB, IA and JMQ; drafting, review and revision of the text: IB, IA and JMQ; data collection: all IRYSS-COPD Group members. All authors read and approved the final manuscript.
In medical practice, many clinical parameters that are essentially continuous are categorised by physicians to ease decision-making. Indeed, categorisation is common both in medical research and in the development of clinical prediction rules, particularly where the resulting models are to be applied in daily clinical practice to support clinicians in the decision-making process. Since the number of categories into which a continuous predictor should be divided depends partly on the relationship between the predictor and the outcome, the possibility that more than two categories are needed must be borne in mind.
We propose a categorisation methodology for clinical prediction models that uses Generalised Additive Models (GAMs) with P-spline smoothers to determine the relationship between the continuous predictor and the outcome. The method creates at least one average-risk category, together with high- and low-risk categories, based on the GAM smooth function. We applied this methodology to a prospective cohort of patients with exacerbated chronic obstructive pulmonary disease (COPD). The predictors selected were respiratory rate and the partial pressure of carbon dioxide in the blood (PCO2), and the response variable was poor evolution. An additive logistic regression model was used to show the relationship between the covariates and the dichotomous response variable. The proposed categorisation was compared with the continuous predictor, taken as the best option, using the Akaike information criterion (AIC) and the area under the ROC curve (AUC). The sample was divided into derivation (60%) and validation (40%) samples; the former was used to obtain the cut points and the latter to validate the proposed methodology.
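A minimal Python sketch of the smoothing step, for readers who want to see the mechanics; the authors' own implementation is the R code in the supplementary material and is not reproduced here. It assumes a single continuous predictor, a cubic B-spline basis on equally spaced knots and a second-order difference penalty (the P-spline construction of Eilers and Marx), fitted by penalised iteratively reweighted least squares. The smoothing parameter `lam` is fixed by hand rather than selected, and `risk_cut_points`, with its threshold `delta`, is a hypothetical rule for turning the centred smooth into low-, average- and high-risk categories, not necessarily the exact criterion proposed in the article.

```python
import numpy as np
from math import factorial


def bspline_basis(x, xl, xr, nseg=10, deg=3):
    """B-spline basis on equally spaced knots, built from differences of
    truncated power functions (the Eilers and Marx construction)."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    dist = x[:, None] - knots[None, :]
    # Truncated powers (x - t)_+^deg for every observation/knot pair
    tp = np.where(dist > 0, dist ** deg, 0.0)
    n = len(knots)
    # (deg + 1)-th differences turn truncated powers into B-splines
    D = np.diff(np.eye(n), n=deg + 1, axis=0) / (factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * tp @ D.T


def fit_pspline_logit(x, y, nseg=10, deg=3, lam=1.0, n_iter=100, tol=1e-8):
    """Penalised IRLS for logit P(y = 1 | x) = B(x) a, with penalty
    lam * ||D2 a||^2 on second-order differences of the coefficients."""
    xl, xr = float(x.min()), float(x.max())
    B = bspline_basis(x, xl, xr, nseg, deg)
    k = B.shape[1]
    D2 = np.diff(np.eye(k), n=2, axis=0)
    pen = lam * D2.T @ D2
    a = np.zeros(k)
    for _ in range(n_iter):
        eta = B @ a
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(mu * (1.0 - mu), 1e-10, None)
        z = eta + (y - mu) / w  # IRLS working response
        a_new = np.linalg.solve(B.T @ (w[:, None] * B) + pen, B.T @ (w * z))
        converged = np.max(np.abs(a_new - a)) < tol
        a = a_new
        if converged:
            break
    eta = B @ a
    mu = 1.0 / (1.0 + np.exp(-eta))
    btwb = B.T @ ((mu * (1.0 - mu))[:, None] * B)
    # Effective degrees of freedom: trace of the penalised hat matrix
    edf = np.trace(np.linalg.solve(btwb + pen, btwb))
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return {"coef": a, "xl": xl, "xr": xr, "nseg": nseg, "deg": deg,
            "eta_mean": eta.mean(), "fitted": mu, "edf": edf,
            "aic": -2.0 * loglik + 2.0 * edf}


def risk_cut_points(fit, delta=0.5, n_grid=400):
    """Hypothetical rule: label the centred smooth as low (< -delta),
    average (within +/- delta) or high (> delta) risk on the logit scale,
    and return the grid values where the label changes."""
    grid = np.linspace(fit["xl"], fit["xr"], n_grid)
    basis = bspline_basis(grid, fit["xl"], fit["xr"], fit["nseg"], fit["deg"])
    f = basis @ fit["coef"] - fit["eta_mean"]  # centred as in a GAM
    labels = np.where(f < -delta, -1, np.where(f > delta, 1, 0))
    return grid[np.nonzero(np.diff(labels))[0]]
```

In practice `lam` would be chosen by a criterion such as AIC over a grid of values; the `aic` entry uses the effective degrees of freedom as the model dimension, so a smooth and a categorised predictor can be compared on the same footing.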
The three-category proposal for respiratory rate was ≤ 20; (20, 24]; > 24, for which AIC = 314.5 and AUC = 0.638 were obtained. The respective values for the continuous predictor were AIC = 317.1 and AUC = 0.634, with no statistically significant difference between the two AUCs (p = 0.079). The four-category proposal for PCO2 was ≤ 43; (43, 52]; (52, 65]; > 65, for which AIC = 258.1 and AUC = 0.81 were obtained. No statistically significant difference was found between the AUC of the four-category option and that of the continuous predictor, which yielded AIC = 250.3 and AUC = 0.825 (p = 0.115).
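The cut points reported above define right-closed intervals. The following sketch shows how they might be applied and evaluated, assuming NumPy arrays of predictor values and a 0/1 outcome. `categorise` maps values to category indices, `category_risks` estimates the event rate per category on the derivation sample (the fitted probabilities of a logistic model with one indicator per category), and `auc` computes the AUC as a concordance probability on the validation sample. All function names are hypothetical; the 60/40 split mirrors the design described in the methods, and no formal test comparing two AUCs is included.

```python
import numpy as np

# Cut points from the abstract; intervals are closed on the right,
# e.g. respiratory rate: <= 20, (20, 24], > 24.
RESP_RATE_CUTS = [20, 24]
PCO2_CUTS = [43, 52, 65]


def split_derivation_validation(n, frac=0.6, seed=0):
    """Random split of row indices into derivation and validation sets."""
    idx = np.random.default_rng(seed).permutation(n)
    n_der = int(round(frac * n))
    return idx[:n_der], idx[n_der:]


def categorise(x, cuts):
    """Category index 0..len(cuts); right=True gives cuts[i-1] < x <= cuts[i]."""
    return np.digitize(x, cuts, right=True)


def category_risks(cat, y, n_cat):
    """Observed event proportion per category (assumes no empty category)."""
    return np.array([y[cat == c].mean() for c in range(n_cat)])


def auc(score, y):
    """AUC as P(score of a case > score of a non-case), ties counted as 1/2."""
    s1, s0 = score[y == 1], score[y == 0]
    greater = (s1[:, None] > s0[None, :]).sum()
    ties = (s1[:, None] == s0[None, :]).sum()
    return (greater + 0.5 * ties) / (len(s1) * len(s0))
```

For example, `categorise(np.array([16, 20, 21, 24, 25, 30]), RESP_RATE_CUTS)` returns `[0, 0, 1, 1, 2, 2]`. A validation score would then be `risks[categorise(x_val, cuts)]`, with `risks` estimated on the derivation indices only.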
Our proposed method provides clinicians with the number and location of cut points for categorising variables, and performs as well as the original continuous predictor in the development of clinical prediction rules.
Additional file 1: Appendix 2: R code (PDF 11 KB; 12874_2012_966_MOESM1_ESM.pdf)
Authors' original file for figure 1 (12874_2012_966_MOESM2_ESM.pdf)
Authors' original file for figure 2 (12874_2012_966_MOESM3_ESM.pdf)
Authors' original file for figure 3 (12874_2012_966_MOESM4_ESM.pdf)
Authors' original file for figure 4 (12874_2012_966_MOESM5_ESM.pdf)
Use of generalised additive models to categorise continuous variables in clinical prediction
José M Quintana
BioMed Central