01.12.2014 | Research article | Issue 1/2014 Open Access

BMC Medical Research Methodology 1/2014

Validation of prediction models based on lasso regression with multiply imputed data

Jammbe Z Musoro, Aeilko H Zwinderman, Milo A Puhan, Gerben ter Riet, Ronald B Geskus
Important notes

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2288-14-116) contains supplementary material, which is available to authorized users.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Authors JM, AZ and RG devised the statistical methods. Authors MP and GtR were responsible for the design and data collection of the study. Author JM performed the statistical analysis and wrote the paper. All authors read and corrected the draft versions of the manuscript, and approved the final manuscript.



In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data.
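The shrinkage and variable selection described above can be illustrated with the closed-form lasso solution for a single coefficient. The sketch below assumes an orthonormal design and the objective ½‖y − Xβ‖² + λ‖β‖₁, under which each lasso coefficient is the soft-thresholded least-squares estimate. The function name `soft_threshold`, the example coefficients and the penalty value are illustrative assumptions, not taken from the study.

```python
def soft_threshold(b, lam):
    """Lasso estimate for one coefficient under an orthonormal design.

    b   : unpenalized (least-squares) estimate
    lam : non-negative penalty parameter
    """
    if b > lam:
        return b - lam   # shrink positive estimates towards zero
    if b < -lam:
        return b + lam   # shrink negative estimates towards zero
    return 0.0           # small estimates are set exactly to zero


# Hypothetical unpenalized estimates and penalty, for illustration only.
unpenalized = [2.0, -0.7, 0.3, -0.1]
penalty = 0.5
lasso = [soft_threshold(b, penalty) for b in unpenalized]

for b, s in zip(unpenalized, lasso):
    print(f"unpenalized {b:+.2f} -> lasso {s:+.2f}")
```

With these values the two largest coefficients are pulled towards zero by the penalty and the two smallest are removed from the model, which is the source of both the shrinkage and the parsimony.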


The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI.
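The fourth approach, resampling the incomplete data and imputing afterwards, can be sketched as follows. This is a minimal illustration of the resampling order only, under strong simplifying assumptions: a single mean imputation stands in for MI, a one-predictor least-squares fit stands in for the lasso, and R² stands in for the performance measures used in the study. All function names and the simulated data are hypothetical.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Stand-in for MI: replace missing x (None) by the observed mean of x."""
    m = mean(x for x, _ in rows if x is not None)
    return [(m if x is None else x, y) for x, y in rows]


def fit_line(rows):
    """Stand-in for the lasso: least-squares intercept and slope."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx if sxx > 0 else 0.0
    return my - slope * mx, slope


def r_squared(model, rows):
    """Proportion of outcome variance explained by `model` on `rows`."""
    a, b = model
    my = mean(y for _, y in rows)
    sse = sum((y - (a + b * x)) ** 2 for x, y in rows)
    sst = sum((y - my) ** 2 for _, y in rows)
    return 1.0 - sse / sst


def optimism_resample_then_impute(incomplete, n_boot=200, seed=1):
    """Mean of (apparent - test) performance over bootstrap samples."""
    rng = random.Random(seed)
    original = impute_mean(incomplete)   # test data: imputed original sample
    n = len(incomplete)
    diffs = []
    for _ in range(n_boot):
        # Draw subjects with replacement from the INCOMPLETE data ...
        boot = [incomplete[rng.randrange(n)] for _ in range(n)]
        # ... and only then impute, so imputation is part of the validation.
        boot = impute_mean(boot)
        model = fit_line(boot)
        diffs.append(r_squared(model, boot) - r_squared(model, original))
    return mean(diffs)


# Simulated cohort with roughly 30% of predictor values missing (assumption).
sim = random.Random(0)
data = []
for _ in range(40):
    x = sim.gauss(0, 1)
    y = 0.5 * x + sim.gauss(0, 1)
    data.append((None if sim.random() < 0.3 else x, y))

completed = impute_mean(data)
apparent = r_squared(fit_line(completed), completed)
optimism = optimism_resample_then_impute(data)
corrected = apparent - optimism
print(f"apparent R2 {apparent:.3f}, optimism {optimism:.3f}, corrected {corrected:.3f}")
```

The first three approaches would instead call the imputation step once, before the loop, and resample the completed rows. Keeping imputation inside the loop lets the bootstrap reflect the uncertainty that imputation adds to model building.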


The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large.
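Over-shrinkage is commonly diagnosed with the calibration slope, the slope from regressing observed outcomes on predicted values: a slope above 1 indicates that predictions vary too little. The sketch below is a generic illustration of that diagnostic with made-up numbers; it is not claimed to reproduce the calibration analysis of the study, and the function name `calibration_slope` is an assumption.

```python
from statistics import mean


def calibration_slope(predicted, observed):
    """Least-squares slope of observed outcomes regressed on predictions."""
    mp = mean(predicted)
    mo = mean(observed)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    return sxy / sxx


# Made-up outcomes; predictions are pulled halfway towards the mean of 3.0,
# mimicking a model whose coefficients were shrunk too strongly.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
over_shrunk = [3.0 + 0.5 * (o - 3.0) for o in observed]

slope = calibration_slope(over_shrunk, observed)
print(f"calibration slope: {slope:.2f}")  # above 1: predictions too narrow
```

A slope of 1 would correspond to well-calibrated predictions, and a slope below 1 to predictions that are too extreme, which is the usual symptom of overfitting in unpenalized models.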


Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.