01.12.2012 | Research article | Issue 1/2012 | Open Access

BMC Medical Informatics and Decision Making 1/2012

Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model

Masahiro Takada, Masahiro Sugimoto, Yasuhiro Naito, Hyeong-Gon Moon, Wonshik Han, Dong-Young Noh, Masahide Kondo, Katsumasa Kuroi, Hironobu Sasano, Takashi Inamoto, Masaru Tomita, Masakazu Toi
Important notes

Electronic supplementary material

The online version of this article (doi:10.1186/1472-6947-12-54) contains supplementary material, which is available to authorized users.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

MT (Takada) carried out the statistical analysis. MS performed data-mining analysis. MT, MS and YN drafted the manuscript. HM, WH and DN collected the validation data and drafted the manuscript. MK helped to design the study and helped to draft the manuscript. KK collected the training data. HS, TI and MT (Tomita) helped to design the study. MT (Toi) conceived the fundamental idea, designed the study and drafted the manuscript. All authors read and approved the final manuscript.



Background

The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method, the alternating decision tree (ADTree).
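To illustrate how an alternating decision tree turns patient variables into a prediction, the following minimal sketch sums the values of every prediction node reachable for a given patient; the sign of the total gives the predicted class and its magnitude can be read as a confidence score. The tree structure, the variable names (`tumor_size_cm`, `lymphovascular_invasion`, `age_years`), the thresholds and the weights are all invented for illustration and are not taken from the authors' model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Patient = Dict[str, float]


@dataclass
class PredictionNode:
    """Holds a real-valued contribution and any splitters attached below it."""
    value: float
    splitters: List["SplitterNode"] = field(default_factory=list)


@dataclass
class SplitterNode:
    """Tests one condition and routes to one of two prediction nodes."""
    condition: Callable[[Patient], bool]
    if_true: PredictionNode
    if_false: PredictionNode


def adtree_score(node: PredictionNode, patient: Patient) -> float:
    """Sum the values of all prediction nodes reachable for this patient.

    Unlike an ordinary decision tree, every splitter attached to a reached
    prediction node is followed, so several paths contribute to the score.
    """
    total = node.value
    for splitter in node.splitters:
        branch = splitter.if_true if splitter.condition(patient) else splitter.if_false
        total += adtree_score(branch, patient)
    return total


# Hypothetical toy tree: variables, thresholds and weights are assumptions.
toy_tree = PredictionNode(
    value=-0.5,
    splitters=[
        SplitterNode(
            condition=lambda p: p["tumor_size_cm"] > 2.0,
            if_true=PredictionNode(
                0.75,
                splitters=[
                    SplitterNode(
                        lambda p: p["lymphovascular_invasion"] == 1,
                        PredictionNode(0.5),
                        PredictionNode(-0.25),
                    )
                ],
            ),
            if_false=PredictionNode(-0.5),
        ),
        SplitterNode(
            condition=lambda p: p["age_years"] < 50,
            if_true=PredictionNode(0.25),
            if_false=PredictionNode(-0.25),
        ),
    ],
)

patient = {"tumor_size_cm": 3.0, "lymphovascular_invasion": 1, "age_years": 45}
score = adtree_score(toy_tree, patient)
print(score)  # 1.0; a positive total would be read as "node-positive" here
```

Because the score is a sum of independent contributions, each variable's effect on the prediction can be read directly from the tree.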


Methods

Clinical datasets for primary breast cancer patients who underwent sentinel lymph node biopsy or AxLN dissection without prior treatment were collected from three institutes (institute A, n = 148; institute B, n = 143; institute C, n = 174) and were used for variable selection, model training and external validation, respectively. The models were evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), which measures how well a model discriminates node-positive patients from node-negative patients.
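As a sketch of what the area under the ROC curve measures, the function below computes it as the fraction of (node-positive, node-negative) patient pairs in which the positive patient receives the higher score, counting ties as one half. This pairwise formulation is equivalent to the area under the empirical ROC curve. The function name `roc_auc` and the example data are assumptions for illustration, not the authors' code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for node-positive, 0 for node-negative.
    scores: model output, higher meaning more likely node-positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive patient correctly ranked higher
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Invented example: four patients, two node-positive and two node-negative.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.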


Results

The ADTree model selected 15 of 24 clinicopathological variables in the variable selection dataset. The resulting area under the ROC curve was 0.770 (95% confidence interval [CI]: 0.689–0.850) for the model training dataset and 0.772 (95% CI: 0.689–0.856) for the validation dataset, demonstrating high accuracy and generalization ability of the model. The bootstrap value for the validation dataset was 0.768 (95% CI: 0.763–0.774).
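The abstract does not state how the confidence intervals or the bootstrap value were computed. One common way to attach an interval to an AUC is the percentile bootstrap, sketched below: patients are resampled with replacement, the AUC is recomputed on each resample, and the 2.5th and 97.5th percentiles are reported. The choice of method, the function names and the defaults (`n_resamples`, `seed`) are assumptions for illustration and may differ from the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (illustrative sketch)."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    aucs: List[float] = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined without both classes; draw again
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * (n_resamples - 1))]
    upper = aucs[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


# Invented example data: 1 = node-positive, 0 = node-negative.
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.55, 0.4, 0.5, 0.3, 0.2, 0.1]
lower, upper = bootstrap_auc_ci(labels, scores, n_resamples=500)
print(f"AUC = {roc_auc(labels, scores):.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With small samples the interval is wide; it narrows as the number of patients grows, not as the number of resamples grows.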


Conclusions

Using commonly recorded clinical variables, our prediction model showed high accuracy for predicting nodal metastasis in patients with breast cancer. Our model might therefore help oncologists in the decision-making process for primary breast cancer patients before treatment starts.
Additional file 1: Appendix A: Processes used to develop the predictive model. Appendix B: ADTree-based prediction models. Appendix C: Calculation of the predictive score in each ADTree model. Appendix D: Calibration plots of the ADTree-based model for the Kyoto and Seoul datasets. Appendix E: AUC values and the number of nodes in the pruning analysis. Appendix F: ROC curves of the ADTree model, the MSKCC nomogram and the Russells Hall Hospital scoring system using the Seoul dataset (n = 131). (DOC 351 KB)