Background
Methods
Data collection

| Nominal variable | Name | Description | Value | Proportion (%) |
|---|---|---|---|---|
| V2 | Marital status | Marital status of the patient | Married | 81.6 |
| | | | Not married | 18.4 |
| V3 | Menopausal status | Menopausal status of the patient, including how menopause occurred | Natural menopause | 50.6 |
| | | | Pre-menopause | 42.8 |
| | | | Surgical menopause | 6.6 |
| V4 | Presence of family history | Whether there is a family history of breast cancer | Yes | 81.2 |
| | | | No | 18.8 |
| V5 | Race | Ethnicity | Chinese | 68.4 |
| | | | Malay | 19.7 |
| | | | Indian | 11.9 |
| V6 | Method of diagnosis | The method used by clinicians to confirm the diagnosis of breast cancer | Excision | 20.8 |
| | | | FNAC (fine-needle aspiration cytology) | 24.5 |
| | | | Imaging only | 0.5 |
| | | | Trucut (core needle biopsy) | 54.2 |
| V7 | Classification of breast cancer | Invasive cancer consists of malignant cells that can spread to other parts of the body (metastasize). In situ cancer consists of cells that are recognizably malignant but have not begun to behave invasively; it does not spread beyond the breast. | Invasive | 95.3 |
| | | | In situ | 4.7 |
| V8 | Laterality | The side of the breast affected by cancer | Left | 45.5 |
| | | | Right | 49.5 |
| | | | Bilateral | 1.3 |
| | | | Unilateral | 3.7 |
| V9 | Cancer stage classification | Cancer stage, grouped into three categories | Stage 0 (pre-cancer) | 4.6 |
| | | | Stages 1, 2 and 3 (curable cancer) | 84.2 |
| | | | Stage 4 (metastatic cancer) | 11.2 |
| V10 | Grade of differentiation in tumour | Description of a tumour based on how abnormal the tumour cells and tumour tissue look under a microscope; it indicates how quickly a tumour is likely to grow and spread. In this dataset, G1 is described as poor, G2 as moderate, and G3 and G4 as good differentiation. | Good | 32.9 |
| | | | Moderate | 37.1 |
| | | | Poor | 30.0 |
| V12 | Estrogen receptor (ER) status | Normal breast cells and some breast cancer cells have receptors that bind the hormone estrogen and depend on it to grow. Breast cancers that have these receptors are called ER-positive. | Positive | 58.9 |
| | | | Negative | 41.1 |
| V13 | Progesterone receptor (PR) status | Normal breast cells and some breast cancer cells have receptors that bind the hormone progesterone and depend on it to grow. Breast cancers that have these receptors are called PR-positive. | Positive | 46.0 |
| | | | Negative | 54.0 |
| V14 | c-er-b2 status | c-er-b2 is a proto-oncogene located on chromosome 17 that produces a protein (HER-2) acting as a receptor on the surface of cancer cells. The gene is amplified, and the protein therefore over-expressed, in around 20 to 25% of invasive breast cancers. | Positive | 24.1 |
| | | | Negative | 65.4 |
| | | | Equivocal | 10.5 |
| V15 | Primary treatment type | The first treatment the patient received | Chemotherapy | 12.6 |
| | | | Hormone therapy | 3.4 |
| | | | Surgery | 77.8 |
| | | | None | 6.2 |
| V16 | Surgery status | Whether the patient was treated with surgery | Surgery done | 85.5 |
| | | | No surgery | 14.5 |
| V17 | Type of surgery | The type of surgery performed, which depends on the cancer stage and tumour size | Breast-conserving surgery | 24.3 |
| | | | Mastectomy | 61.1 |
| | | | No surgery | 14.6 |
| V18 | Method of axillary lymph node dissection | Yes if axillary lymph node dissection was done; SLNB or SLNB to AC if one of these methods was used to remove the axillary lymph nodes; None if no procedure was done | Yes | 70.6 |
| | | | SLNB (sentinel lymph node biopsy) | 6.7 |
| | | | SLNB to AC (axillary clearance) | 0.4 |
| | | | None | 22.3 |
| V19 | Radiotherapy | Whether the patient was treated with radiotherapy | Radiotherapy | 49.4 |
| | | | No radiotherapy | 50.6 |
| V20 | Chemotherapy | Whether the patient was treated with chemotherapy | Chemotherapy | 54.3 |
| | | | No chemotherapy | 45.7 |
| V21 | Hormonal therapy | Whether the patient was treated with hormone therapy | Hormonal therapy | 54.9 |
| | | | No hormonal therapy | 45.1 |
| V24 | Status | Survival status of the patient | Alive | 69.6 |
| | | | Dead | 30.4 |

Numerical variable | Name | Description | Minimum | Mean | Maximum |
---|---|---|---|---|---|
V1 | Age at diagnosis (years) | Age of the patient when diagnosed with breast cancer | 0 | 50 | 92 |
V11 | Tumour size (cm) | Size of the tumour | 0 | 3.2 | 30 |
V22 | Total axillary lymph nodes removed | Number of axillary lymph nodes removed for examination | 0 | 13 | 45 |
V23 | Number of positive lymph nodes | Number of lymph nodes identified as cancerous | 0 | 3 | 19 |

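
The proportions and the minimum/mean/maximum values in the variable tables are plain descriptive statistics. A minimal Python sketch, assuming each variable is held as a list of values; the function names `proportions` and `summarise` and the toy values are hypothetical and are not drawn from the study's records:

```python
from collections import Counter
from statistics import mean


def proportions(values: list) -> dict:
    """Percentage of each category, rounded to one decimal place (as in the nominal table)."""
    counts = Counter(values)
    total = len(values)
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


def summarise(values: list) -> dict:
    """Minimum, mean and maximum of a numerical variable (as in the numerical table)."""
    return {"min": min(values), "mean": mean(values), "max": max(values)}


# Hypothetical toy data, not the study's records
marital_status = ["Married", "Married", "Married", "Not married"]
tumour_size_cm = [1.5, 2.0, 3.5, 6.0]

print(proportions(marital_status))  # {'Married': 75.0, 'Not married': 25.0}
print(summarise(tumour_size_cm))    # {'min': 1.5, 'mean': 3.25, 'max': 6.0}
```
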
No | Cluster | Estrogen receptor (ER) | Progesterone receptor (PR) | c-er-b2 status | Samples |
---|---|---|---|---|---|
1 | Hormone Receptor Sensitive (HRS) | + | + | +/− | 3520 |
2 | c-er-b2 over-expressed | − | − | + | 966 |
3 | Basal/Triple Negative Breast Cancer (TNBC) | − | − | − | 1975 |

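
The cluster definitions in the receptor table amount to a simple rule on the three receptor statuses. A minimal Python sketch, assuming the status strings "Positive", "Negative" and "Equivocal" used in the variable table; the function name `assign_cluster` is hypothetical, and combinations the table does not cover (for example ER-positive/PR-negative, or ER/PR-negative with an equivocal c-er-b2 result) are returned as `None` rather than guessed:

```python
from typing import Optional


def assign_cluster(er: str, pr: str, cerb2: str) -> Optional[str]:
    """Map ER, PR and c-er-b2 statuses to the three receptor clusters.

    Returns None for combinations that the cluster table does not define.
    """
    if er == "Positive" and pr == "Positive":
        # The table lists c-er-b2 as +/- for HRS; accepting any c-er-b2
        # value here (including Equivocal) is an assumption.
        return "HRS"
    if er == "Negative" and pr == "Negative":
        if cerb2 == "Positive":
            return "c-er-b2 over-expressed"
        if cerb2 == "Negative":
            return "TNBC"
    return None


print(assign_cluster("Positive", "Positive", "Negative"))  # HRS
print(assign_cluster("Negative", "Negative", "Negative"))  # TNBC
print(assign_cluster("Positive", "Negative", "Negative"))  # None
```
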
Model evaluation
Random forest advanced modelling
Variable selection
Decision tree
Survival analysis
Results
Model evaluation
No | Algorithm | Accuracy (%) | Sensitivity | Specificity | AUC | Precision | Matthews correlation coefficient |
---|---|---|---|---|---|---|---|
1 | Decision tree | 79.80 | 0.82 | 0.75 | 0.72 | 0.91 | 0.52 |
2 | Random forest | 82.70 | 0.83 | 0.81 | 0.86 | 0.93 | 0.59 |
3 | Neural networks | 82.00 | 0.83 | 0.79 | 0.84 | 0.93 | 0.58 |
4 | Extreme gradient boosting (XGBoost) | 81.70 | 0.84 | 0.75 | 0.87 | 0.89 | 0.57 |
5 | Logistic regression | 81.10 | 0.82 | 0.78 | 0.85 | 0.92 | 0.55 |
6 | Support vector machine | 81.80 | 0.81 | 0.84 | 0.85 | 0.95 | 0.57 |

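
The columns of the evaluation table are standard binary-classification metrics. A minimal sketch of how they follow from confusion-matrix counts; the counts and scores in the usage lines are invented for illustration, the function names are hypothetical, and which survival status (alive or dead) is treated as the positive class is an assumption not stated in this section. AUC cannot be derived from the confusion matrix alone, so it is computed separately through its rank interpretation: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counting one half.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics from a 2x2 confusion matrix (positive class chosen by the analyst)."""
    total = tp + fp + tn + fn
    accuracy = 100 * (tp + tn) / total  # percentage, as in the table
    sensitivity = tp / (tp + fn)        # true positive rate (recall)
    specificity = tn / (tn + fp)        # true negative rate
    precision = tp / (tp + fp)          # positive predictive value
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


def auc(pos_scores: list, neg_scores: list) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented counts and scores, for illustration only
m = binary_metrics(tp=80, fp=10, tn=40, fn=20)
print({k: round(v, 2) for k, v in m.items()})
# {'accuracy': 80.0, 'sensitivity': 0.8, 'specificity': 0.8, 'precision': 0.89, 'mcc': 0.58}
print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```
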
Random forest advanced modelling
No | Cluster | Samples | Accuracy (%) |
---|---|---|---|
1 | Hormone Receptor Sensitive (HRS) | 3520 | 84.00 |
2 | c-er-b2 over-expressed | 966 | 77.60 |
3 | Basal/Triple Negative Breast Cancer (TNBC) | 1975 | 20.70 |

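
In this table, accuracy is reported separately for each receptor cluster, that is, the share of patients in a cluster whose survival status was predicted correctly. A minimal sketch of that per-cluster tally, assuming predictions are available as (cluster, true status, predicted status) tuples; the records and the function name `accuracy_by_cluster` are hypothetical, and the sketch covers only the evaluation step, not how the random forest was fitted for each cluster:

```python
from collections import defaultdict


def accuracy_by_cluster(records) -> dict:
    """Percentage of correct predictions within each cluster."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for cluster, y_true, y_pred in records:
        total[cluster] += 1
        correct[cluster] += int(y_true == y_pred)
    return {c: round(100 * correct[c] / total[c], 2) for c in total}


# Hypothetical records: (cluster, true status, predicted status)
records = [
    ("HRS", "Alive", "Alive"),
    ("HRS", "Dead", "Alive"),
    ("TNBC", "Dead", "Dead"),
    ("TNBC", "Alive", "Alive"),
]
print(accuracy_by_cluster(records))  # {'HRS': 50.0, 'TNBC': 100.0}
```
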
Variable selection
Cluster | VSURF (cut-off: mean VI = 0.01) | randomForestExplainer (top 6 variables) |
---|---|---|
All data | V11: Tumour size > V9: Cancer stage classification > V22: Total lymph nodes > V23: Positive lymph nodes > V15: Primary treatment type > V6: Method of diagnosis | V11: Tumour size > V9: Cancer stage > V6: Method of diagnosis > V15: Primary treatment type > V22: Total lymph nodes > V23: Positive lymph nodes |
Hormone Receptor Sensitive (HRS) | V9: Cancer stage classification > V11: Tumour size > V22: Total lymph nodes > V15: Primary treatment type > V23: Positive lymph nodes | V11: Tumour size > V9: Cancer stage > V15: Primary treatment type > V23: Positive lymph nodes > V6: Method of diagnosis > V22: Total lymph nodes |
c-er-b2 over-expressed | V11: Tumour size > V23: Positive lymph nodes > V9: Cancer stage classification > V22: Total lymph nodes > V15: Primary treatment type | V11: Tumour size > V9: Cancer stage > V15: Primary treatment type > V23: Positive lymph nodes > V6: Method of diagnosis > V22: Total lymph nodes |
Basal/Triple Negative Breast Cancer (TNBC) | V11: Tumour size > V22: Total lymph nodes > V9: Cancer stage classification > V23: Positive lymph nodes > V15: Primary treatment type > V18: Method of axillary lymph node dissection > V17: Type of surgery > V6: Method of diagnosis > V16: Surgery status | V11: Tumour size > V9: Cancer stage > V23: Positive lymph nodes > V22: Total lymph nodes > V15: Primary treatment type > V6: Method of diagnosis |
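
The VSURF column keeps variables whose mean variable importance (VI) exceeds 0.01 and lists them in decreasing order of importance, while the randomForestExplainer column lists the six highest-ranked variables. A simplified Python sketch of both operations follows. It is not VSURF itself, which works through thresholding, interpretation and prediction steps over repeated random forest runs; the importance scores and the function names `select_by_cutoff` and `top_k_overlap` are hypothetical, while the two rankings compared at the end are the "All data" row of the variable selection table.

```python
def select_by_cutoff(importance: dict, cutoff: float = 0.01) -> list:
    """Variables with mean importance above `cutoff`, most important first."""
    kept = [(var, vi) for var, vi in importance.items() if vi > cutoff]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [var for var, _ in kept]


def top_k_overlap(ranking_a: list, ranking_b: list, k: int = 6) -> set:
    """Variables that appear in the top k of both rankings."""
    return set(ranking_a[:k]) & set(ranking_b[:k])


# Hypothetical mean importance scores, for illustration only
scores = {"V11": 0.05, "V9": 0.04, "V22": 0.02, "V2": 0.002}
print(select_by_cutoff(scores))  # ['V11', 'V9', 'V22']

# "All data" rankings from the variable selection table
vsurf = ["V11", "V9", "V22", "V23", "V15", "V6"]
explainer = ["V11", "V9", "V6", "V15", "V22", "V23"]
print(sorted(top_k_overlap(vsurf, explainer)))
# ['V11', 'V15', 'V22', 'V23', 'V6', 'V9']  (all six variables are shared)
```
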