Introduction
Methods
Databases
Confidence in risk factors
- definite (an established association between outcome and exposure, where chance, bias [systematic error] and confounders [distortion of an association by unmeasured factor/s] are eliminated with sufficient confidence)
- probable (an association exists between the outcome and the exposure, but chance, bias and confounders cannot be eliminated with sufficient confidence; different studies report inconsistent results)
- possible (inconclusive or insufficient evidence of an association between the outcome and the exposure)
Results
Potential risk factors included in breast cancer non-clinical predictive models
Name of model | Gail [37] | Rosner [42] | Rosner [25] | Colditz [50] | Ueda [38] | Boyle [39] | Lee [36] | Novotny [24] | Gail [32] | Matsuno [51] | Banegas [40] | Pfeiffer [31] | Park [23] | Lee [33] | Effect | Level of evidence | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Basic characteristics | |||||||||||||||||
Age | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Increased risk | Definite | |||||
Ethnicity | Yes | Jewish increased risk | Definite | ||||||||||||||
Height | Yes | Increased risk | Definite | ||||||||||||||
Weight | Yes | Increased risk in post-menopausal women | Probable | ||||||||||||||
BMI | Yes | Yes | Yes | Yes | Yes | Yes | Probable | ||||||||||
Alcohol intake | Yes | Yes | Yes | Yes | Yes | Increased risk | Probable | ||||||||||
Smoking | Yes | Yes | Increased risk | Possible | |||||||||||||
Physical activity | Yes | Yes | Yes | Decreased risk | Possible | ||||||||||||
Diet | Yes | Decreased risk | Probable | ||||||||||||||
Hormonal and reproductive factors | |||||||||||||||||
Age at menarche | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Increased risk | Definite | |||
Age at first live birth | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Increases risk | Definite | |
Age at subsequent birth | Yes | Yes | Increases risk | Definite | |||||||||||||
Age at menopause | Yes | Yes | Yes | Yes | Yes | Yes | Increased risk | Definite | |||||||||
Hormone replacement therapy use | Yes | Yes | Yes | Yes | Increases risk | Definite | |||||||||||
Oral contraceptive use | Yes | Yes | Yes | Increases risk | Definite | ||||||||||||
Breast feeding | Yes | Yes | Decreases risk | Probable | |||||||||||||
Pregnancy | Yes | Decreases risk | Possible | ||||||||||||||
Parity | Yes | Yes | Decreases risk | Definite | |||||||||||||
Number of children | Yes | Yes | Decreases risk | Possible | |||||||||||||
Menopause type | Yes | Surgical menopause reduces risk | Possible | ||||||||||||||
Menstrual regularity | Yes | Menstrual regularity and duration—inconsistent results | Possible | ||||||||||||||
Menstrual duration | Yes | Yes | Possible | ||||||||||||||
Menopausal status | Yes | Yes | Post-menopause increases risk | Possible | |||||||||||||
Gestation period | Yes | Increases risk | Possible | ||||||||||||||
Family history of breast and/or ovarian cancer or diseases | |||||||||||||||||
Family history of breast cancer | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Increases risk | Definite | |||||
First-degree relatives with breast cancer | Yes | Yes | Yes | Yes | Increases risk | Definite | |||||||||||
Age of onset of breast cancer in a relative | Yes | Increases risk | Probable | ||||||||||||||
Benign breast disease | Yes | Yes | Yes | Increases risk | Probable | ||||||||||||
History of breast biopsies | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Increases risk | Definite | ||||||||
Mammogram | Yes | Increases risk | Probable | ||||||||||||||
Summary of risk factors included in each model | |||||||||||||||||
Definite factors | 5 | 5 | 6 | 10 | 3 | 5 | 3 | 6 | 3 | 5 | 5 | 5 | 7 | 5 | Max of 10 and min of 3 factors | ||
Probable factors | 0 | 0 | 0 | 4 | 1 | 3 | 2 | 1 | 0 | 1 | 0 | 3 | 3 | 2 | Max of 4 and min of 0 factors | ||
Possible factors | 0 | 0 | 0 | 2 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 1 | 3 | 5 | Max of 5 and min of 0 factors | ||
Total factors | 5 | 5 | 6 | 16 | 4 | 8 | 8 | 7 | 3 | 5 | 5 | 9 | 13 | 12 | Max of 16 and min of 3 factors |
Evaluation measures of the risk models
- Calibration (reliability): the E/O statistic, the ratio of the expected (E) to the observed (O) number of events, measures the calibration performance of a predictive model. Calibration is assessed by comparing the expected and observed numbers of events using goodness-of-fit or chi-square statistics. A well-calibrated model has an E/O ratio close to 1, indicating little difference between expected and observed events. An E/O ratio below 1.0 means the model underestimates incidence, whereas a ratio above 1.0 means it overestimates incidence [14, 26].
- Discrimination (precision): the C statistic (concordance statistic) measures the discrimination performance of a predictive model and is equivalent to the area under the receiver operating characteristic (ROC) curve. It measures how well the model separates individuals who go on to develop the condition from those who do not. A C statistic of 0.5 indicates no discrimination (no better than chance), whereas a C statistic of 1 indicates perfect discrimination [27, 28]. Good discrimination is important for screening individuals and for effective clinical decision making [10].
- Accuracy: assessed by measuring sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), all of which are defined in Table 2. These measures indicate how well the model classifies individuals into their true group (affected or unaffected). Accuracy is important both for classifying individuals and for clinical decision making. Even with good sensitivity and specificity, however, PPV may be low for rare diseases [10], because predictive values also depend on disease prevalence: as prevalence increases, PPV increases while NPV decreases [29].
- Utility: evaluates how easily the target groups (public, clinicians, patients, policy makers) can provide the data required by the model. Utility evaluation also assesses lay understanding of risk, risk perception, interpretation of results, level of satisfaction and worry [30], usually through surveys or interviews [26].
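The calibration and discrimination statistics described above can be illustrated with a minimal Python sketch. It assumes that each individual has a predicted absolute risk over the projection period and a 0/1 outcome recording whether breast cancer was observed; the function names and the toy data are hypothetical and are not taken from any of the reviewed models. E is taken as the sum of the predicted risks and O as the number of observed events, and the C statistic is computed as the proportion of case/non-case pairs in which the case received the higher predicted risk, with ties counting one half.

```python
from itertools import product


def expected_observed_ratio(predicted_risks, outcomes):
    """E/O ratio: expected events (sum of predicted risks) divided by observed events."""
    expected = sum(predicted_risks)
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O is undefined when no events are observed")
    return expected / observed


def c_statistic(predicted_risks, outcomes):
    """Concordance: share of (case, non-case) pairs where the case has the higher predicted risk.

    Tied pairs count as one half, so a model that gives everyone the same risk scores 0.5.
    """
    cases = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_cases = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            concordant += 1.0
        elif case_risk == non_case_risk:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical toy cohort: predicted risks and observed outcomes (1 = developed breast cancer).
risks = [0.25, 0.50, 0.25, 0.75, 0.50, 0.25]
outcomes = [0, 1, 0, 1, 0, 0]

print(expected_observed_ratio(risks, outcomes))  # 1.25, i.e. incidence overestimated
print(c_statistic(risks, outcomes))  # 0.9375
```

The pairwise loop grows with the number of cases times the number of non-cases; for large cohorts an equivalent rank-based (Mann–Whitney) computation of the same quantity is normally used.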
Term | Definition | Equation |
---|---|---|
Sensitivity | Probability that a test will indicate ‘positive’ among those with the disease | (TP)/(TP + FN) |
Specificity | Probability that a test will indicate ‘negative’ among those without the disease | (TN)/(TN + FP) |
Positive predictive value | Probability that a patient has the disease when the test is positive | (TP)/(TP + FP) |
Negative predictive value | Probability that a patient does not have the disease when the test is negative | (TN)/(FN + TN) |
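The Table 2 formulas translate directly into code. The sketch below is illustrative only: `ConfusionMatrix` and `predictive_values` are hypothetical names, and the counts are invented rather than drawn from any reviewed model. The second function applies Bayes' theorem to show the point made under Accuracy: with sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from comparing model classifications with true outcomes."""
    tp: int  # true positives: affected, classified positive
    fp: int  # false positives: unaffected, classified positive
    tn: int  # true negatives: unaffected, classified negative
    fn: int  # false negatives: affected, classified negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.fn + self.tn)


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV implied by fixed sensitivity and specificity at a given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical counts for 1,000 women, 100 of whom develop breast cancer.
cm = ConfusionMatrix(tp=80, fp=90, tn=810, fn=20)
print(f"{cm.sensitivity():.3f} {cm.specificity():.3f} {cm.ppv():.3f} {cm.npv():.3f}")
# 0.800 0.900 0.471 0.976

# Same sensitivity and specificity at a low and a high prevalence: PPV rises, NPV falls.
for prevalence in (0.01, 0.30):
    ppv, npv = predictive_values(0.8, 0.9, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```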
Overview of current models
Model | Calibration (derived model) | Calibration (internal) | Calibration (external) | Discrimination (derived model) | Discrimination (internal) | Discrimination (external) | Accuracy (sensitivity, specificity, PPV, NPV) | Utility |
---|---|---|---|---|---|---|---|---|
Gail [37] | 0.79–1.12 | 0.58–0.67 | ||||||
Rosner [42] | – | – | – | – | – | – | – | |
Rosner [25] | – | 1.00 (0.93–1.07)d | – | 0.57 (0.55–0.59)d | ||||
Colditz [50] | – | – | 1.01 (0.94–1.09)d | – | 0.64 (0.62–0.66)d | – | Goode | |
Ueda [38] | – | – | – | – | – | – | – | |
Boyle [39]a | (a) 0.96 (0.75–1.16) cohort 1 (b) 0.92 (0.68–1.16) cohort 2 | – | 0.59 | – | – |||
Lee [36] | – | – | – | – | – | – | – | |
Novotny [24] | – | – | – | – | – | – | – | |
Gail [32] | – | 1.08 (0.97–1.20) | 0.93 (0.97–1.20)f | – | 0.56 (0.54–0.58)f | – | – | |
Matsuno [51] | 1.17 (0.99–1.38) | 0.614 (0.59–0.64) | – | – | ||||
Banegas [40]b | – | (a) 1.08 (0.91–1.28); Hispanic (b) 0.98 (0.96–1.01); NHW | – | – | – | – | – | – |
Pfeiffer [31] | 1.00 (0.96–1.04) | 0.58 (0.57–0.59) | ||||||
Park [23]c | – | – | (a) 0.97 (0.67–1.40); KMCC (b) 0.96 (0.70–1.37); NCC | – | (a) 0.63 (0.61–0.65) < 50 years (KMCC) (b) 0.65 (0.61–0.68) ≥ 50 years (KMCC) | (a) 0.61 (0.49–0.72); KMCC (b) 0.89 (0.85–0.93); NCC | – | – |
Lee [33] | Overall: 0.62 (0.620–0.623) Under 50: 0.61 (0.60–0.61) Above 50: 0.64 (0.63–0.64) | (a) Sensitivity Overall: 0.55 (0.54–0.56) < 50: 0.61 (0.60–0.62) > 50: 0.59 (0.59–0.60) (b) Specificity Overall: 0.66 (0.65–0.67) > 50: 0.58 (0.57–0.59) < 50: 0.64 (0.63–0.65) (c) Accuracy Overall: 0.60 (0.60–0.61) > 50: 0.59 (0.59–0.60) < 50: 0.61 (0.61–0.62) | – |
Author/model | Study design | Participants | Ethnicity | Outcome | Statistical method | Effect estimates | Sample size | Risk factors considered in the models | Age target | Stratification |
---|---|---|---|---|---|---|---|---|---|---|
Gail [37] | Case–control | White American females from the Breast Cancer Detection Demonstration Project (BCDDP) | American–Caucasian | Invasive breast cancer + in situ carcinoma | Unconditional logistic regression | Relative risk | 2,852 cases 3,146 controls | Age at menarche, age at first live birth, number of previous biopsies, and number of first-degree relatives with breast cancer | Any age | None |
Rosner [42] | Cohort | Registered nurses | American–Caucasian | Invasive breast cancer | Poisson regression | Cumulative incidence | 2,341 cases, 91,523 controls | Age, age at all births, menopause age, menarche age | 30–55 years | Number of births |
Rosner [25] | Cohort | Registered nurses | American–Caucasian | Invasive breast cancer | Poisson regression | Relative risk | 2,249 cases, 89,132 controls | Menarche age, first live birth age, subsequent births age, menopause age | Any age | None |
Colditz [50] | Cohort | General women | American–Caucasian | Invasive breast cancer | Poisson regression | Cumulative incidence | 1,761 cases 56,759 controls | Benign breast disease, use of HRT, weight, height, menopausal type, and alcohol intake | Women aged 30–55 years | None |
Ueda [38] | Case–control | General women | Japanese–Asian | Invasive breast cancer | Conditional logistic regression | Relative risk | 376 cases 430 controls | Menarche, first birth age, family history, and BMI in post-menopausal women | Any age | Menopausal status |
Boyle [39] | Case–control | General women | Italian–Caucasian | Invasive breast cancer | Conditional logistic regression | Absolute + relative risk | 2,569 cases 2,588 controls | Menarche age, first birth age, alcohol intake, family history, age of diagnosis in relatives, and one of the two diet scores; BMI and HRT were included only for women older than 50 | 23–74 years (cases) 20–74 years (controls) | Age (< 50 and > 50) |
Lee [36] | Case–control | (1) General women; (2) well-educated women (nurses/teachers) | Korean–Asian | Invasive breast cancer | Hosmer–Lemeshow goodness of fit | Probability | 384 cases 270 controls | With hospitalized controls: family history, menstrual regularity, total menstrual duration, first full-term pregnancy age, breastfeeding duration; with nurse/teacher controls: age, menstrual regularity, drinking status, smoking status | At least 20 years | None |
Novotny [24] | Case–control | General women | Czech–Caucasian | Invasive breast cancer | Unconditional logistic regression | Relative risk | 4,598 matched pairs | Age at birth of first child, family history of breast cancer, No. of previous breast biopsies, menarche age, parity, history of benign breast disease | Age matched | None |
Gail [32] | Case–control | General women | African American | Invasive breast cancer | Conditional logistic regression | Absolute + relative risk | 1,607 cases 1,647 controls | Menarche age, No. of affected mothers or sisters, No. of benign biopsies | 35–64 years | Age (< 50 and > 50) |
Matsuno [51] | Case–control | General women | Asian and Pacific Islander American | Invasive breast cancer | Conditional logistic regression | Absolute + relative + attributable risks | 589 cases 952 controls | Menarche age, age at first live birth, No. of biopsies, family history, ethnicity | Any age | Ethnicity |
Banegas [40] | Longitudinal study | General women | Hispanic | Invasive breast cancer | Cox proportional hazards regression | Relative risk | 6,353 cases 128,976 controls | Age, age at first live birth, menarche age, No. of first-degree relatives with breast cancer, No. of breast biopsies | Post-menopausal participants aged ≥ 50 | None |
Pfeiffer [31] | Prospective study | White women over 50 years old | White and non-Hispanic Caucasian | Invasive breast cancer | Cox proportional hazards regression | Relative and attributable risks | 7,695 cases 240,712 controls | BMI, oestrogen and progestin MHT use, other MHT use, parity, age at first birth, pre-menopausal, age at menopause, benign breast diseases, family history of breast or ovarian cancer, and alcohol consumption | 50 and above | None |
Park [23] | Case–control | General women | Korean–Asian | Invasive breast cancer | Unconditional logistic regression | Absolute risk | 3,789 cases 3,789 controls | Family history, menarche age, menopausal status, menopause age, pregnancy, first full-term pregnancy age, No. of pregnancies, breastfeeding duration, OC usage, HRT, exercise, BMI, smoking, drinking, No. of breast examinations | Any age | Age (< 50 and > 50) |
Lee [33] | Case–control | General women | Asian | Invasive breast cancer | Conditional logistic regression | – | 2,291 cases and 2,283 controls | First full-term pregnancy age, No. of children, menarche age, BMI, family history, menopausal status, regular mammography, exercise, oestrogen exposure duration, gestation period, menopause age | Any age | Age (< 50 and > 50) |
Title | Size of study | Population | First author | References | |
---|---|---|---|---|---|
Included in this review |
Projecting individualized probabilities of developing breast cancer for white females who are being examined annually | 2,852 cases 3,146 controls | Caucasian | Gail 1989 | [37] | |
Reproductive risk factors in a prospective study of breast cancer: the Nurses’ Health Study | 2,341 cases, 91,523 controls | Caucasian | Rosner 1994 | [42] | |
Nurses’ health study: log-incidence mathematical model of breast cancer incidence | 2,249 cases, 89,132 controls | Caucasian | Rosner 1996 | [25] | |
Cumulative risk of breast cancer to age 70 years according to risk factor status: data from the Nurses’ Health Study | 1,761 cases 56,759 controls | Caucasian | Colditz | [50] | |
Estimation of individualized probabilities of developing breast cancer for Japanese women | 376 cases 430 controls | Asian | Ueda | [38] | |
Contribution of three components to individual cancer risk predicting breast cancer risk in Italy | 2,569 cases 2,588 controls | Caucasian | Boyle | [39] | |
Determining the Main Risk Factors and High-risk Groups of Breast Cancer Using a Predictive Model for Breast Cancer Risk Assessment in South Korea | 384 cases 270 controls | Asian | Lee | [36] | |
Breast cancer risk assessment in the Czech female population–an adjustment of the original Gail model | 4,598 matched pairs | Caucasian | Novotny | [24] | |
Projecting individualized absolute invasive breast cancer risk in African American women | 1,607 cases 1,647 controls | African | Gail | [32] | |
Projecting individualized absolute invasive breast cancer risk in Asian and Pacific Islander American women | 589 cases 952 controls | Asian | Matsuno | [51] | |
Evaluating breast cancer risk projections for Hispanic women | 6,353 cases 128,976 controls | Hispanic | Banegas | [40] | |
Risk Prediction for Breast, Endometrial, and Ovarian Cancer in White Women Aged 50 y or Older: Derivation and Validation from Population-Based Cohort Studies | 42,821 cases 114,931 controls | White, non-Hispanic women aged 50+ | Pfeiffer | [53] | |
Korean risk assessment model for breast cancer risk prediction | 3,789 cases 3,789 controls | Asian | Park | [23] | |
Computational Discrimination of Breast Cancer for Korean Women Based on Epidemiologic Data Only | 2,291 cases and 2,283 controls | Asian | Lee | [33] | |
Excluded from this review |