Breast cancer screening is an important strategy for early detection, which increases the probability of a good treatment outcome. Robust predictive models based on data that can be collected in routine consultations and blood analyses are sought because they would add to the available screening tools. In this work we aim to assess how models based on such data (namely Glucose, Insulin, HOMA, Leptin, Adiponectin, Resistin, MCP-1, Age and Body Mass Index (BMI)) may be used to predict the presence of breast cancer. We believe that these parameters are a good set of candidates, as we recently verified a deregulation in their profile in obesity-associated breast cancer [1].
Several candidate biomarkers of breast cancer have been reported in the literature [2]. In 2008, serum levels of tissue polypeptide-specific antigen, breast cancer-specific cancer antigen 15.3 (CA15–3), and insulin-like growth factor binding protein-3 (IGFBP-3) were introduced as predictors in a logistic regression. A subsequent receiver operating characteristic (ROC) analysis yielded an area under the ROC curve (AUC) of 0.86, with a sensitivity of 85% and a specificity of 62%, when distinguishing controls from patients with breast cancer [3]. BMI, Leptin, CA15–3 and the ratio between Leptin and Adiponectin, used together, were assessed as a biomarker for breast cancer in 2013 [4]. Although high values are presented for the specificity (80%) and the sensitivity (83.3%), the confidence intervals reported were [29.9%, 99.0%] and [36.5%, 99.1%], respectively. Intervals this wide, with such low lower bounds, suggest that the prediction is not robust. Dalamaga et al. [5] assessed serum Resistin as a predictor of postmenopausal breast cancer and found an AUC of 0.72, 95% CI [0.64, 0.79]. In 2015, a similar analysis was performed for Leptin, Resistin and Visfatin [6]. The 95% confidence intervals for the AUC values found were [0.72, 0.87], [0.82, 0.93] and [0.64, 0.80], respectively. In terms of specificity and sensitivity, the values reported were 95.1% and 88.2% for Leptin, 98.8% and 72.1% for Resistin, and 97.6% and 92.6% for Visfatin. However, these values are inconsistent with the ROC curves plotted in the article [7]. Also in 2015, serum Irisin levels were found to discriminate breast cancer patients with 62.7% sensitivity and 91.1% specificity [8]. It is noteworthy that in none of the articles mentioned above was the data split into a training set and a test set. The models were therefore assessed on the same data on which they were built, which is not necessarily a good indicator of performance on future data [9]. In contrast, the authors of [10] did use a test set to evaluate potential biomarkers (promoter methylation of the tumour-suppressor genes SFRP1, SFRP2, SFRP5, ITIH5, WIF1, DKK3 and RASSF1A in cfDNA extracted from serum) for blood-based breast cancer screening. The sensitivity and specificity achieved using ITIH5, DKK3 and RASSF1A promoter methylation to distinguish between women with breast cancer and healthy controls were 67% and 69%, respectively, with a 95% confidence interval for the AUC of [0.63, 0.76].
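To illustrate why assessing a model on the data used to build it can be misleading, the following minimal sketch compares the apparent AUC (computed on the training data) with the AUC on a held-out test set. Everything in it is an assumption made for illustration: the data are synthetic, the "marker" is pure noise with no relation to the label, and the 1-nearest-neighbour scorer is a deliberately overfitting stand-in, not a model used in any of the cited studies.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_nn_score(x, train_x, train_y):
    """Score a sample with the label of its nearest training point (1-NN)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


rng = random.Random(0)

# Synthetic data (assumption): 50 cases and 50 controls, and a marker that is
# pure Gaussian noise, so no model can genuinely discriminate the two groups.
labels = [i % 2 for i in range(100)]
marker = [rng.gauss(0.0, 1.0) for _ in labels]

# Hypothetical 70/30 split into a training set and a test set.
train_x, train_y = marker[:70], labels[:70]
test_x, test_y = marker[70:], labels[70:]

# Apparent performance: every training point is its own nearest neighbour,
# so the model simply recalls the labels it has memorised.
apparent_auc = auc([one_nn_score(x, train_x, train_y) for x in train_x], train_y)

# Held-out performance: the same model scored on data it has never seen.
held_out_auc = auc([one_nn_score(x, train_x, train_y) for x in test_x], test_y)

print(f"apparent AUC (training data): {apparent_auc:.2f}")
print(f"held-out AUC (test data):     {held_out_auc:.2f}")
```

The apparent AUC is perfect by construction, although the marker carries no information; only the held-out figure says anything about performance on future data.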
Besides studies evaluating potential biomarkers for diagnosis, other authors have looked at breast cancer from other perspectives. In 2012, ten potential cancer serum biomarkers (Osteopontin, Haptoglobin, CA15–3, Carcinoembryonic Antigen, Cancer Antigen 125, Prolactin, Cancer Antigen 19–9, α-Fetoprotein, Leptin and Migration Inhibitory Factor) were studied to predict early-stage breast cancer in samples collected before clinical diagnosis, but it was not possible to accurately differentiate samples from controls from those of patients [11]. In [12], a prediction model for breast cancer patients' pathologic response before neoadjuvant chemotherapy was built and assessed. The predictors were tumour haemoglobin parameters measured by ultrasound-guided near-infrared optical tomography in conjunction with standard pathologic tumour characteristics. Several authors focused on assessing the risk of breast cancer [13–15]. Finally, artificial intelligence and machine learning techniques have been applied to databases made publicly available in the UCI Machine Learning Repository. In particular, an extensive body of work has been published on the Wisconsin Breast Cancer Dataset (WBCD), the Wisconsin Diagnosis Breast Cancer (WDBC) and the Wisconsin Prognosis Breast Cancer (WPBC) datasets; see, for example, [16–19]. Respectively, these provide cytology data that can be used to distinguish malignant from benign samples; features computed from a digitized image of a fine needle aspirate of a breast mass, again used to classify the mass as malignant or benign; and follow-up data for breast cancer patients that can be used to predict cancer recurrence.
The models proposed in this work are based on a population with early-diagnosed breast cancer; their extension to larger and more heterogeneous populations should subsequently be assessed. The data collected and the statistical methods used are described in the Methods section. The Results section is split into three subsections: first, the characteristic features of the sample are described; then, a univariate analysis is performed to assess the diagnostic value of each of the nine aforementioned parameters; and finally, a multivariate analysis is performed in which predictors are combined. The results are then discussed in a separate section, and finally the main conclusions are presented.
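As a rough illustration of what assessing the diagnostic value of a single parameter can involve, the sketch below computes sensitivity and specificity at a fixed cutoff and attaches a confidence interval to each. It is a sketch under stated assumptions, not the procedure of this work: the marker values, the cutoff and the helper names are hypothetical, and the Wilson score interval is one common choice that may differ from the intervals reported in the studies cited above.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def confusion_counts(values, labels, cutoff):
    """Count true positives and true negatives, calling value >= cutoff positive."""
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
    return tp, tn


# Hypothetical marker values: six cases (label 1) and six controls (label 0).
values = [4.1, 5.0, 6.2, 7.3, 8.8, 9.5, 3.0, 3.9, 4.4, 5.1, 6.0, 2.5]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cutoff = 4.5  # hypothetical decision threshold

n_cases = sum(labels)
n_controls = len(labels) - n_cases
tp, tn = confusion_counts(values, labels, cutoff)

sensitivity = tp / n_cases
specificity = tn / n_controls
sens_low, sens_high = wilson_interval(tp, n_cases)
spec_low, spec_high = wilson_interval(tn, n_controls)

print(f"sensitivity {sensitivity:.3f}, CI [{sens_low:.3f}, {sens_high:.3f}]")
print(f"specificity {specificity:.3f}, CI [{spec_low:.3f}, {spec_high:.3f}]")

# The same observed proportion from a sample 100 times larger gives a much
# narrower interval, which is why small samples yield weak lower bounds.
big_low, big_high = wilson_interval(100 * tp, 100 * n_cases)
print(f"same sensitivity, larger sample: CI [{big_low:.3f}, {big_high:.3f}]")
```

With only six cases, even a high observed sensitivity is compatible with a wide range of true values, which mirrors the concern raised earlier about point estimates reported with very wide confidence intervals.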