Background
Breast cancer is the most common cancer among women worldwide [1]. Known modifiable hormonal and lifestyle risk factors, however, are estimated to be responsible for only around 30% of breast cancers in high-income countries [2–8], so a better understanding of the etiology of the disease and of the biological mechanisms is needed.
The metabolome reflects endogenous processes and environmental and lifestyle factors [9–13]. Metabolomics can detect subtle differences in metabolism; therefore, it is a promising tool to identify new etiological pathways. Previous prospective studies of breast cancer that employed metabolomics have used either targeted (analysis of a pre-defined panel of metabolites) [14] or untargeted (where as many metabolites as possible are measured and then characterized [15]) approaches [16–18]. In previous studies, lysophosphatidylcholine a C18:0 [14], various lipids, acetone, and glycerol-derived compounds [16], 16a-hydroxy-DHEA-3-sulfate, 3-methylglutarylcarnitine [17], and caprate (10:0) were associated with breast cancer development [18]. The number of cases included in these studies was, however, limited (from 200 to 621), and heterogeneity by subtype was investigated in only one study [18].
In the current study, we employed a targeted metabolomics approach to prospectively investigate the associations between 127 metabolites measured by mass spectrometry in pre-diagnostic plasma samples and risk of breast cancer, overall, and by breast cancer subtype, accounting for established breast cancer risk factors.
Methods
Study population, blood collection, and follow-up
The European Prospective Investigation into Cancer and Nutrition (EPIC) is an ongoing multi-center cohort study including approximately 520,000 participants recruited between 1992 and 2000 from ten European countries [19]. Female participants (n = 367,903) were aged 35–75 years at inclusion. At recruitment, detailed dietary, lifestyle, reproductive, medical, and anthropometric data were collected [19]. Around 246,000 women from all countries provided a baseline blood sample. Blood was collected according to a standardized protocol in France, Germany, Greece, Italy, the Netherlands, Norway, Spain, and the UK [19]. Serum (except in Norway), plasma, erythrocytes, and buffy coat aliquots were stored in liquid nitrogen (−196 °C) in a centralized biobank at IARC. In Denmark, blood fractions were stored locally in the vapor phase of liquid nitrogen containers (−150 °C), and in Sweden, they were stored locally at −80 °C in standard freezers.
Incident cancer cases were identified through record linkage with cancer registries in most countries and through health insurance records, cancer and pathology registries, and active follow-up of study subjects in France, Germany, and Greece. For each EPIC center, closure dates of the study period were defined as the latest dates of complete follow-up for both cancer incidence and vital status (dates varied between centers, from June 2008 to December 2012).
All participants provided written informed consent to participate in the EPIC study. This study was approved by the ethics committee of the International Agency for Research on Cancer (IARC) and all centers.
Selection of cases and controls
Subjects were selected among participants who were cancer-free (other than non-melanoma skin cancer) and had donated blood at recruitment into the cohort. Cancers were coded according to the Third Edition of the International Classification of Diseases for Oncology (code C50). Women diagnosed with first primary invasive breast cancer at least 2 years after blood collection and before December 2012, for whom estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) statuses of the tumors were available, were selected as cases for the current study.
For each breast cancer case, one control was chosen at random among appropriate risk sets comprising all female cohort members who were alive and without cancer diagnosis (except non-melanoma skin cancer) at the time of diagnosis of the index case. Using incidence density sampling, controls were matched to cases on center of recruitment, age (± 6 months), menopausal status (premenopausal, perimenopausal, postmenopausal, surgically postmenopausal [20]), phase of the menstrual cycle [20], use of exogenous hormones at blood collection, time of the day (± 1 h), and fasting status at blood collection (non-fasting (< 3 h since last meal), in between (3–6 h), fasting (> 6 h), unknown).
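The risk-set logic of incidence density sampling can be illustrated with a minimal Python sketch. It is not the procedure used in EPIC: the record fields (`id`, `stratum`, `exit_time`, `is_case`) are hypothetical, and the full set of matching criteria is collapsed into a single `stratum` label for brevity. `exit_time` stands for the time of diagnosis, death, or end of follow-up, whichever comes first.

```python
import random


def sample_controls(cohort, rng):
    """Pick one control per case from the case's risk set.

    cohort: list of dicts with hypothetical keys
        'id', 'stratum', 'exit_time', 'is_case'.
    A subject belongs to the risk set of a case if she shares the
    matching stratum and is still under follow-up (alive, cancer-free)
    at the case's diagnosis time. A future case is therefore eligible
    as a control for an earlier case.
    """
    pairs = []
    for case in cohort:
        if not case["is_case"]:
            continue
        risk_set = [
            person["id"]
            for person in cohort
            if person["id"] != case["id"]
            and person["stratum"] == case["stratum"]
            and person["exit_time"] > case["exit_time"]
        ]
        if risk_set:  # cases without an eligible control stay unmatched
            pairs.append((case["id"], rng.choice(risk_set)))
    return pairs


# Toy cohort: each case has exactly one eligible control.
toy_cohort = [
    {"id": "A", "stratum": "s1", "exit_time": 5.0, "is_case": True},
    {"id": "B", "stratum": "s1", "exit_time": 10.0, "is_case": False},
    {"id": "C", "stratum": "s1", "exit_time": 3.0, "is_case": False},
    {"id": "D", "stratum": "s2", "exit_time": 10.0, "is_case": False},
    {"id": "E", "stratum": "s2", "exit_time": 8.0, "is_case": True},
]
toy_pairs = sample_controls(toy_cohort, random.Random(0))
```

Because controls are drawn independently for each case, the same subject can appear in more than one matched set.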
Initially, 1626 cases and 1626 controls were eligible for the study, but after the exclusion of women who were pregnant at blood collection, a final population of 1624 cases and 1624 controls was included in the analysis.
Laboratory measurements
All plasma samples were assayed in the Biomarkers laboratory at IARC, using the AbsoluteIDQ p180 platform (Biocrates Life Sciences AG, Innsbruck, Austria) and following the procedure recommended by the vendor. A QTRAP5500 mass spectrometer (AB Sciex, Framingham, MA, USA) was used to measure 147 metabolites (19 acylcarnitines, 21 amino acids, 13 biogenic amines, 79 glycerophospholipids, 14 sphingolipids and hexoses). Samples from matched case-control sets were assayed in the same analytical batch. Laboratory personnel were blinded to case-control status of the samples.
Metabolites were analyzed in samples from 3247 distinct subjects (one subject included in 2 pairs). Completeness of measures and coefficients of variation (median = 5.3%, interquartile range = 1.4%) are shown in Additional file 1: Table S1. Values lower than the lower limit of quantification (LLOQ), or higher than the upper limit of quantification (ULOQ), as well as lower than batch-specific limit of detection (LOD) (for compounds measured with a semi-quantitative method: acylcarnitines, glycerophospholipids, sphingolipids), were considered out of the measurable range. Metabolites were excluded from the statistical analyses if more than 20% of observations were outside the measurable range (n = 20). A total of 127 metabolites (8 acylcarnitines, 20 amino acids, 6 biogenic amines, 78 glycerophospholipids, 14 sphingolipids, and hexoses) were finally retained for statistical analyses. Of these 127 metabolites, 113 had all values included in the measurable range. For the remaining 14 metabolites, values outside the quantifiable range (all lower than LLOQ or LOD) were imputed with half the LLOQ or half the batch-specific LOD, respectively.
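The exclusion and imputation rules described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the code used in the study: the function name and parameters are hypothetical, a single `lower_limit` stands in for the LLOQ or the batch-specific LOD, and values above the upper limit (which did not occur among the retained metabolites) are left unchanged.

```python
def filter_and_impute(values, lower_limit, upper_limit=None,
                      max_fraction_out=0.20):
    """Apply the 20% exclusion rule, then impute low values.

    Returns None when more than `max_fraction_out` of the observations
    fall outside the measurable range (metabolite excluded). Otherwise
    returns the values with those below `lower_limit` replaced by half
    of `lower_limit`.
    """
    def out_of_range(value):
        too_low = value < lower_limit
        too_high = upper_limit is not None and value > upper_limit
        return too_low or too_high

    n_out = sum(1 for value in values if out_of_range(value))
    if n_out / len(values) > max_fraction_out:
        return None
    return [lower_limit / 2 if value < lower_limit else value
            for value in values]


# Exactly 20% below the limit: retained, low value set to 0.5.
retained = filter_and_impute([0.3, 2.0, 3.0, 4.0, 5.0], lower_limit=1.0)
# 40% below the limit: excluded.
excluded = filter_and_impute([0.3, 0.4, 2.0, 3.0, 4.0], lower_limit=1.0)
```

With batch-specific LODs, the same function could be applied per batch for the imputation step, with the 20% rule evaluated across all batches.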
Statistical analysis
Characteristics of cases and controls were described using mean and standard deviation (SD) or frequency. Geometric means were used to describe non log-transformed metabolite concentrations among cases and controls. Log-transformed metabolite concentrations were used in all other analyses. Partial Pearson’s correlations between metabolites, adjusted for age at blood collection, were estimated among controls.
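A partial Pearson correlation adjusted for a single covariate can be obtained by correlating the residuals of each variable after linear regression on that covariate. The sketch below illustrates this for two metabolites and age; the function names are hypothetical, and the inputs are assumed to be log-transformed concentrations measured in controls.

```python
import math


def _residuals(y, x):
    """Residuals of the least-squares regression of y on x (with intercept)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return [b - mean_y - slope * (a - mean_x) for a, b in zip(x, y)]


def partial_correlation(u, v, age):
    """Pearson correlation between u and v after removing the linear effect of age."""
    res_u = _residuals(u, age)
    res_v = _residuals(v, age)
    numerator = sum(a * b for a, b in zip(res_u, res_v))
    denominator = math.sqrt(sum(a * a for a in res_u) * sum(b * b for b in res_v))
    return numerator / denominator


# Toy data: both variables increase with age, which masks an inverse
# relationship between them at any given age.
toy_age = [40.0, 50.0, 60.0, 70.0]
toy_u = [5.0, 4.0, 5.0, 8.0]
toy_v = [6.0, 12.0, 14.0, 12.0]
toy_partial_r = partial_correlation(toy_u, toy_v, toy_age)
```

In the toy data the unadjusted correlation is positive, whereas the age-adjusted correlation is −1, which shows why the adjustment can matter.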
We used conditional logistic regression to estimate the risk of breast cancer per standard deviation (SD) increase in metabolite concentration. The analysis was conditioned on the matching variables. Likelihood ratio tests were performed to compare linear models with cubic polynomial models in order to assess departure from linearity. Multiple testing was addressed by controlling the family-wise error rate at α = 0.05 by permutation-based stepdown minP adjustment of P values, as this method better accounts for the dependence of the tests [21, 22]. For comparison with previous studies, we also adjusted the raw P values using the Bonferroni correction (P < 0.05/127) and by controlling the false discovery rate (FDR) at α = 0.05 [23]. All statistical tests were two-sided.
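Two of the steps above lend themselves to a short illustration. For 1:1 matched pairs and a single covariate, the conditional likelihood depends only on the within-pair difference d = x(case) − x(control), with each pair contributing 1 / (1 + exp(−βd)); the odds ratio per unit of x is exp(β). The Python sketch below fits β by Newton's method and also computes Benjamini-Hochberg adjusted P values. It is a minimal sketch, not the SAS or R code used in the study: the function names are hypothetical, covariates are assumed to be already log-transformed and scaled to SD units, the FDR method is assumed to be the Benjamini-Hochberg procedure, and the permutation-based stepdown minP adjustment is not implemented here.

```python
import math


def paired_conditional_logit(case_values, control_values, iterations=25):
    """Log odds ratio per unit of x for 1:1 matched case-control pairs.

    Maximizes sum_i log(1 / (1 + exp(-beta * d_i))) with Newton's method,
    where d_i is the case-minus-control difference within pair i.
    """
    diffs = [c - k for c, k in zip(case_values, control_values)]
    beta = 0.0
    for _ in range(iterations):
        gradient = 0.0
        curvature = 0.0  # negative second derivative of the log-likelihood
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            gradient += d * (1.0 - p)
            curvature += d * d * p * (1.0 - p)
        if curvature == 0.0:  # no information left (e.g., all diffs zero)
            break
        beta += gradient / curvature
    return beta


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Bonferroni threshold for 127 tests at alpha = 0.05.
bonferroni_threshold = 0.05 / 127

# Toy example: two pairs with the case higher, one with the control higher.
toy_beta = paired_conditional_logit([1.0, 1.0, 0.0], [0.0, 0.0, 1.0])
toy_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In the toy example the fitted odds ratio exp(β) equals 2, the ratio of pairs in which the case has the higher value to pairs in which the control does, as expected for a binary-like exposure in matched pairs.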
Metabolites showing a statistically significant association with risk of breast cancer after correcting for multiple testing were categorized into quintiles based on the distribution of the concentrations among controls, and odds ratios (OR) for risk of breast cancer were estimated in each category. For tests of linear trend, participants were assigned the median value in each quintile and we modeled the corresponding variable as a continuous term. To identify potential confounders, models of the metabolites of interest (continuous and quintiles) were adjusted separately for each potential confounder and estimates obtained were compared with estimates from models with matching variables only. Only variables that changed parameter estimates by more than 10% were retained in the multivariable model. Variables tested were as follows: age at first menstrual period (continuous), number of full-term pregnancies (0/1/2/≥ 3), age at first full-term pregnancy (never pregnant/quartiles), breastfeeding (ever/never/never pregnant/missing; duration in quintiles), ever use of oral contraceptives (yes/no), ever use of menopausal hormone therapy (MHT) (yes/no/missing), smoking status (never/former/current), level of physical activity (Cambridge index [24]: inactive/moderately inactive/moderately active/active), alcohol consumption (nondrinkers/> 0–3/3–12/12–24 g/day), education level (no schooling or primary/technical, professional or secondary/longer education), energy intake (continuous, quintiles), height (continuous, quintiles), sitting height (missing/quartiles), weight (continuous, quintiles), body mass index (continuous, quintiles), waist circumference (continuous, quintiles), hip circumference (continuous, quintiles), and hypertension (yes/no). For these variables, missing values were assigned the median (continuous variables) or mode (categorical variables) if they represented less than 5% of the population, or were otherwise classified in a “missing” category (breastfeeding, ever use of MHT, sitting height). Only waist circumference (continuous), hip circumference (continuous), and weight (continuous) were included in the final models. Given the correlations between these variables (> 0.77), they were included separately in three different models.
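The quintile categorization, the trend variable, and the 10% change-in-estimate criterion described above can be sketched in Python. The sketch rests on several assumptions that the text does not specify: cut points are taken from `statistics.quantiles` with its default method, a value equal to a cut point is placed in the upper category, quintile medians are computed over the values passed in, and the 10% change is measured relative to the estimate from the model with matching variables only. All function names are hypothetical.

```python
import bisect
import statistics


def quintile_cutpoints(control_values):
    """Four cut points dividing the control distribution into fifths."""
    return statistics.quantiles(control_values, n=5)


def assign_quintile(value, cutpoints):
    """Category index from 0 (lowest fifth) to 4 (highest fifth)."""
    return bisect.bisect_right(cutpoints, value)


def trend_scores(values, cutpoints):
    """Replace each value by the median of its quintile (linear-trend term)."""
    groups = {}
    for value in values:
        groups.setdefault(assign_quintile(value, cutpoints), []).append(value)
    medians = {q: statistics.median(members) for q, members in groups.items()}
    return [medians[assign_quintile(value, cutpoints)] for value in values]


def changes_estimate(crude_estimate, adjusted_estimate, threshold=0.10):
    """True if adjustment shifts the estimate by more than `threshold` (relative)."""
    return abs(adjusted_estimate - crude_estimate) > threshold * abs(crude_estimate)


# Toy control distribution: the integers 1 to 9.
toy_cutpoints = quintile_cutpoints([1, 2, 3, 4, 5, 6, 7, 8, 9])
toy_scores = trend_scores([1, 1.5, 9], toy_cutpoints)
```

Deriving the cut points from controls only, and then applying them to cases and controls alike, keeps the category boundaries independent of the case distribution.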
For those metabolites showing a significant association with breast cancer risk after controlling for multiple testing, heterogeneity was investigated by menopausal status at blood collection, use of exogenous hormones at blood collection, fasting status at blood collection, age at diagnosis (age 50 or older/younger than age 50), breast cancer subtype (ER+PR+/−HER2+, ER+PR+/−HER2−, ER−PR−HER2+, ER−PR−HER2−), time between blood collection and diagnosis (2–8.6 years/more than 8.6 years), waist circumference (WC) at recruitment (< 80 cm/≥ 80 cm), BMI at recruitment (< 25 kg/m²/≥ 25 kg/m²), and country, by introducing interaction terms in the models. Subgroup analyses were conducted on the raw models. For WC, unconditional logistic regression adjusted for each matching factor was used. P values were not corrected for multiple tests since heterogeneity was investigated only for metabolites showing statistically significant associations with risk overall, after correction for multiple testing.
A sensitivity analysis of all 127 metabolites was performed on hormone non-users (1124 cases and 1124 controls) and by cancer subtype.
Analyses were conducted using SAS software for Windows (version 9.4, Copyright© 2017, SAS Institute Inc.) and R software (packages Epi and NPC) [25, 26].
Discussion
In this prospective analysis of the association of 127 circulating metabolites with breast cancer incidence, among women not using hormones at baseline and after control for multiple tests, acylcarnitine C2 was positively associated with risk of breast cancer, while levels of a set of phosphatidylcholines (ae C36:3, aa C36:3, ae C34:2, ae C36:2, and ae C38:2) and the amino acids arginine and asparagine were inversely associated with disease risk. In the overall population (hormone users and non-users), only C2 and PC ae C36:3 were associated with risk of breast cancer, independently of breast cancer subtype, age at diagnosis, fasting and menopausal status at collection, or adiposity.
Acylcarnitine C2 plays a key role in the transport of fatty acids into the mitochondria for β-oxidation [27, 28]. In human intervention studies, plasma C2 concentrations have been seen to vary according to the activity of the fatty acid oxidation pathway [28, 29]. High C2 levels are associated with other known mechanisms involved in breast cancer development, such as hyperinsulinemia and insulin resistance [30], consistent with some studies showing increased plasma concentrations of acetylcarnitine in pre-diabetic or diabetic women [31–33]. An explanation for the associations observed only in women not using hormones, for C2 and for other metabolites, could be that, due to their increased exposure to estrogens, MHT users are already at a higher risk of breast cancer than non-users [34], similar to what is observed for BMI and postmenopausal breast cancer risk [35].
Phospholipids are a major component of cell membranes and play a major role in cell signaling and cell cycle regulation. Previous studies of phospholipids showed that PC ae C36:3 concentrations were decreased in type 2 diabetes [36, 37] and that lower serum levels were predictive of future diabetes [38]. Lower concentrations of PCs ae C38:2 and ae C34:2 were also observed in diabetic men compared to non-diabetics [37]. A biological basis for such inverse associations could lie in the observed antioxidant effect of PCs [39].
In line with the inverse association observed between arginine and risk of breast cancer in hormone non-users, decreased plasma concentrations of arginine have been observed in breast cancer patients compared with controls [40]. Both human [41] and animal [42] studies have observed a reduction in anti-tumor immune responses in the context of arginine depletion in breast cancer, suggesting a link between arginine and immunity. In addition, higher plasma concentrations of arginine were correlated with lower estradiol and insulin-like growth factor 1 concentrations in premenopausal women [43], linking arginine to known mechanisms leading to breast cancer development. Regarding asparagine, a recent animal and in vitro study suggested that reduced asparagine bioavailability resulted in slower disease progression [44]. However, the role of asparagine in cancer development is not clear.
Prospective data on metabolomics and risk of breast cancer are limited [14, 16–18], and differences in approaches (targeted or untargeted metabolomics), analytical methods (NMR or MS), and samples (serum or plasma) make comparisons of the results difficult. Only one previous analysis used a similar targeted metabolomics approach with measurement of the same metabolites [14] and showed that lysophosphatidylcholine a C18:0 was inversely associated with risk of breast cancer after Bonferroni correction of P values, and that an inverse association close to statistical significance was observed for PC ae C38:1. However, none of the metabolites identified in the present work were associated with risk of breast cancer in this previous study, which did not investigate heterogeneity by use of hormones.
In a previous study applying NMR-based metabolomics analyses in the SU.VI.MAX cohort [16], several amino acids, lipoproteins, lipids, and glycerol-derived compounds were identified as significantly associated with breast cancer risk, suggesting that modifications in amino acid metabolism and energetic homeostasis in the context of developing insulin resistance could play a role in the disease. Results from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening (PLCO) study, based on an MS-based metabolomics approach in serum samples, indicated that some metabolites correlated with alcohol intake (androgen pathway metabolites, vitamin E, and animal fats) [18], and with BMI (metabolites involved in steroid hormone metabolism and branched-chain amino acids) [17], were also associated with breast cancer risk.
Heterogeneity by subtype was investigated only in the PLCO study, showing that some metabolites (allo-isoleucine, 2-methylbutyrylcarnitine [17], etiocholanolone glucuronide, 2-hydroxy-3-methylvalerate, pyroglutamine, 5α-androstan-3β,17β-diol disulfate [18]) were associated with risk of ER+ breast cancer, but not with breast cancer overall, indicating that the etiology of breast cancer differs by subtype. In our work, however, we did not observe any heterogeneity of results according to receptor status of the cancers.
This study is the largest prospective investigation of metabolomics and risk of breast cancer to date. Strengths of this work include its large sample size, which allowed us to examine associations by breast cancer subtype. In addition, the exclusion of cases diagnosed less than 2 years after blood collection reduces the risk of reverse causation in our findings. Finally, the assessment of numerous lifestyle factors and anthropometric measures allowed us to examine and control for potential confounding.
A potential limitation of our work is that blood was collected from participants at one time point only. Nevertheless, concentrations of the plasma metabolites analyzed here have been shown to be relatively stable over 4 months to 2 years, leading to the conclusion that a single measurement might be sufficient [45, 46, 47]. In addition, although fasting samples might be preferable over non-fasting samples, in our study, cases and controls were matched on fasting status and the results did not differ by fasting state. Another limitation is that the technologies that were used for some of the metabolites (such as PCs and lysoPCs) do not allow for a precise identification of the compounds measured, since the signal observed is not specific and may correspond to several compounds. Lastly, it is important to note that the aim of the present work was to screen metabolites associated with risk, but that further work is needed to identify the factors that influence biological levels of the metabolites associated with risk and to understand their biological connection with breast cancer development. Future studies should also integrate other molecular markers known to be linked to breast cancer to gain insight into biological mechanisms.