Background
Metabolomics allows the simultaneous measurement of a large variety of compounds present in biological samples, such as human blood [
1,
2]. Circulating metabolite levels can reflect both endogenous and exogenous processes, providing a snapshot of biological activity [
3,
4]. As a result, metabolomics may facilitate the identification of biological mechanisms involved in the development of chronic diseases. For example, prior metabolomics studies have identified metabolites associated with the risk of various chronic conditions, including type 2 diabetes (T2D) [
5‐
7], cardiovascular diseases (CVD) [
8‐
10], and different site-specific cancers, including cancers of the breast [
11], prostate [
12,
13], endometrium [
14], kidney [
15], colorectum [
16‐
18], hepatocellular carcinoma (HCC) [
19], and others [
20,
21].
Several shared biological mechanisms are known to underlie multiple chronic diseases. Obesity, physical inactivity, and adherence to a Western-type diet, as well as chronic inflammation and insulin resistance, are recognized risk factors for cardio-metabolic diseases, including T2D, CVD, and several site-specific cancers [
22‐
24]. Metabolomics may help uncover novel etiological mechanisms that are common to several chronic diseases as well as those that are disease-specific. One recent study identified metabolites associated with the risk of multimorbidity, defined as the simultaneous presence of multiple chronic conditions within one individual [
25]. Focusing on a pre-defined panel of metabolites, a targeted metabolomics study of breast, prostate, and colorectal cancers in a German population found that circulating levels of the phosphatidylcholine PC ae C30:0 and several lysophosphatidylcholines, including lysoPC a C18:0, were predictive of the development of any of these three cancers [
26], suggesting that some etiological mechanisms could be shared across multiple cancer types.
In this work, we extended this concept by leveraging targeted metabolomics data available within nested case-control studies on eight cancer types (breast, colorectal, endometrial, gallbladder and biliary tract, kidney, localized prostate and advanced prostate cancers, and HCC) previously acquired in the European Prospective Investigation into Cancer and Nutrition (EPIC) [
11,
12,
14,
15,
19]. The data-shared lasso [
27‐
29], a penalized multivariate approach specifically designed for the investigation of a set of shared risk factors across different disease outcomes, was used to carry out a multivariate pan-cancer analysis. The aim was to identify mutually adjusted metabolites associated with cancer risk and to distinguish metabolites with consistent associations across the eight cancer types from those with heterogeneous associations.
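To make the idea behind the data-shared lasso concrete, the sketch below builds the augmented design matrix on which such a model can be fitted with any L1-penalized solver. It is an illustration under stated assumptions, not the implementation used in this work: the function names `data_shared_design` and `split_coefficients`, the `weights` argument, and the default weight of 1.0 are all hypothetical, and the fitting step itself (which, for matched case-control pairs, would involve a conditional logistic model) is not shown.

```python
import numpy as np


def data_shared_design(X, groups, weights=None):
    """Augmented design matrix for a data-shared lasso (illustrative sketch).

    X       : (n, p) array of features (e.g. metabolite levels).
    groups  : length-n array of labels (e.g. the cancer type of each row).
    weights : optional dict {label: r_k > 0}; relative penalty applied to the
              group-specific deviations. Assumed to be 1.0 when omitted.

    Each coefficient is written as beta_k = beta + delta_k. Columns 0..p-1 of
    the returned matrix carry the shared effects beta; each following block of
    p columns carries r_k * delta_k for one group and is zero for other groups.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, p = X.shape
    Z = np.zeros((n, p * (1 + len(labels))))
    Z[:, :p] = X  # shared block, used by every row
    for k, label in enumerate(labels):
        r = 1.0 if weights is None else float(weights[label])
        rows = groups == label
        # Dividing by r_k means a plain L1 penalty on the fitted coefficients
        # equals |beta|_1 + sum_k r_k * |delta_k|_1.
        Z[rows, p * (k + 1):p * (k + 2)] = X[rows] / r
    return Z, labels


def split_coefficients(theta, p, labels, weights=None):
    """Map a coefficient vector fitted on Z back to shared and per-group effects."""
    theta = np.asarray(theta, dtype=float)
    shared = theta[:p]
    specific = {}
    for k, label in enumerate(labels):
        r = 1.0 if weights is None else float(weights[label])
        delta = theta[p * (k + 1):p * (k + 2)] / r
        specific[label] = shared + delta
    return shared, specific
```

With this parametrization, a feature whose deviation block is shrunk to zero for every group has a single association common to all groups, whereas a non-zero deviation for one group marks a group-specific departure from the shared effect.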
Discussion
Using available metabolomics data from eight cancer-specific matched case-control studies nested within the EPIC cohort, we investigated the relationship between pre-diagnostic blood levels of over one hundred metabolites and risks of breast cancer, colorectal cancer, endometrial cancer, gallbladder and biliary tract cancer, HCC, kidney cancer, and localized and advanced prostate cancers. In our main analysis, we found nine metabolites associated with cancer risk across different cancer types, suggesting the existence of shared metabolic pathways, as well as fourteen cancer type-specific associations. These identified associations were found to be robust after extensive sensitivity analyses: in particular, they were not attenuated after exclusion of the first years of follow-up, hence were less likely to be due to reverse causality, were not attenuated after adjustment for relevant cancer risk factors, were not modified by BMI, and did not deviate significantly from linearity. In additional analyses, in particular those based on bootstrap samples, we identified several additional metabolites possibly associated with the risk of specific cancer types or with cancer risk across different cancer types.
Our results suggested that concentrations of glycerophospholipids (phosphatidylcholines and lysophosphatidylcholines) could be linked to the risk of cancer overall as well as to specific cancer types. The role of glycerophospholipids in carcinogenesis is not fully understood but could be related to their documented anti-inflammatory properties, protection from oxidative stress, inhibition of cell proliferation, and induction of apoptosis [
50‐
52]. We observed a consistent inverse association between cancer risk and lysoPC a C18:2, as well as three clusters of phosphatidylcholines, across all studied cancer types except localized prostate cancer, for which the association with lysoPC a C18:2 and one cluster of phosphatidylcholines was absent or positive. An inverse association between lysoPC a C18:2 and T2D was previously reported in different studies [
7,
53] as well as with risks of breast, colorectal, and prostate cancers in the pan-cancer analysis conducted in the EPIC Heidelberg study [
26]. Our results regarding the three clusters of phosphatidylcholines were in line with many previously reported inverse associations between cancer and phosphatidylcholines [
11,
12,
15,
16,
20,
54]. In addition, we identified a positive association between the cluster that included PC aa C28:1 and cancer risk across all studied cancer types. This cluster also comprised PC ae C30:0, for which a positive association was reported with risks of breast, colorectal, and prostate cancers in the EPIC Heidelberg study [
26]. Cancer type-specific positive associations were found for the cluster containing PC aa C36:5 with breast cancer, PC ae C36:0 with colorectal cancer, and the cluster containing PC aa C40:2 with HCC. These three clusters were correlated with one another (Pearson correlation greater than 0.48), indicating that higher levels of these phosphatidylcholines might contribute to the development of these three cancer types.
We also observed robust associations between specific circulating amino acids and cancer risk. Our results suggested that proline was positively related to cancer risk across all studied cancer types, except breast cancer and possibly HCC (see Additional file
2: Fig. S5). A positive association between proline and prostate cancer risk was previously reported in EPIC [
12]. In addition, a Drosophila model of a high-sugar diet [
55] recently highlighted the possible role of proline in tumour growth, and proline was also found to distinguish colorectal cancer patients from those with adenomas [
56] and to be associated with metastasis formation [
57]. In the body, proline is generally synthesized via the glutamate/pyrroline 5-carboxylate pathway [
58]. Glutamate was also found to be positively related to the risk of all cancer types except for breast cancer in our analysis. Moreover, glutamate is formed from the degradation of glutamine, which was inversely associated with overall cancer risk. Although prior studies of the French E3N and SU.VI.MAX cohorts reported a positive association between glutamine and premenopausal breast cancer [
59,
60], our results regarding glutamine and glutamate were consistent with those of many previous studies that reported inverse associations between glutamine and risk of colorectal cancer [
18], HCC [
19,
61], and T2D [
7,
25] and positive associations between glutamate and risk of premenopausal breast cancer [
60], kidney cancer [
15], HCC [
19,
61], and T2D [
7]. Lower serum levels of glutamine were also observed in kidney cancer [
62] and ovarian cancer [
63] cases compared to controls. Glutamine is an energy substrate for cancer cells and makes a major contribution to nitrogen metabolism. Alterations in glutamine-glutamate equilibrium often reflect energetic processes related to cancer metabolism [
64]. It is possible that altered levels of glutamine and glutamate in individuals subsequently diagnosed with cancer may reflect ongoing metabolic processes related to cancer development and as such may serve as an early biomarker of cancer risk. However, the inverse association between glutamine levels and overall cancer risk observed in our analysis was only slightly attenuated after excluding, in turn, the first 2 and the first 7 years of follow-up, suggesting that changes in the glutamine-glutamate equilibrium may precede cancer development.
Our analysis additionally identified two positive and two inverse cancer type-specific associations with circulating amino acids. We observed an inverse association between colorectal cancer risk and the cluster containing histidine, for which previous studies reported inverse associations with risks of colorectal cancer and T2D [
25], while a positive association was reported with breast cancer [
60]. Also, lower serum levels of histidine were previously reported in ovarian cancer cases compared to controls [
65]. Our results further suggested an inverse association between endometrial cancer risk and the cluster composed of glycine and serine, in line with previous results from the EPIC cancer-specific study of endometrial cancer [
14]. Previous studies also reported inverse associations between glycine and/or serine and risk of T2D [
25]. Finally, our analysis suggested a positive association between arginine and risks of colorectal and kidney cancers (Table
4). Arginine plays a key role in nitric oxide production and polyamine synthesis [
66]. Both have been found to be associated with tumour growth, with polyamines enhancing it and nitric oxide inhibiting it. Arginine’s influence on tumour growth thus might be related to the relative activity of those two pathways. For instance, arginine was previously found to be positively associated with breast cancer in the E3N cohort [
60], while an inverse association with breast cancer was reported in EPIC [
11].
Regarding the biogenic amines, we found a positive association between serotonin levels and colorectal cancer risk, consistent with previous results from the CORSA case-control study and a previous EPIC analysis of colon cancer [
67]. We also found a consistent inverse association between spermine and the risk of the eight studied cancer types. Like other polyamines, spermine is involved in cell proliferation and differentiation and has antioxidant properties [
68], and dysregulation of polyamine metabolism is characteristic of multiple types of tumours [
69]. It was previously reported that polyamine supplementation, in particular spermidine, which acts as an intermediate in the conversion of putrescine to spermine, could be related to reduced overall and cancer-specific mortality [
70‐
72].
In our analysis, localized and advanced prostate cancers were considered as two different outcomes as previous results suggested that metabolic dysregulation might be predictive of advanced or aggressive prostate cancers only [
12]. Indeed, we observed some differences between the metabolites associated with risks of localized and advanced prostate cancers. Specifically, and as previously reported [
12,
13], our results suggested that hexoses, glycerophospholipids, octadecenoylcarnitine (acylcarnitine C18:1), and/or octadecadienylcarnitine (acylcarnitine C18:2) could help differentiate the respective mechanisms involved in the development of aggressive and localized prostate tumours. On the other hand, the positive association with decanoylcarnitine (acylcarnitine C10), which was observed with risk of all cancer types, and in particular with both localized and advanced prostate cancer risk, was notably attenuated when including the unknown stage prostate cancer pairs: it was only detected in 44% of the bootstrap samples generated from that extended sample (see Additional file
2: Table S2), in line with the inverse association between decanoylcarnitine and unknown stage prostate cancer that was observed in 80% of the samples (Additional file
2: Table S3). Overall, these results suggested that the positive association between decanoylcarnitine and prostate cancer identified in our main analysis might be spurious, reflecting an association between decanoylcarnitine and missingness of cancer stage in our prostate cancer study.
Some metabolites identified in our study were previously associated with established cancer risk factors, such as obesity [
33,
34]. In particular, a recent metabolomics study of BMI reported inverse associations with glutamine, lysophosphatidylcholine a C18:2, and phosphatidylcholine PC aa C38:0 (which was clustered with PC aa C36:0 in our analysis) and a positive association with glutamate. Directions of the associations with BMI were consistent with those identified in our study with cancer risk after adjustment for BMI, indicating that these metabolites might be mediators of the obesity-cancer relationship.
Our study has several strengths. First, it relied on a large sample of pre-diagnostic metabolomics data acquired among 5828 case-control pairs in nested studies on eight cancer types within a large prospective cohort, on average 6.4 years before cases developed cancer. Second, in a context where some metabolites might be predictive of cancer risk for multiple cancer types, the data-shared lasso used in our analysis automatically accounted for or ignored cancer types when assessing the association between each metabolic feature and cancer risk, depending on whether heterogeneity among the cancer type-specific associations was supported by the data for that particular feature. The comparison of results produced by the standard univariate analyses and the data-shared lasso illustrated the value of the latter. To begin with, the data-shared lasso benefited from the increased statistical power of the pooled analysis for the identification of metabolites that could be involved in cancer development for multiple cancer types: for example, butyrylcarnitine (acylcarnitine C4) was not associated with cancer risk in any of the cancer type-specific univariate analyses, while it was in the univariate pooled analysis and in the data-shared lasso analysis. Moreover, unlike the simple pooled analysis, the data-shared lasso would not necessarily mask cancer type-specific associations: for example, the data-shared lasso identified a positive association between the cluster containing tetradecenoylcarnitine (acylcarnitine C14:1) and breast cancer risk, as the univariate analysis of the breast cancer study did, while the univariate pooled analysis could not.
Another key difference between the standard univariate analyses and the data-shared lasso is that the latter allowed the investigation of mutually adjusted associations, hence the identification of metabolites or clusters of metabolites whose association with cancer risk could not be explained away by other metabolites included in our analysis. Furthermore, mutual adjustment revealed associations that could not be detected in minimally adjusted models, such as the one between arginine and colorectal cancer risk, which was not apparent in models not adjusted for glutamine and histidine. Another strength of our study stemmed from the extensive sensitivity analyses that we carried out.
On the other hand, identifying cancer risk factors is particularly challenging when candidate risk factors are strongly correlated with one another. Here, we clustered the most strongly correlated metabolites together prior to applying the data-shared lasso. As a sensitivity analysis, the data-shared lasso was applied to the original set of 117 metabolites, thus ignoring the clustering step, and the results were largely consistent with those of our main analysis (Additional file
2: Fig. S7). Moreover, because strong correlations remained among some of the metabolites produced by the hierarchical clustering (Additional file
2: Fig. S8, Additional file
2: Fig. S9), we applied the data-shared lasso to multiple bootstrap samples to gauge the robustness and specificity of the associations identified in our main analysis. Although most of the identified associations were replicated in a large proportion of bootstrap samples, a few of them were less robust, hence more questionable. For example, the identified inverse association between HCC risk and the cluster that included lysoPC a C20:3 was replicated in only 32% of the bootstrap samples. This lack of robustness could be due to the strong correlation between this cluster and the other three studied metabolites related to lysoPCs (Pearson correlation greater than 0.65; Additional file
2: Fig. S8). As a matter of fact, an inverse association between HCC risk and at least one of the four metabolites related to lysoPCs was identified in 78% of the bootstrap samples. Overall, these results were suggestive of a stronger inverse association with features related to lysoPCs for HCC compared to the other cancer types, but our analysis failed to unambiguously identify which specific lysoPCs might underlie this stronger inverse association. An additional limitation for interpreting the lipid results is the lack of specificity for lipids measured with the AbsoluteIDQ p180/p150 kits as a result of the FIA method [
73,
74], which does not allow for unambiguous identification of the compounds measured since the signal observed could correspond to several compounds. Moreover, the small sample size for some of the studied cancer types (in particular, gallbladder and biliary tract cancer and HCC) limited our ability to identify cancer type-specific deviations. To address this, we complemented our analysis by inspecting estimates computed under models derived from the one identified by the data-shared lasso but that further allowed fully type-specific associations (Additional file
2: Fig. S5). Another potential limitation of our study was the lack of repeated measurements, yet previous studies suggested that blood levels of metabolites were relatively stable and that a single measurement might be sufficient to capture medium-term exposure [
75‐
77].
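The bootstrap robustness check discussed in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in our analysis: the function names are hypothetical, `select` stands for any user-supplied variable-selection routine (for example, a data-shared lasso fit that returns which features have non-zero coefficients), and each row of `X` is assumed to represent one resampling unit, such as a matched case-control pair.

```python
import numpy as np


def bootstrap_selection(X, y, select, n_boot=100, seed=0):
    """Record which features a selection routine picks in each bootstrap sample.

    select(Xb, yb) is a hypothetical callable returning a boolean array of
    length p (True where a feature is selected in that resample).
    Returns a boolean array of shape (n_boot, p).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    n, p = X.shape
    selected = np.zeros((n_boot, p), dtype=bool)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        selected[b] = select(X[idx], y[idx])
    return selected


def selection_frequency(selected):
    """Proportion of bootstrap samples in which each feature was selected."""
    return selected.mean(axis=0)


def any_selected_frequency(selected, columns):
    """Proportion of bootstrap samples in which at least one of `columns` was selected."""
    return selected[:, columns].any(axis=1).mean()
```

When several strongly correlated features can substitute for one another, `any_selected_frequency` over the group can be much higher than each individual `selection_frequency`, which is the pattern reported above for the four lysoPC-related features in HCC.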