Background
Colorectal cancer is the third most common cancer in both sexes, with an estimated 153,760 new cases and 52,180 deaths in the United States in 2007 [1]. Despite advances in the diagnosis and surgical treatment of colorectal cancer, both its incidence and its mortality have decreased only slightly over the last twenty years. While the disease is curable in early stages, the risk of recurrence and metastasis is substantially higher for advanced tumors.
In addition to genetic alterations, the development of colorectal cancer has been linked to a dysregulation of energy homeostasis induced by hyperalimentation and obesity, as well as to nutritional intake [2–6]. It has recently been suggested that this metabolic dysregulation could serve as the basis for new therapeutic interventions [7].
While genetic alterations have been extensively characterized in colon cancer, the metabolic remodeling that occurs downstream of genomic and proteomic alterations has not yet been analyzed in depth. This is partly because low-molecular-weight metabolites are comparatively difficult to analyze in a large-scale -omics approach. Relatively new technologies now allow the comprehensive and quantitative investigation of a multitude of different metabolites, an approach called "metabolomics" in analogy to the terms "transcriptomics" and "proteomics" [8–10]. Currently, gas chromatography (GC) coupled with time-of-flight (TOF) mass spectrometry is a standard method for studying primary metabolism [11–13]. While these methods are state-of-the-art in the analysis of plant metabolism, translating them to cancer research for the characterization of tumor types in molecular pathology is still a challenge [14].
For cancer research, the ultimate aim is to define metabolic profiles in normal, precancerous and cancerous tissues and to link the dysregulation of tumor metabolism to clinical outcome and treatment response. Three steps are necessary to reach this aim. First, it must be demonstrated that metabolite levels can be determined in human tissue samples collected during routine surgical procedures. Second, as many metabolites as possible must be identified, so as to go beyond simple pattern analysis towards an understanding of the functional alterations of metabolism in tumor tissue. Third, tools must be developed that allow metabolic data to be interpreted easily by connecting them with existing knowledge on metabolic pathways, as described in databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) [15].
In human samples, there may be general differences in metabolism that are related to the genetic background or, more likely, to the nutritional intake of the individuals studied. It is therefore not clear whether the metabolic differences between tumor tissue and normal tissue are strong enough to be detected despite the different genetic, metabolic and nutritional backgrounds of individual patients.
Therefore, one aim of this study was to investigate whether a set of paired samples of normal colon tissue and colorectal cancer tissue from individual patients can be used for metabolic profiling with GC-MS, to detect and interpret molecular changes in tumor tissue and to detect metabolic patterns associated with different biological entities.
We evaluated different bioinformatic strategies to detect those metabolites that are present at different levels in normal tissue and cancer tissue. To facilitate the interpretation of the results with respect to changes in metabolic pathways, we developed a new method, PROFILE, that projects the metabolite interactions from the multidimensional KEGG interaction lattice onto a one-dimensional axis. This method uses the relational information from all metabolic pathways described in the KEGG database but focuses on those metabolites and reactions that can be observed in our investigation, and thus builds a bridge to the functional interpretation of the metabolomic changes in colon cancer.
Methods
Study Population and histopathological examination
For GC-TOF analysis, 45 colon samples were examined at the Institute of Pathology, Charité Hospital, Berlin, Germany. The tissue specimens included 27 primary colon carcinomas and 18 normal mucosa samples. For 15 cases, paired samples of cancer tissue and normal tissue from the same patient were available. The tissues were dissected by a senior pathologist in the operating room, immediately frozen in liquid nitrogen and stored at -80°C. Additional H&E sections were prepared for histopathological evaluation.
GC-TOF analysis
Fresh-frozen biopsy tissues (approximately 5 mg fresh weight) were prepared by grinding in 2 ml Eppendorf tubes for 30 s at 25 s⁻¹ using 3 mm i.d. metal balls in an MM300 ball mill (Retsch, Germany). Subsequent extraction was carried out using 1 ml of a one-phase mixture of chloroform:methanol:water (2:5:2, v/v/v) at -20°C for 5 min. Tubes were centrifuged for 30 s at 14,000 g, and the supernatant was collected and concentrated to complete dryness. Samples were derivatized for GC-TOF analysis as previously published [11]. To avoid cross-contamination of samples, 1.5 μl of the derivatized solution was injected in splitless mode into a thermodesorption unit (DTD, ATAS GL, Zoetermeer, Netherlands) equipped with automatic exchange of liners and micro inserts. The sample was introduced at 40°C using a programmable temperature vaporization OPTIC3 injector (ATAS GL, Zoetermeer, Netherlands) and heated to 290°C using a 4°C/min ramp. Mass spectrometry analysis was carried out using an Agilent 6890 gas chromatograph (Hewlett-Packard, Atlanta, GA, USA) coupled to a Pegasus III time-of-flight (TOF) mass spectrometer from Leco (St Joseph, MI, USA). Separation was performed on an MDN-35 fused silica capillary column (30 m length, 0.32 mm I.D., 0.25 μm film thickness) with an initial oven temperature of 85°C, ramped at 15°C/min to 360°C. Mass spectra were acquired over a scan range of m/z 83–500 at an acquisition rate of 20 spectra per second. The ionization mode was electron impact at 70 eV, and the ion source temperature was set to 250°C.
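For orientation, the oven program above implies a linear ramp of (360 − 85)/15 ≈ 18.3 min, and at 20 spectra per second this corresponds to roughly 22,000 spectra over the ramp. The short Python sketch below only performs this arithmetic; it assumes no initial or final hold times, which the text does not specify.

```python
# Back-of-envelope timing for the GC oven program described above.
# Assumption: hold times at the start/end temperatures are not given in the
# text, so only the linear temperature ramp itself is counted.

START_C = 85.0          # initial oven temperature (°C)
END_C = 360.0           # final oven temperature (°C)
RAMP_C_PER_MIN = 15.0   # oven ramp rate (°C/min)
SPECTRA_PER_S = 20      # TOF acquisition rate (spectra per second)


def ramp_minutes(start_c: float, end_c: float, rate_c_per_min: float) -> float:
    """Duration of a linear temperature ramp, in minutes."""
    return (end_c - start_c) / rate_c_per_min


def spectra_during(minutes: float, spectra_per_s: int) -> int:
    """Number of mass spectra acquired over a period at a fixed rate."""
    return round(minutes * 60 * spectra_per_s)


ramp_min = ramp_minutes(START_C, END_C, RAMP_C_PER_MIN)
n_spectra = spectra_during(ramp_min, SPECTRA_PER_S)
print(f"ramp: {ramp_min:.1f} min, spectra: {n_spectra}")  # ramp: 18.3 min, spectra: 22000
```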
As an additional quality control, stable-isotope-labeled (deuterated) cholesterol was added to every sample as an internal standard for lipophilic and high-boiling compounds, in order to monitor analytical variation. Quality control charts of this deuterated cholesterol standard showed no trends over the course of data acquisition.
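The text does not state how the quality control charts were constructed. The following is a minimal sketch, assuming a Shewhart-style chart (mean ± 3 SD control limits) combined with a least-squares slope of the internal-standard intensity against injection order as a simple drift indicator; the function name `qc_chart` and the intensity values are hypothetical.

```python
import numpy as np


def qc_chart(intensities, k=3.0):
    """Summarize an internal-standard series in injection order.

    Returns the centre line, lower/upper control limits (mean ± k·SD),
    the indices of injections outside the limits, and the least-squares
    slope of intensity versus injection order (a simple drift indicator).
    """
    y = np.asarray(intensities, dtype=float)
    order = np.arange(y.size)                  # injection order 0, 1, 2, ...
    centre = y.mean()
    sd = y.std(ddof=1)                         # sample standard deviation
    lower, upper = centre - k * sd, centre + k * sd
    outliers = np.flatnonzero((y < lower) | (y > upper))
    slope = np.polyfit(order, y, deg=1)[0]     # degree-1 fit: [slope, intercept]
    return centre, lower, upper, outliers, slope


# Hypothetical peak heights of the labeled standard over eight injections.
stable = [100, 102, 98, 101, 99, 100, 103, 97]
drifting = [100 + 2 * i for i in range(8)]

for name, series in [("stable", stable), ("drifting", drifting)]:
    centre, lower, upper, outliers, slope = qc_chart(series)
    print(name, round(slope, 3), outliers.tolist())
```

In this toy example neither series has points outside the ±3 SD limits, yet the drifting series has a clear positive slope; checking both criteria is one way to detect the kind of trend the text reports was absent.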
Data processing and normalization
Raw data were processed in two stages. In the first step, automated peak detection and mass spectral deconvolution were performed with the Leco ChromaTOF software (v2.32). For each sample, around 700 spectra were exported with absolute intensities (peak heights). These spectra were further processed by the in-house database BinBase [16]. All known artifact peaks, such as internal standards, column bleed, plasticizers or reagent peaks, were excluded from the result sheets. The BinBase algorithm effectively removes inconsistent signals and noise peaks, yielding a total of 206 metabolic signals that were reliably determined across the entire sample set. Of these peaks, 107 were structurally identified as known metabolites by comparison with the mass spectra and retention indices of customized reference mass spectral libraries acquired from authentic standard compounds under identical data acquisition parameters. Missing values arose for metabolites that were below the detection limit in a given sample, or whose mass spectra did not meet the quality criteria of the BinBase algorithm. To minimize the number of such missing values, only compounds consistently detected in at least 85% of samples were taken into account. The metabolite data were normalized to the sum of the 107 known metabolites in each sample and log-transformed, retaining the missing values in the data set for metabolite-wise analyses (t-test, fold change calculation). For collective analyses (PCA, clustering, classification), missing metabolite measurements were replaced by the corresponding arithmetic mean over all samples. The entire data analysis was performed in the programming and visualization environment R [17]. In addition to the normalization strategy described above, further analyses were performed using the raw (unnormalized) data as input.
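The filtering, normalization and imputation steps above can be sketched as follows. This is a Python illustration, not the authors' R code: the base-2 logarithm, filtering before normalization, NaN as the encoding of missing values, and the function names `preprocess` and `impute_mean` are all assumptions.

```python
import numpy as np


def preprocess(X, known, min_detect=0.85):
    """Filter, normalize and log-transform a samples x signals matrix.

    X      : raw peak heights, NaN where a signal was not detected
    known  : boolean mask marking structurally identified metabolites
    Returns the log2 ratios (NaN kept for metabolite-wise tests) and
    the boolean mask of retained signals.
    """
    X = np.asarray(X, dtype=float)
    known = np.asarray(known, dtype=bool)
    # Keep signals detected in at least `min_detect` of all samples.
    detected = np.mean(~np.isnan(X), axis=0)
    keep = detected >= min_detect
    X, known = X[:, keep], known[keep]
    # Normalize each sample to the sum of its known metabolites (NaN ignored).
    totals = np.nansum(X[:, known], axis=1, keepdims=True)
    log_x = np.log2(X / totals)          # missing values stay NaN
    return log_x, keep


def impute_mean(log_x):
    """Replace NaN by the per-signal mean, for PCA/clustering/classification."""
    col_means = np.nanmean(log_x, axis=0)
    return np.where(np.isnan(log_x), col_means, log_x)


# Hypothetical 4 samples x 4 signals; the last signal is seen in only 75% of samples.
X = np.array([
    [1.0, 3.0, 2.0, 5.0],
    [2.0, 2.0, 4.0, np.nan],
    [1.0, 1.0, 2.0, 1.0],
    [3.0, 1.0, 4.0, 2.0],
])
known = [True, True, False, False]
log_x, keep = preprocess(X, known)
print(keep.tolist())
print(np.round(log_x, 3))
```

Keeping NaN in `log_x` matches the text's distinction: metabolite-wise tests can simply skip missing entries, whereas matrix methods such as PCA need a complete matrix and therefore receive the mean-imputed version.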
Principal component analysis (PCA)
Principal component analysis was performed using the function prcomp() in the R package stats. In principal component analysis, the original set of metabolites is reduced to a new set of principal components that retain the variance-covariance structure of the data but use fewer dimensions. Of the 45 principal components, the first (PC1) and the second (PC2) turned out to be significantly different between colon carcinomas and normal mucosa (Welch's t-test, Bonferroni-corrected p-values). The values of the first two PCs were plotted with the cases labeled as colon carcinoma or normal mucosa. In addition, box plots were used to visualize the discriminative power of the first and the second PC.
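A minimal Python sketch of this analysis, assuming column-centred data without scaling (the prcomp() default) and computing the scores via a singular value decomposition. Welch's t-test is taken from `scipy.stats.ttest_ind` with `equal_var=False`, and the Bonferroni factor is assumed to be the number of PCs tested. The simulated data and function names are hypothetical.

```python
import numpy as np
from scipy import stats


def pca_scores(X):
    """PCA via SVD of the column-centred matrix (no scaling).

    Returns the sample scores (samples x components) and the variance
    explained by each component, in decreasing order.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U * s, s**2 / (X.shape[0] - 1)


def pc_group_tests(scores, is_tumor):
    """Welch's t-test per PC with Bonferroni-corrected p-values."""
    t, p = stats.ttest_ind(scores[is_tumor], scores[~is_tumor],
                           axis=0, equal_var=False)
    p_bonf = np.minimum(p * scores.shape[1], 1.0)
    return t, p_bonf


# Hypothetical data: 18 normal and 27 tumor samples, 20 log-scale features,
# with a shift in the first five features for the tumor group.
rng = np.random.default_rng(0)
n_normal, n_tumor, n_feat = 18, 27, 20
X = rng.normal(size=(n_normal + n_tumor, n_feat))
is_tumor = np.arange(n_normal + n_tumor) >= n_normal
X[is_tumor, :5] += 3.0

scores, pc_var = pca_scores(X)
t, p_bonf = pc_group_tests(scores, is_tumor)
print(p_bonf[:2])
```

The first two columns of `scores` are what would be plotted against each other, and box plots of each column split by `is_tumor` show their discriminative power.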
Alterations between colon carcinomas and normal mucosa were evaluated using thresholds on the fold change and on Welch's t-test p-values. The results of three different selection thresholds (p < 0.05, p < 0.01, p < 0.00024) were validated by repeated (n = 1000) random permutations of the sample labels. The false discovery rate (FDR) for each corresponding metabolite list was estimated as the ratio nexp/nobs, where nexp is the number of metabolites expected to be significant by chance, as derived from the permutation distribution, and nobs is the number of metabolites observed to be significant between colon carcinomas and normal mucosa.
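The permutation-based FDR estimate can be sketched as below. The threshold p < 0.00024 is presumably a Bonferroni-style cut-off, since 0.05/206 ≈ 0.00024; that reading is an assumption. The sketch applies p-value thresholds only (no fold-change filter), assumes complete data without missing values, and uses hypothetical simulated inputs and function names.

```python
import numpy as np
from scipy import stats


def welch_pvalues(X, labels):
    """Per-metabolite Welch's t-test p-values (labels: True = carcinoma)."""
    return stats.ttest_ind(X[labels], X[~labels], axis=0, equal_var=False).pvalue


def permutation_fdr(X, labels, alpha, n_perm=1000, seed=0):
    """Estimate FDR = n_exp / n_obs for the list of metabolites with p < alpha.

    n_obs: number of significant metabolites with the true labels.
    n_exp: mean number of significant metabolites over label permutations.
    """
    rng = np.random.default_rng(seed)
    n_obs = int(np.sum(welch_pvalues(X, labels) < alpha))
    null_counts = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(labels)          # shuffle the class labels
        null_counts[b] = np.sum(welch_pvalues(X, perm) < alpha)
    n_exp = null_counts.mean()
    fdr = n_exp / n_obs if n_obs > 0 else float("nan")
    return n_obs, n_exp, fdr


# Hypothetical data: 18 normal + 27 tumor samples, 206 signals, 40 truly shifted.
rng = np.random.default_rng(1)
X = rng.normal(size=(45, 206))
labels = np.arange(45) >= 18
X[labels, :40] += 2.0

for alpha in (0.05, 0.01, 0.05 / 206):
    n_obs, n_exp, fdr = permutation_fdr(X, labels, alpha, n_perm=200)
    print(f"alpha={alpha:.5f}  n_obs={n_obs}  n_exp={n_exp:.2f}  FDR={fdr:.3f}")
```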
Prediction of the tissue type
Predictive models were derived in a two-step procedure consisting of a feature selection step followed by the construction of the classifier. We applied nearest centroid classification (NCC) and compared it to more complex methods such as linear discriminant analysis (LDA) and linear support vector machines (SVM). We used our own implementation of the nearest centroid rule, the implementation of LDA in the R package MASS and the implementation of SVMs in the R package e1071. Feature selection was performed by ranking metabolites according to Welch's t-test on the training set. The top 2, 3, ..., 206 metabolites were used as input for the construction of the classifier, and classification results were monitored as a function of the number of features. Classification results were obtained in a leave-one-out approach and reported separately for carcinomas and normal tissues. Beyond that, we studied the stability of the classification results under different choices of the training data. To this end, we employed a protocol similar to that in [28] and trained the classifier on balanced data sets of different sizes.
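A minimal sketch of the leave-one-out protocol with nearest centroid classification. It assumes Euclidean distance to the two class centroids and ranks features by the absolute Welch t statistic computed on the training fold only, so that the held-out sample never influences feature selection. The function names and simulated data are hypothetical, and the authors' own R implementation may differ in its details.

```python
import numpy as np
from scipy import stats


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose centroid is closer (True = carcinoma)."""
    c_pos = X_train[y_train].mean(axis=0)
    c_neg = X_train[~y_train].mean(axis=0)
    d_pos = np.linalg.norm(X_test - c_pos, axis=1)
    d_neg = np.linalg.norm(X_test - c_neg, axis=1)
    return d_pos < d_neg


def loo_ncc(X, y, n_features):
    """Leave-one-out NCC with Welch-t feature ranking inside each fold.

    Returns (sensitivity, specificity), i.e. accuracy on carcinomas and
    on normal tissues reported separately.
    """
    n = X.shape[0]
    pred = np.empty(n, dtype=bool)
    for i in range(n):
        train = np.arange(n) != i
        t = stats.ttest_ind(X[train & y], X[train & ~y],
                            axis=0, equal_var=False).statistic
        top = np.argsort(-np.abs(t))[:n_features]   # top-ranked features
        pred[i] = nearest_centroid_predict(X[train][:, top], y[train],
                                           X[i:i + 1, top])[0]
    sensitivity = np.mean(pred[y])
    specificity = np.mean(~pred[~y])
    return sensitivity, specificity


# Hypothetical data: 18 normal + 27 tumor samples, 206 signals, 10 informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(45, 206))
y = np.arange(45) >= 18
X[y, :10] += 2.0

for k in (2, 5, 10, 50):
    sens, spec = loo_ncc(X, y, k)
    print(f"{k:3d} features: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Nearest centroid classification has no tuning parameters beyond the number of selected features, which makes it a natural baseline against LDA and SVMs on small sample sizes.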
Functional analysis of gene signatures by PROFILE clustering
PROFILE (projection from interaction lattice) is a new method for the interpretation of metabolomic changes through the integration of pathway information. Based on the KEGG REACTION database (cf. ref. [15]), a metabolic network was built by joining two compounds A and B with an edge whenever there was a reaction that converted A into B. Such a reaction was assumed to exist if and only if A and B were annotated as a reaction pair with the qualifier "main" in KEGG REACTION. The final network consisted of 4206 metabolites that were annotated as "main" in at least one reaction pair. On this basis, the biochemical distance between two metabolites was defined as the minimal length of a joining path within the network. The function johnson.all.pairs.sp() from the R package RBGL was used to calculate the biochemical distance between all pairs of the 4206 metabolites. The RBGL package is an R interface to the Boost Graph Library, a C++ library of graph algorithms.
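The construction of the biochemical distance can be sketched in Python. Because every edge has unit length, a breadth-first search from each node yields the same all-pairs shortest-path lengths that Johnson's algorithm returns for unit weights. The sketch treats reaction pairs as undirected edges (an assumption; the later statement that the network forms a single connected component fits that reading), and the compound labels are placeholders, not KEGG data. The helper names are hypothetical.

```python
from collections import deque


def build_network(reaction_pairs):
    """Undirected adjacency sets from (A, B) 'main' reaction pairs."""
    adj = {}
    for a, b in reaction_pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Shortest path length (number of reaction steps) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_pairs_distances(adj):
    """Biochemical distance between all pairs of nodes (unreachable pairs absent)."""
    return {s: bfs_distances(adj, s) for s in adj}


def restrict(dist, observed):
    """Project distances onto the observed metabolites and check connectivity."""
    sub = {a: {b: dist[a].get(b) for b in observed} for a in observed}
    connected = all(d is not None for row in sub.values() for d in row.values())
    return sub, connected


# Placeholder reaction pairs; G-H forms a separate component.
pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E"), ("E", "F"), ("G", "H")]
d = all_pairs_distances(build_network(pairs))
print(d["A"]["D"])                            # 3
print(restrict(d, ["A", "D", "F"])[1])        # True
```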
This distance was then used to interpret the metabolomic changes in colon tissue. Our study included 206 metabolic signals, of which 107 could be mapped to chemical compounds and metabolite names. Of these compounds, 84 were registered in the KEGG database and 71 were a main reaction partner in at least one of the reactions annotated in KEGG REACTION. After projection onto these 71 metabolites, the network turned out to consist of a single connected component, i.e., every pair of metabolites could be joined by a path inside the network. Next, we performed an agglomerative hierarchical clustering of these compounds with respect to the biochemical distance, in order to project the complexity of the metabolic network onto a one-dimensional axis. Average linkage was used to calculate distances between clusters. Finally, the fold changes between cancer and normal tissues were plotted against this functional axis.
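The projection onto a one-dimensional axis can be sketched with SciPy's average-linkage clustering: the leaf order of the resulting dendrogram serves as the functional axis, and fold changes are then listed (or plotted) in that order. The distance matrix below belongs to a small placeholder network and the fold changes are hypothetical. Because biochemical distances are integers, ties are common, so the exact leaf order depends on the tie-breaking of the clustering implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform


def profile_axis(names, dist_matrix):
    """Order metabolites along a 1-D axis by average-linkage clustering.

    dist_matrix must be symmetric with a zero diagonal (shortest-path
    lengths between the observed metabolites).
    """
    condensed = squareform(np.asarray(dist_matrix, dtype=float), checks=True)
    Z = linkage(condensed, method="average")
    order = leaves_list(Z)                  # dendrogram leaf order
    return [names[i] for i in order]


# Shortest-path lengths in a small placeholder network (not KEGG data).
names = ["A", "B", "C", "D", "E", "F"]
D = [
    [0, 1, 2, 3, 2, 3],
    [1, 0, 1, 2, 1, 2],
    [2, 1, 0, 1, 2, 3],
    [3, 2, 1, 0, 3, 4],
    [2, 1, 2, 3, 0, 1],
    [3, 2, 3, 4, 1, 0],
]
# Hypothetical log2 fold changes (carcinoma vs. normal).
log2_fc = {"A": -1.2, "B": -0.8, "C": 0.3, "D": 1.5, "E": 0.9, "F": 2.1}

axis = profile_axis(names, D)
for position, name in enumerate(axis):
    print(position, name, log2_fc[name])
```

Plotting `log2_fc` against `position` gives the kind of display described above, in which biochemically neighboring metabolites appear next to each other.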
Discussion
Our study demonstrates that large-scale metabolic profiling using GC-TOF mass spectrometry and database annotation yields numerous significant differences between colon carcinoma and normal colon mucosa. We used both unsupervised and supervised approaches to investigate these metabolic differences. The metabolite signatures predict the status (normal tissue or colon carcinoma) of a previously unseen test sample with a sensitivity and specificity of around 95%. Importantly, we could show that the classification results are robust against different choices of the classifier and the training set (figure 6). Regarding the extent of the changes detected, it is important to note that, from a tumor-biological point of view, comparing normal tissue with carcinoma tissue means comparing two completely different entities. We would therefore expect a comparison of these two tissue types to yield a large set of differing biomarkers. Similar results have been reported in gene expression analysis. For example, in the study by Hlubek et al. [23], 39% of transcripts were differentially expressed between the colon tumor center and normal colon tissue. Since metabolites are regarded as an amplified output of a biological system, metabolite changes could be expected to be even more prominent than changes at the genomic level, as observed in our analysis.
Many of these metabolic changes can be ascribed to known metabolic dysregulation in cancer, which supports the validity of the method itself. Metabolites involved in the citric acid cycle were generally found at lower levels in cancer tissues than in normal colon samples, in accordance with results published earlier [24]. Purines were detected at increased levels in malignant tissues, indicating a higher capacity for DNA synthesis. Similarly, we found almost all amino acids to be upregulated in carcinoma tissues, which may reflect cellular needs for a higher turnover of structural proteins. This finding is in agreement with earlier publications for selected amino acids, notably glutamate and aspartate [25]. Likewise, high GABA levels have been described in colon cancer tissues in a previous study [26]. Certain amino acids are synthesized by mammalian metabolic routes that often use TCA intermediates as precursors, such as alpha-ketoglutarate for glutamate and its derived amino acids, and oxaloacetate for aspartate-derived amino acids. With a higher demand for amino acids but lower use of the TCA cycle, an alternative route is needed to deliver carbon backbones for such TCA-derived amino acids. Increased import of amino acids may be accomplished by upregulation of amino acid transporters, meeting higher cellular needs for energy metabolism as well as delivering carbon backbones for the biosynthesis of cellular molecules. This interpretation is supported by our finding of increased levels of urea cycle intermediates in colon carcinoma tissues, indicating a higher turnover of amino acids. Interestingly, beta-alanine was the most strongly upregulated metabolite in carcinoma tissues (fold change = 4.9), with very high statistical significance (p = 5.8 × 10⁻¹³). In humans, beta-alanine is a unidirectional catabolic product formed from aspartate by decarboxylation (EC:4.1.1.15) or via catabolic routes of pyrimidine metabolism (EC:3.5.1.6). However, the eventual fate of beta-alanine in humans is still unclear, since no enzymes are known that would transfer its backbone into acetyl-CoA or towards pantothenate metabolism, as occurs in other species. We therefore suggest that beta-alanine might be important for metabolic alterations in colon cancer. In addition, we found that not all amino acids were upregulated in the same manner. In fact, the glutamate/glutamine ratio was greatly altered in comparison to normal colon tissue, indicating a lesser role of aminotransferase reactions utilizing glutamine or a reduced need for transport of nitrogen across cells.
One limitation of this study was the need to normalize the raw data to the total sum of known metabolites. The normalization strategy was developed in analogy to gene expression studies. It was chosen because frozen tissue sections (as detailed in the Methods section) should not be weighed on fine balances, in order to preserve the cold chain and to prevent reactivation of metabolism prior to extraction. We found that the raw data for carcinoma tissues were significantly higher (p = 0.03) than those for normal tissue, corresponding to a roughly 33% increase in overall metabolite levels. This might be due either to a higher number of tumor cells per tissue area or to generally enhanced metabolism; more detailed studies would be needed to address this question. In an additional validation using the raw (unnormalized) data as input, the major metabolic differences between the two tissue types could also be detected, suggesting that these differences do not depend on the normalization strategy.
The total time between surgery and freezing of the tissues was kept to a minimum, because the frozen-section pathology laboratory was directly adjacent to the operating room. Nevertheless, clinical and pathological workflows do not allow exact measurement and timing of tissue dissection parameters for samples collected during routine surgical interventions. In addition, depending on the surgical technique, there is a variable amount of intraoperative tissue ischemia due to surgical ligation of blood vessels. This may account for the inability to quantify glycolytic intermediates, which turn over so rapidly in non-frozen tissue that they are depleted if metabolism is not quenched immediately after disruption of blood flow. To minimize technical noise related to surgical procedures, we compared tumor tissue and normal tissue collected during the same surgery.
We used a metabolomic approach based on GC-TOF mass spectrometry to gain a broad overview of primary metabolism at limited cost but with high sensitivity and selectivity. In total, more than 100 compounds could be identified by chemical structure from as little as 5 mg of fresh tissue, which compares favorably with reports using one-dimensional ¹H-NMR data acquisition. Specifically, we demonstrate here for the first time the efficacy of automated annotation using a customized database approach. On the one hand, the BinBase database unambiguously identifies chemically or biochemically known compounds that are used for pathway mapping. On the other hand, the database also facilitates adding novel and potentially unique metabolic signals that have yet to be structurally identified but that were nevertheless often found to be differentially regulated at high significance levels. These compounds are stored in the database under unique identifiers combining mass spectra and retention index information, which allows these database entries to be reused in later studies aimed at validating initial biomarkers or at the structural identification of these metabolic signals.
In the study presented here, we focused on using information from identified compounds by developing and applying a new biochemical mapping method, PROFILE. This method maximizes the interpretability of results by facilitating a physiological and biochemical understanding of metabolic alterations in carcinoma. Specifically, PROFILE yields a simpler output than mapping onto single KEGG pathway maps, which would focus on a small number of selected metabolites rather than taking into account the relative distances of metabolites across the metabolic network.
In conclusion, our results show that metabolic signatures as well as individual metabolites can be detected in fresh-frozen colon cancer tissue and that these alterations can be linked to relevant biochemical pathways. Based on our results, we suggest that metabolomics is a promising approach, complementary to transcriptomics and proteomics, for analyzing changes in the malignant phenotype. As metabolites constitute the amplified output of a biological system, their quantitative and qualitative analysis will be relevant for tumor biology in different types of investigations. Databases such as the one presented here will enable comparisons of findings across studies and laboratories. Metabolomics can be used for the biochemical classification of different tumor types and for the comparison of malignant tumors with their corresponding normal tissue. Recently, it has been suggested that therapeutic approaches directed against metabolic abnormalities may be useful in the treatment of malignant tumors [27, 28]. In this context, the metabolic profiling approach described here may be useful to monitor the complex changes in tumor metabolism that may occur under such treatments. Furthermore, analysis of metabolic alterations may be used as a new method in molecular pathology to develop classifiers for therapy response prediction, which may ultimately lead to the identification of new prognostic markers. By combining advanced instrumentation, standardized database algorithms and tools for data interpretation, our study provides a methodological basis for these further investigations.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
CD conceived the study and participated in its design, participated in the statistical evaluation, performed the histopathological evaluation and helped to draft the manuscript. JB performed the PROFILE clustering and the statistical evaluation and was involved in drafting the manuscript. WW carried out the design of the study and performed the histopathological evaluation. GW, MS, and TK performed the GC-TOF analysis and the identification of metabolites. SN, AN, and AB helped to draft the manuscript and participated in the histopathological evaluation. MD participated in the design of the study. OF participated in the design of the study, performed statistical analysis, carried out GC-TOF analysis and performed the identification of metabolites. All authors read and approved the final manuscript.