Introduction
Epithelial-mesenchymal transition (EMT) is a cellular process in which cells with a polarized epithelial phenotype transdifferentiate and gain mesenchymal characteristics. It is a highly coordinated process that is regulated at the genetic, epigenetic and protein levels by different regulators [1-3]. Epithelial cells show inherent plasticity that covers a range of changes in cellular behaviour and differentiation characteristics, with epithelial integrity at one end and a complete mesenchymal transition at the other [4]. Epithelial cells may simultaneously express varying levels of both epithelial and mesenchymal characteristics depending on the tissue and signalling context, thereby exhibiting a partial EMT phenotype and existing in an intermediate cell state [5, 6]. In our previous study, we employed an EMT scoring method to compute generic EMT scores from transcriptome datasets. That study revealed an intermediate EMT phenotype in circulating tumor cells (CTCs) across cancers [7]. In our recent work, we identified five categories of CTCs, ranging from exclusively epithelial (E) through E > M, E = M and M > E to exclusively mesenchymal (M), suggesting dynamic changes in epithelial and mesenchymal composition, a finding supported by other published work in the field [8, 9]. Thus, it is of paramount importance to understand the EMT spectrum in cancers.
Several signalling cascades and downstream transcriptional regulators such as SNAIL, TWIST and ZEB are known to be associated with EMT [10, 11]. Advanced technology and cell biology-based approaches have immensely improved our understanding of the molecular mechanisms of EMT over the past decades [12]. Nevertheless, such approaches are usually restricted in the number of targets that can be monitored simultaneously. High-throughput technologies such as transcriptomics have dominated the investigation of EMT models in numerous studies [13-16]. However, estimated mRNA levels may not correlate with protein expression owing to a range of post-translational regulatory mechanisms [17-19]. Thus, investigating the protein expression changes that accompany changes in cellular phenotype would provide a deeper understanding of the mechanisms and functions related to EMT.
Mass spectrometry and antibody arrays have been used to assess protein expression dynamics. Although mass spectrometry-based proteomics studies have enabled quantitative estimation of the differential expression of many proteins associated with the EMT process under different biological contexts [20, 21], these platforms are limited in their range and sensitivity as well as in their ability to consistently provide absolute protein quantification [22-24]. Thus, establishing a robust method to effectively monitor proteomic changes associated with EMT is essential for further understanding of the complex regulation involved in EMT.
In previous studies, most approaches employed either transcriptomic analysis or mathematical modeling and focused on classifying the dynamic state of cellular phenotypes [25, 26]. In this study, however, we aimed to identify global changes at the protein level using a parallel reaction monitoring (PRM)-based targeted proteomics assay as a tool for the absolute quantification of the proteins involved in these dynamic changes. A mass spectrometry-based targeted proteomics strategy is relatively fast and highly reproducible [27, 28]. This method allows quantification down to the attomole range in a straightforward way, without any prior enrichment or fractionation [27, 29]. We measured the relative expression of the established panel of EMT-related proteins, which distinguishes between epithelial and mesenchymal cellular phenotypes. Most of the cell lines showed concordance between protein expression and gene expression. However, some cell lines exhibited protein expression that was distinct from their gene expression. Further, our study showed that this method can also be applied to tumor tissues for the characterization of tumor phenotype.
Discussion
EMT is a dynamic change in cellular architecture that leads to changes in cell migration and invasion. Its role has been well documented in developmental processes and is closely associated with tumor dissemination and metastasis [43, 44]. Several genetic, epigenetic, and proteomic regulators are known to coordinate this highly complex process. Various studies have reported the gain and loss of cellular protein components related to EMT. For example, loss of expression of the epithelial marker E-cadherin is regulated by differential expression of transcriptional repressors such as SNAI1/2, ZEB1/2 and TWIST1/2 [2]. A comprehensive study of transcriptomics data by Tan et al. characterized EMT across cancer types [25]. They established a method to compute an EMT score using published EMT signatures. Similar efforts to define and predict EMT phenotype based on scoring matrices using transcriptomics data were published by Guo et al. [45] and George et al. [46]. Another study by Mak et al. derived a pan-cancer EMT gene signature that encompasses core EMT markers functioning across different tumors and calculated EMT scores for 11 distinct tumor type datasets [47]. However, these approaches lack an assessment of EMT at the protein level. In the present study, we aimed to develop a method based on a targeted proteomics approach to assess the expression of a panel of EMT-related markers across different cancer types. We employed a parallel reaction monitoring (PRM)-based targeted proteomics strategy to quantify EMT markers. The established protein panel and the targeted method in our study will help to monitor changes in the EMT expression profile and to characterize tumor phenotype. PRM allows selective targeting of predefined precursor ions for fragmentation. The signal abundance of the fragment ions indicates the abundance of the corresponding peptides in each sample. Proteotypic peptides from EMT markers were selectively targeted and monitored across samples. This strategy allowed quantification of EMT markers with high accuracy. To this end, we curated a panel of 37 proteins belonging to molecular classes such as transcription factors, cytoskeletal proteins, and cell adhesion molecules. Gene ontology-based classification of the biological processes associated with these proteins demonstrated that they are involved in EMT-related processes such as escape from programmed cell death, epithelial cell differentiation and cell migration. To the best of our knowledge, this is the first effort to establish absolute quantification of the proteins involved in EMT.
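The PRM read-out described above, in which fragment ion signal stands in for peptide abundance, can be illustrated with a minimal sketch. It is an assumption for illustration only and not the processing pipeline used in this study: the function name `summarize_prm`, the tuple layout, the choice to sum fragment peak areas per peptide and average peptides per protein, and all example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize_prm(transitions):
    """Roll fragment-ion peak areas up to peptide and then protein level.

    `transitions` is an iterable of (protein, peptide, fragment_ion, peak_area)
    tuples. This layout is a hypothetical one chosen for the sketch.
    """
    # Step 1: peptide signal = sum of the peak areas of its fragment ions.
    peptide_area = defaultdict(float)
    for protein, peptide, _fragment, area in transitions:
        peptide_area[(protein, peptide)] += area

    # Step 2: protein signal = mean over its proteotypic peptides
    # (an assumed roll-up rule; other rules such as the median are possible).
    per_protein = defaultdict(list)
    for (protein, _peptide), area in peptide_area.items():
        per_protein[protein].append(area)
    return {protein: mean(areas) for protein, areas in per_protein.items()}


# Hypothetical peak areas for two markers (arbitrary units).
example = [
    ("CDH1", "PEPTIDE_A", "y4", 100.0),
    ("CDH1", "PEPTIDE_A", "y5", 200.0),
    ("CDH1", "PEPTIDE_B", "y3", 500.0),
    ("VIM", "PEPTIDE_C", "y6", 50.0),
    ("VIM", "PEPTIDE_C", "y7", 50.0),
]
protein_signal = summarize_prm(example)
```

Values obtained this way are relative signals. Absolute quantification would additionally require calibration against a known amount of a reference standard, which the sketch does not model.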
We have also analyzed pan-cancer transcriptomics data from 1037 cell lines in the CCLE database [30]. Organization of cells based on their transcriptome profile on t-SNE maps showed that the cell lines clustered largely according to their tissue of origin, irrespective of oncogenic transformation. Similar results were reported by Koplev et al. at both the transcript and proteome levels [42]. A false-coloured t-SNE map of cell lines based on epithelial or mesenchymal gene expression demonstrated that cell lines showing high expression of epithelial genes show low expression of mesenchymal genes and vice versa. In addition, these cell lines are organized in two distinct clusters based on the expression of either epithelial or mesenchymal genes alone. Koplev et al. have also demonstrated similar bimodal segregation of cell lines based on the expression of E-cadherin at both the protein and transcript levels [42]. These results indicate that the expression of epithelial and mesenchymal genes plays a deterministic role in defining cellular phenotype across cancer types, irrespective of the tissue of origin.
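The inverse relationship between epithelial and mesenchymal gene expression described above is what allows a single score to place a sample on the EMT axis. The sketch below is a deliberately simplified stand-in and not the scoring method of Tan et al. or of this study: it takes the mean expression of a mesenchymal signature minus the mean expression of an epithelial signature. The function name `toy_emt_score`, the short signatures and the expression values are hypothetical.

```python
from statistics import mean


def toy_emt_score(expression, epithelial_genes, mesenchymal_genes):
    """Return mean(mesenchymal) - mean(epithelial) for one sample.

    `expression` maps gene symbol -> expression value (e.g. log2 scale).
    Positive values lean mesenchymal, negative values lean epithelial.
    This is an illustrative simplification, not a published scoring method.
    """
    epi = [expression[g] for g in epithelial_genes if g in expression]
    mes = [expression[g] for g in mesenchymal_genes if g in expression]
    if not epi or not mes:
        raise ValueError("need at least one measured gene from each signature")
    return mean(mes) - mean(epi)


# Hypothetical mini-signatures built from markers named in the text.
EPITHELIAL = ["CDH1", "KRT8"]   # E-cadherin, keratin 8
MESENCHYMAL = ["VIM", "ZEB1"]   # vimentin, ZEB1

# Hypothetical expression values for two illustrative samples.
epithelial_like = {"CDH1": 8.0, "KRT8": 9.0, "VIM": 1.0, "ZEB1": 2.0}
mesenchymal_like = {"CDH1": 1.0, "KRT8": 2.0, "VIM": 9.0, "ZEB1": 7.0}

score_e = toy_emt_score(epithelial_like, EPITHELIAL, MESENCHYMAL)
score_m = toy_emt_score(mesenchymal_like, EPITHELIAL, MESENCHYMAL)
```

A sample with intermediate values for both signatures scores near zero, which is one simple way to picture the hybrid states discussed in the Introduction.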
The advent of advanced high-throughput proteomic techniques has made it possible to study the cellular proteome in the context of cellular plasticity. Since then, it has been repeatedly noted that transcriptome and proteome abundances do not correlate well enough to be considered proxies for each other [36-38]. The discordance between transcriptome-level and proteome-level data could arise from post-translational regulation of cellular proteins. However, large-scale proteomic data sets akin to the CCLE transcriptome data are not available for the expression of EMT-related proteins, which would enable the study of their association with cellular phenotype and the corresponding changes under different cellular contexts. Thus, effective methods to monitor changes in proteins related to EMT are needed to elucidate these cellular processes.
To this end, we have developed a PRM-based targeted proteomics method for the quantitative evaluation of several proteins related to EMT. We observed a higher abundance of epithelial phenotype-related proteins, along with a lower abundance of mesenchymal phenotype-related proteins, in known epithelial cell lines such as MCF7, Cal27 and FaDu. Similarly, we observed a lower abundance of epithelial phenotype-related proteins in mesenchymal cell lines such as MDAMB231, J82 and UMUC3. These observations confirm that these cell lines generally exhibit distinct patterns of expression of EMT-related proteins according to their cellular phenotype. Further, with either transcriptome or PRM-based targeted proteomics data, we observed similar PCA-based clustering of epithelial and mesenchymal cell lines into two distinct groups related to their phenotype and EMT scores. This indicates that the PRM-based targeted proteomics data are largely concordant with the EMT scoring matrices that are based on transcriptomics.
Further, we observed clustering of the gall bladder cancer cell lines (G-415, NOZ and OCUG-1), which are not represented in the CCLE transcriptome database, with other mesenchymal cell lines. OCUG-1 and NOZ have been characterized as moderately invasive cell lines, while G-415 has been characterized as highly invasive [48, 49]. In contrast, we observed clustering of the A549 lung adenocarcinoma cell line with epithelial cell lines in the proteomics data and with mesenchymal cell lines in the transcriptomics data. Tan et al. also assigned this cell line a score of 0.37 using their EMT scoring matrix, indicating a mesenchymal phenotype. However, this cell line is known to be an epithelial cell line based on multiple reports of its non-invasive characteristics, along with its expression of epithelial markers such as E-cadherin [50, 51]. Our findings may reflect the propensity of these cells for EMT induction, as well as the heterogeneity and plasticity associated with therapy resistance [52]. Further, we observed clustering of SW780 and VMCUB1 cells, which had EMT scores of −0.6 and −0.23, respectively, with mesenchymal cell lines in the proteomics data. However, these cell lines clustered with other epithelial cell lines based on transcriptomics data. Interestingly, SW780 and VMCUB1 have shown higher migration capability and a moderately invasive nature compared to RT112, an epithelial cell line [53]. Further, only the VMCUB1 cell line has been reported to undergo EMT upon lentiviral transduction of HDAC5 or overexpression of the lncRNA HOTAIR, in contrast to other epithelial urinary bladder cancer cell lines such as RT112 and 5637 [54, 55]. Indicators of EMT are also observed in certain bladder cancers in vivo, including cancers progressing from the basal-squamous molecular subtype exemplified by cell lines such as VMCUB1 [56]. Further, we observed a low correlation score between mRNA and protein expression in the A549, VMCUB1 and SW780 cell lines compared to both epithelial (RT112 and MCF7) and mesenchymal (UMUC3 and MDAMB231) cell lines. Our observations thus suggest that certain subtle changes related to EMT might be more visible at the protein level and may be useful in complementing the insights available from other omics data.
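The per-cell-line comparison of mRNA and protein expression mentioned above can be illustrated with a rank correlation. The sketch below rests on an assumption: the text does not state which correlation measure was used, so Spearman's coefficient is shown as one reasonable choice. The function names and all example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def mrna_protein_correlation(mrna, protein):
    """Spearman correlation over the markers measured at both levels.

    `mrna` and `protein` map marker name -> abundance for ONE cell line.
    """
    shared = sorted(set(mrna) & set(protein))
    if len(shared) < 3:
        raise ValueError("need at least three shared markers")
    return pearson(
        average_ranks([mrna[g] for g in shared]),
        average_ranks([protein[g] for g in shared]),
    )


# Hypothetical abundances for one illustrative cell line.
example_mrna = {"CDH1": 7.5, "KRT8": 9.1, "KRT18": 8.8, "VIM": 2.0}
example_protein = {"CDH1": 120.0, "KRT8": 900.0, "KRT18": 450.0, "VIM": 30.0}
rho = mrna_protein_correlation(example_mrna, example_protein)
```

A rank-based measure is used here because mRNA and protein abundances are on different scales, so only their ordering across markers is compared.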
Cytokeratins are structural proteins that maintain cellular integrity. Downregulation of the KRT8/KRT18 keratin pair is known to induce an increase in cell motility and invasion [57]. We observed a higher protein abundance of keratins 8 and 18 in the epithelial cell lines RT112 and MCF7 than in mesenchymal cell lines such as UMUC3 and MDAMB231. Further, we observed separate clustering of mesenchymal and epithelial cell lines for these keratins at the protein level but not at the mRNA level. We also observed discordance between protein and mRNA abundance for the SW780 and VMCUB1 cell lines relative to other epithelial cell lines such as RT112 and MCF7; the protein abundance of these epithelial markers was more in line with the cellular phenotype of moderate invasiveness and higher migration capability. Similarly, we observed that SW780 and VMCUB1 express the mesenchymal protein vimentin at an abundance close to the range shown by the mesenchymal cell lines UMUC3 and J82. Interestingly, the mRNA abundance of vimentin in these cell lines is higher than in other epithelial cell lines but lower than in the mesenchymal cell lines. Our observations underline the value of quantifying protein abundance for predicting cellular plasticity with respect to the epithelial, mesenchymal and hybrid states.
To further explore how EMT-related proteins are expressed in clinical samples across multiple cancer types, we analyzed quantitative proteomics data from the CPTAC database [41]. Based on the proteome profile, tumor samples from different cancers organized into tight clusters according to their tissue of origin, indicating that, akin to cell line samples, tumor samples also retain their molecular and cellular identity irrespective of oncogenic transformation. Further, we observed that tumor samples were organized in distinct clusters based on the expression of epithelial and mesenchymal proteins. With respect to tissue of origin, we observed that clear cell renal cell carcinoma (ccRCC) samples clustered in the region with low expression of epithelial proteins and high expression of mesenchymal proteins. Transcriptomics data from CCLE also showed that renal cancer cells have high mesenchymal gene expression and low epithelial gene expression. These results indicate a mesenchymal phenotype for renal carcinoma samples. Similarly, we observed that colorectal cancer tumor samples primarily clustered in the regions of high epithelial protein expression and low mesenchymal protein expression, indicating an epithelial character for these samples. Tan et al. have also reported a similar finding for colorectal cancer in both tumor tissue and cell line samples, based on transcriptomics data. They further hypothesized that the epithelial or mesenchymal characteristics of certain cancer types may be associated with the embryonic ectodermal or mesodermal origins of these organs [25]. Thus, our analysis emphasizes the importance of proteomic analysis relative to transcriptome-only approaches. Overall, we demonstrated that the expression of EMT-related genes is associated with the oncogenic transformation of cancer cells in both cell line models and tumor samples. We further showed that protein abundance data can be leveraged in addition to gene expression data to elucidate the complex phenomena underlying EMT as well as its correlation with cancer progression and chemotherapeutic resistance. We believe the targeted proteomics strategy employed in our study can be used as a general-purpose tool for accurate estimation of EMT, and could be used to determine the impact of EMT effectors or drugs more accurately and to assess changes in cellular phenotype.