Background
Eukaryotic ribosomes are among the most highly evolutionarily conserved organelles, composed of four ribosomal RNAs (rRNAs) and approximately 80 ribosomal proteins (RPs). Responsible for translating mRNA into proteins, ribosomes were long believed to be nonspecific “molecular machines” with unvarying structure and function in different biological contexts. Recent evidence has shown, however, that some individual RPs are expressed in tissue-specific patterns and can differentially contribute to ribosome composition, affect rRNA processing, and regulate translation [
1]. Despite the complexity of RP assembly in ribosomes, early studies of ribosome function revealed that the catalytic activity responsible for peptide bond formation might depend only on the presence of rRNAs and a small number of core RPs [
2]. This finding, in conjunction with the observation that some RPs are expressed in a tissue-specific manner, has led to speculation that one purpose for the evolutionary emergence of RPs may have been to confer translational specificity and adaptability [
1,
3].
An increasing body of evidence continues to show that RPs do, in fact, have important roles in imbuing ribosomes with mRNA translational specificity. During embryonic development, RPs are expressed at different levels across tissue types, and loss of RPs due to mutation or targeted knockdown produces specific developmental abnormalities in plants, invertebrates, and vertebrates. The tissue-specific patterning that occurs as a consequence of individual RP loss suggests that some RPs serve to guide the translation of specific subsets of transcripts in order to influence cellular development. While the mechanism(s) by which RPs confer translation specificity are not entirely known, one may involve the alteration of ribosome affinity for transcripts with specific cis-regulatory elements, including internal ribosome entry site (IRES) elements and upstream open reading frames (uORFs) [1].
RPs also participate in a variety of extra-ribosomal functions. In normal contexts, ribosome assembly from individual rRNAs and RPs is a tightly regulated process, with unassembled RPs undergoing rapid degradation. Disruption of ribosomal biogenesis by any number of extracellular or intracellular stimuli induces ribosomal stress, leading to an accumulation of unincorporated RPs. In some cases, these free RPs may participate in a variety of extra-ribosomal functions, including the regulation of cell cycle progression, immune signaling, and cellular development. Many free RPs bind to and inhibit MDM2, a potentially oncogenic E3 ubiquitin ligase that interacts with and promotes the degradation of the TP53 tumor suppressor. The resulting stabilization of TP53 triggers cellular senescence or apoptosis in response to the inciting ribosomal stress. Additional extra-ribosomal functions of RPs are numerous, and have been recently reviewed [
4,
5].
Given their role in regulating gene translation, cellular differentiation, and organismal development, it is perhaps unsurprising that altered RP expression has been implicated in human pathology. Indeed, an entire class of diseases has been shown to be associated with haploinsufficient expression or mutation in individual RPs. These so-called “ribosomopathies,” including Diamond-Blackfan Anemia (DBA) and Shwachman-Diamond Syndrome (SDS), are characterized by early onset bone marrow failure, variable developmental abnormalities and a life-long cancer predisposition that commonly involves non-hematopoietic tissues [
6,
7]. The loss of proper RP stoichiometry and ensuing ribosomal stress result in increased ribosome-free RPs, which bind to MDM2 and impair its ubiquitin-mediated degradation of TP53 [
6,
8‐
10]. The resulting TP53 stability is believed to underlie the bone marrow failure that affects the erythroid or myeloid lineages in DBA and SDS, respectively. The developmental abnormalities of the ribosomopathies are variable and associate with specific RP loss or mutation. For example, RPL5 loss in DBA is specifically associated with cleft palate and other craniofacial abnormalities whereas RPL11 loss is associated with isolated thumb malformations [
11].
Ribosomopathy-like properties have also been observed in various cancers. We have recently shown that RP transcripts (RPTs) were dysregulated in two murine models of hepatoblastoma and hepatocellular carcinoma (HCC) in a tumor-specific manner and in patterns unrelated to tumor growth rates [
12]. These murine tumors also displayed abnormal rRNA processing and increased binding of free RPs to MDM2, reminiscent of the aforementioned inherited ribosomopathies.
Perturbations of several individual RPs have been found in numerous human cancers, including those of the breast, pancreas, bladder, brain and many other tissues [
13‐
25]. Mutations and deletions of RP-encoding genes have also been found in endometrial cancer, colorectal cancer, glioma, and various hematopoietic malignancies [
26‐
28]. Indeed, the Chr. 5q- abnormality associated with myelodysplastic syndrome and the accompanying haploinsufficiency of RPS14 is considered one of the prototype “acquired” ribosomopathies that are often classified together with DBA, SDS and other inherited ribosomopathies [
6]. Although many free RPs can induce cellular senescence during ribosomal stress via the MDM2-TP53 pathway, not all RPs possess such tumor suppressor functions. RPS3A overexpression, for example, actually transforms NIH3T3 mouse fibroblasts and induces tumor formation in nude mice [
29].
A recent attempt to summarize the heterogeneity of RPT expression in human cancers was limited to describing expression differences of single RPTs among cancer cohorts, without accounting for larger patterns of variation that might better distinguish tumors from one another [
3]. RPT expression patterns were, however, examined in normal tissues using the dimensionality-reduction technique Principal Component Analysis (PCA) in the aforementioned study. These results provided hints of cell-specific patterning in the hematopoietic tissues examined, but not all cell types clustered into obviously distinct groups.
In the current work, we leverage a machine learning technique known as
t-distributed stochastic neighbor embedding (
t-SNE) to identify distinct patterns of global RPT expression across both normal human tissues and cancers. Like PCA,
t-SNE is a dimensionality reduction technique used to visualize patterns in a data set [
30]. With either technique, patterns shared between data points are represented with clustering. However,
t-SNE differs from PCA in that it performs particularly well with high-dimensional data and is able to distinguish non-linear relationships and patterns. With this technique, we show that virtually all normal tissues and tumors can be reliably distinguished from one another based on their signature RPT expression patterns. Tumors differ from normal tissues, but retain sufficient remnants of normal tissue patterning to allow their origin to be easily discerned. Finally, we show that a number of cancers possess subtypes of RPT expression patterns that correlate in readily understandable ways with molecular markers, various tumor phenotypes, and survival.
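The contrast between the two techniques can be sketched in a few lines of Python. The example below is illustrative only and is not the analysis pipeline used in this work: it assumes scikit-learn and NumPy are available, uses a synthetic samples-by-RPT matrix in place of real expression data, and the helper names (`embed_rpt_matrix`, `make_toy_expression`) and the perplexity value are hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def embed_rpt_matrix(expr, perplexity=30.0, seed=0):
    """Return 2-D PCA and t-SNE embeddings of a samples x RPTs matrix."""
    expr = np.asarray(expr, dtype=float)
    # PCA: linear projection onto the two directions of greatest variance.
    pca_xy = PCA(n_components=2).fit_transform(expr)
    # t-SNE: non-linear embedding that tries to keep neighbouring samples close.
    # perplexity must be smaller than the number of samples.
    tsne_xy = TSNE(n_components=2, perplexity=perplexity, init="pca",
                   random_state=seed).fit_transform(expr)
    return pca_xy, tsne_xy


def make_toy_expression(n_per_group=50, n_rpts=80, n_groups=3, seed=0):
    """Synthetic stand-in for RPT expression: one noisy profile per group."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for group in range(n_groups):
        centre = rng.normal(0.0, 1.0, size=n_rpts)
        blocks.append(centre + rng.normal(0.0, 0.3, size=(n_per_group, n_rpts)))
        labels += [group] * n_per_group
    return np.vstack(blocks), np.array(labels)


if __name__ == "__main__":
    X, y = make_toy_expression()
    pca_xy, tsne_xy = embed_rpt_matrix(X)
    print(pca_xy.shape, tsne_xy.shape)
```

Each row of the two returned arrays is one sample's position in the corresponding 2-D map; coloring the points by tissue or cohort label is the usual way to inspect whether clusters form.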
Discussion
By investigating expression patterns of individual RPTs and utilizing more traditional and less powerful linear forms of dimensionality reduction such as PCA, previous studies have found modest evidence of tissue-specific patterning of RPT expression in some normal tissues and even less evidence in malignant tumors [
3]. The failure to reproducibly identify recurrent and convincing patterning is presumably due to the complex regulation of RPT expression and the fact that many of the RPT relationships are non-linear. As shown here, however, the machine learning algorithm
t-SNE provides a more elegant and robust dimensionality reduction that better highlights the distinct underlying patterns of RPT expression in both tumors and the normal tissues from which they originate.
Consistent with the more restricted and tentative conclusions of previous findings, our results using
t-SNE clearly demonstrate that RPT expression patterns are not only tissue-specific but provide the ability to define tissue and tumor differences with a heretofore unachievable degree of resolution. The small cluster of 77 neoplasms that did not associate with their respective tissue clusters (Additional file
1: Figure S4) may represent either a subset of tumors that have lost control of their underlying tissue-specific expression patterns or that originated from a minority subpopulation of normal cells whose RPT expression is not representative of the remainder of the tissue.
In addition to their tissue-specific patterning, virtually all tumors showed perturbations of RPT expression that readily allowed them to be distinguished from the normal tissues from which they originated. For some cancers, the tumor-specific patterning of RPT expression was relatively homogeneous and could not otherwise be subcategorized. Most cohorts, however, comprised subgroups of tumors with distinct RPT expression patterns, all of which nonetheless remained distinguishable from normal tissue. The fact that many of these patterns correlated with molecular and clinical features implicates RPT expression patterns in tumor biology.
The notion that altered RP expression might influence the behaviors of both normal tissues and tumors, apart from any effect on translation, is not new. In the ribosomopathies, the binding of any one of about a dozen RPs to MDM2 with subsequent stabilization of TP53 is thought to underlie the bone marrow failure that accompanies these disorders [
6,
9,
10]. It has been proposed that subsequent circumvention of this TP53-mediated senescence by mutation and/or dysregulation of the p19ARF/MDM2/TP53 pathway is responsible for the propensity for eventual neoplastic progression [
38]. In cancers, the binding of free RPs to MDM2 has been shown to mediate the response to ribosomal-stress-inducing chemotherapeutics such as actinomycin D and 5-fluorouracil [
20,
39,
40].
Individual RPs have also been associated with specific tumor phenotypes. For example, RPL3 expression is a determinant of chemotherapy response in certain lung and colon cancers. RPL3 also associates with the high-risk neuroblastoma subtype and may have a role in the acquisition of lung cancer multidrug resistance [
19‐
21]. Breast cancers with elevated expression of
RPL19 are more sensitive to apoptosis-promoting drugs that induce endoplasmic reticulum stress [
13].
RPS11 and
RPS20 have been proposed as prognostic markers in glioblastoma [
16] and the down-regulation of
RPL10 correlates with altered treatment response to dimethylaminoparthenolide (DMAPT) in pancreatic cancer [
22].
Our results also significantly extend the findings of previous studies by demonstrating that, in the vast majority of cancers, subsets of RPTs are expressed coordinately and have additional interpretive power when examined in the context of global RPT expression patterning. This suggests that further insights into the roles RPTs have in tumor development may be revealed by evaluating relative RPT expression. For example, the regulation of chemotherapy response by RPL3 described above may be found to occur in other cancer types once the expression of RPL3 relative to other RPTs has been accounted for. The apparently crucial role of RPT patterning in tumors may explain why a previous study found conflicting results when examining the expression of individual RPs in tumors [
14].
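The distinction between absolute and relative RPT expression can be illustrated with a short sketch. It assumes, for illustration only, that relative expression means each RPT's share of the summed RPT signal in a sample; the function name `relative_rpt_expression` and the abundance values are invented and do not come from the data analyzed here.

```python
def relative_rpt_expression(sample):
    """Convert raw RPT abundances to fractions of the sample's total RPT signal.

    `sample` maps gene symbol -> abundance (arbitrary units). Treating relative
    expression as a fraction of the summed RPT signal is an assumption made
    for this sketch.
    """
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("sample has no RPT signal")
    return {gene: value / total for gene, value in sample.items()}


if __name__ == "__main__":
    # Two hypothetical tumors with identical absolute RPL3 abundance.
    tumor_a = {"RPL3": 100.0, "RPS4X": 300.0, "RPL13": 600.0}
    tumor_b = {"RPL3": 100.0, "RPS4X": 50.0, "RPL13": 50.0}
    for name, sample in (("tumor_a", tumor_a), ("tumor_b", tumor_b)):
        rel = relative_rpt_expression(sample)
        print(name, round(rel["RPL3"], 2))
```

In this toy case RPL3 has the same absolute abundance in both samples but accounts for a tenth of the RPT signal in one and half in the other, which is the kind of difference that an analysis of individual RPTs in isolation would miss.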
Our results suggest a more ubiquitous role for RPL3 in regulating tumor phenotypes, beyond that already described in colorectal carcinoma, lung cancers, and neuroblastoma [
19‐
21]. Of the recurring RPT expression patterns discovered by
t-SNE, the pattern associated with
RPL3 down-regulation occurred most frequently and involved tumors from nine cancer cohorts. Many clusters of tumors with down-regulated
RPL3, including HCC, kidney clear cell cancer, and brain cancer, were associated with inferior survival. The fact that relative down-regulation of
RPL3 occurred in these tumor clusters with predictable expression of 11 other RPTs suggests that RPL3 may be acting in concert with these other identified RPs to exert its effects.
Other recurrent RPT expression patterns across cancer cohorts involved
RPS4X, RPL13, RPL8 and
RPL30 (Table
1). Altered
RPS4X expression, found in six cancer cohorts, associated with unique expression of nine other RPTs, strongly suggesting an underlying coordinated expression, the mechanism of which remains to be identified. As with
RPL3, deregulated
RPS4X has been previously associated with various tumors and tumor phenotypes, including subgroups of colorectal carcinoma, a myelodysplasia risk signature and poor prognosis in bladder cancer [
15,
18,
41]. Interestingly, some of our tumor clusters with altered
RPS4X expression contained a greater proportion of females than males (Table
1 and Table
3), perhaps reflecting
RPS4X’s residence on the X chromosome. Although the cause of perturbed
RPS4X expression in these tumor clusters is unknown, altered methylation patterns on chromosome X have been described in different subsets of cancers [
42,
43] and could be responsible for the expression patterns detected by
t-SNE.
Unlike
RPL3 and
RPS4X,
RPL13’s role in tumor development is less clear. RPL13 activation has been described in a subset of gastrointestinal malignancies and correlated with greater proliferative capacity and attenuated chemoresistance [
44], but further evidence for a role of RPL13 in tumor development is lacking. Furthermore, clinical correlations of the prostate, uterine and kidney cancer
t-SNE clusters described here with relative overexpression of
RPL13 were inconsistent. Uterine cancers with high relative
RPL13 expression tended to correlate with favorable survival, whereas prostate cancers with high
RPL13 showed no differences in prognosis or clinical features. In contrast, kidney clear cell carcinomas with high
RPL13 expression tended to be of higher pathologic grade and were associated with significantly poorer survival (Tables
1 and
3, and Fig.
3b). The fact that these clusters shared similar patterning of 42 other RPTs suggests that the inciting factors responsible for higher
RPL13 expression are not only shared by these tumors but coordinately regulate a common subset of RPTs, with different biological outcomes likely reflecting other tissue-specific factors.
In some cases, RPT expression patterns could be accounted for in part by CNVs, as exemplified by the recurrent
RPL8 and
RPL30 overexpression pattern (Tables
1 and
2). Virtually all tumors with this expression pattern possessed co-amplification of a region on 8q22–24 that includes
RPL8,
RPL30, and the oncogenes
MYC and
PVT1. Amplification of this region has been previously described in breast cancers and correlates with chemoresistance and metastasis [
36,
37,
45‐
47]. Our results indicate that this amplification and the ensuing overexpression of
RPL8 and
RPL30 also occurs in subsets of melanoma, liver, prostate, lung, and head and neck cancers. CNVs of
RPL19 and
RPL23 in breast cancer (Table
2) likely occur due to their co-amplification with
ERBB2 on 17q12. Overexpression of
RPL19 has previously been described in a subset of breast cancers [
13]. The small cluster of 77 tumors that did not group according to tissue of origin (Additional file
1: Figure S4), composed of tumors from 15 cohorts, also shared amplification of this region on 17q12, indicating that this CNV is not restricted to breast cancers and ultimately affects global RPT expression patterning. Amplification of a region on 11q13 that contains
RPS3, occurring in a cluster of breast cancers and HCCs, has been previously described in both cancers and is thought to confer unfavorable prognosis due to amplification of the adjacent oncogene
EMS1 [
48,
49]. The co-deletion of 19q13 along with 1p, which together include 12 RP genes, has been described in low-grade gliomas and confers a favorable prognosis [
50,
51].
The co-overexpression of
RPS25 and
RPS4X detected in one cluster of AML (Fig.
2) has been previously identified as contributing to the poor risk signature in myelodysplastic syndrome [
41]. This co-overexpression was also associated with significant differential expression of 37 RPTs, which is consistent with our finding that RPS25 and RPS4X overexpression occurs within the context of a larger and coordinated pattern of RPT expression. The
RPS25 and
RPS4X overexpressing AML cases likely possess a similar molecular alteration to those with the poor risk signature in MDS.
Collectively, our findings provide strong evidence to support the notion that RPT regulation by both tumors and normal tissues is complex, ordered, and highly coordinated. Although the means by which altered RPT patterns influence the pathogenesis and/or behavior of tumors remain incompletely understood, several non-mutually exclusive mechanisms can be envisioned. First, changes in RP levels may influence overall ribosome composition, thereby affecting their affinity for certain classes of transcripts and/or the efficiency with which they are translated. One such class of transcripts may be those with IRES elements,
cis-regulatory sequences found in the 5′-untranslated regions of more than 10% of cellular mRNAs. IRES elements are found with particularly high frequency in transcripts encoding proteins involved in cell cycle control and various types of stress responses. Efficient translation of these IRES-containing transcripts has been shown to depend on specific RPs, notably RPS25, RPS19 and RPL11 [
52‐
54]. Changes in ribosome affinity for IRES elements have been shown to reduce translation of tumor suppressors such as p27 and TP53 and to promote cancer development [
55].
RPs may also influence cancer development via extra-ribosomal pathways. In addition to their stabilization of TP53 mediated by binding to and inactivating MDM2, specific RPs have been shown to inactivate Myc; to inhibit the Myc target Lin28B; to activate NF-κB, cyclins, and cyclin-dependent kinases; and to regulate a variety of other tumorigenic functions and immunogenic pathways [
4,
5].
In addition to providing evidence that tumors may use RPs to direct tumor phenotypes, our findings have allowed us to leverage the tissue- and tumor-specificity of RPT expression to generate highly sensitive and specific models that allow for precise tumor identification and sub-classification (Additional file
1: Table S2). Clinically, these might be useful for determining the tissue of origin of undifferentiated tumors and for predicting long-term behaviors in otherwise homogeneous cancers such as kidney clear cell carcinoma and those of the central nervous system (Fig.
3b). With more samples and further refinement of the artificial neural network (ANN) structures, future iterations of these models will likely have even greater discriminatory power.
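As a rough illustration of how such a classifier might be set up, the sketch below trains a small feed-forward network on synthetic data. It is not the architecture or training procedure used for the models reported here: it assumes scikit-learn and NumPy are available, and the cohort names, layer size, and helper names (`train_tissue_classifier`, `make_toy_cohorts`) are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_tissue_classifier(features, labels, seed=0):
    """Fit a small feed-forward network; return (model, held-out accuracy)."""
    # Hold out a quarter of the samples, keeping class proportions equal.
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, stratify=labels, random_state=seed)
    # One hidden layer of 16 units is an arbitrary choice for this sketch.
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=seed)
    model.fit(x_train, y_train)
    return model, model.score(x_test, y_test)


def make_toy_cohorts(n_per_class=60, n_rpts=20, seed=0):
    """Synthetic stand-in for per-sample RPT profiles from three cohorts."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for name in ("liver", "kidney", "brain"):
        centre = rng.normal(0.0, 2.0, size=n_rpts)
        xs.append(centre + rng.normal(0.0, 1.0, size=(n_per_class, n_rpts)))
        ys += [name] * n_per_class
    return np.vstack(xs), np.array(ys)


if __name__ == "__main__":
    X, y = make_toy_cohorts()
    model, accuracy = train_tissue_classifier(X, y)
    print(f"held-out accuracy: {accuracy:.2f}")
```

Evaluating on held-out samples, as done here, is the minimum needed to estimate sensitivity and specificity; real expression data would also call for validation on independent cohorts.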
A limitation of using data from The Cancer Genome Atlas (TCGA) is that transcript expression does not always correlate with protein expression, particularly in cancers [
56‐
58]. Thus, it is difficult to predict how the different tissue-specific RPT expression patterns we identified correlate with actual protein expression in these cancers and/or with the numerous post-translational modifications that can alter RP behaviors [
59,
60]. As this is a cross-sectional study, we also recognize that causality cannot be inferred, and it remains unknown whether altered RPT expression is an early or late event in tumorigenesis despite its predictive value. Furthermore, while RPT expression patterns appear to have significant predictive value in the large dataset we have analyzed, further cross-validation with additional transcriptional data in both primary tumors and metastatic lesions will be important in confirming potential clinical utility. Finally, additional molecular analyses of the identified
t-SNE clusters with whole-transcriptome sequencing data, pathway analysis, whole-genome DNA mutation data, and DNA methylation patterning may offer additional insights into the biological mechanisms that link altered RPT expression with tumor phenotypes.