Background
Ubiquitination is the most common post-translational protein modification [1]. In this process, a small protein, ubiquitin (Ub), is covalently attached to the substrate in a three-enzyme cascade catalyzed sequentially by activating (E1), conjugating (E2), and ligating (E3) enzymes [2, 3]. Ubiquitination can either target proteins for degradation by the proteasome system [3] or modulate their intracellular localization, vesicular trafficking [4], activation of signaling pathways, and DNA transcription [2, 5]. The enzymes responsible for transferring ubiquitin to a given protein are E3 Ub-ligases [6]. Based on their conserved domains and the mechanism of Ub transfer to the substrate, E3 Ub-ligases have been divided into three basic classes: really interesting new genes (RINGs), homologous to the E6-AP C terminus (HECTs), and RING between RINGs (RBRs) [1, 7]. Individual Ub-ligases recognize their targets in a strictly regulated manner, irrespective of sequence similarity. To ensure high specificity in target selection, more than 600 genes are predicted to encode E3 Ub-ligases in the human genome [1], whereas there are only two E1 and 30–50 E2 genes [8]. Depicting the roles of Ub-ligases within complex regulatory networks can be hampered by strong parallel compensation mechanisms: Ub-ligases often recognize the same substrate or act on different nodes of the same regulatory pathway. This makes it difficult to predict alternative compensatory enzymes in reverse genetics approaches. For this reason, a more functional classification of Ub-ligases is needed.
E3 ubiquitin ligases can also be classified by function using Gene Ontology [9], which describes three aspects of biology: the molecular function of a gene product, the cellular component in which it is located, and the biological process in which it is involved [6]. One of the most popular methods currently employing this type of classification is Gene Set Enrichment Analysis (GSEA) [10]. However, the ability of this method to describe complex biological phenomena is limited by the format of its input and output [11]. To improve this descriptive power, the semantic analysis method [12] and the sem1R algorithm [11] were introduced. These methods identify and describe semantically coherent gene biclusters using conjunctions of ontological terms from various ontologies. Because the hypothesis language is thereby extended, these methods provide a more complete picture of functional gene classification for specific cell types in a tissue.
The gastrointestinal tract (GIT) is a system with a high rate of regeneration. It consists of diverse epithelial cell populations with varying morphology and function, such as nutrient absorption, hormone production, barrier function, response to microorganisms, coordination of the immune response, and self-renewal [13, 14]. In addition to diverse epithelial cells, stem cells and mucosa-associated lymphoid tissue are found along the GIT [15, 16]. Tissue-specific stem cells of epithelial origin continuously divide, proliferate, and differentiate to ensure cell turnover and overall tissue homeostasis [17]. Multiple signaling pathways, such as Wnt, Notch, and EphrB3, are known to be critical for regulating the stem cell niche and the differentiation of progeny cells [14, 18]. These features are determined by a gene signature and a cooperation of regulatory pathways unique to each cell type, which is reflected in its RNA profile [19]. The GIT therefore represents a valuable model system for studying parallel regulatory networks in tissue homeostasis, regeneration, and the response to pathogenic processes.
However, little is known about how ubiquitin ligases are involved in such physiological regulatory processes, in the GIT or in other systems, despite increasing evidence that aberrant function or dysregulated expression of E3 Ub-ligases can cause pathological changes resulting in dysplasia, metaplasia, or even cancer [20]. Understanding the function of individual Ub-ligases in the tissue context would therefore help to explain the development of pathological conditions and, eventually, to target them therapeutically [2, 21]. In this study, we aim to identify GIT-specific Ub-ligases and ubiquitination-related genes, their roles in tissue homeostasis, and their possible contribution to alternative compensatory networks.
Here, we combine a semantic clustering method [11, 12] with the expression profiles of E3 Ub-ligases and ubiquitination-related genes in the stomach, small intestine, and colon to specify the dominant biological role of individual E3s and to predict their potential secondary, compensatory roles in different parts of the GIT during tissue homeostasis and regeneration. In addition, using previously published single-cell RNA sequencing data [22], we attempt to identify cell-type-specific Ub-ligases in the colon. We demonstrate that an individual Ub-ligase may be characteristic of several cell types, and that its expression depends on the current state of the tissue and can differ during injury response or regeneration.
Methods
Animals
C57BL/6NCrl mice (Charles River Laboratories) were used in this study. For expression profiling, three 12-week-old C57BL/6NCrl males were used. The stomach, small intestine, and colon were dissected and immediately processed for RNA isolation.
RNA isolation and reverse transcription
RNA was isolated with TRI reagent (Sigma-Aldrich, USA) according to the manufacturer's recommendations. Segments of the small intestine (duodenum, jejunum, and ileum) or colon (proximal and distal) from all animals were each pooled into one sample. The RNA was reverse-transcribed into cDNA using the GoScript™ Reverse kit (Promega, USA).
For expression profiling of Ub-ligase genes, RT2 Profiler PCR Arrays (Qiagen, USA) were used, with a gene-specific primer pair for each E3 Ub-ligase gene in each well, allowing 370 genes to be measured simultaneously. The selected genes represent Ub-ligases that could be potential drug targets. The array includes ubiquitin ligases from all major E3 families as well as important ubiquitination-related regulatory genes that are not Ub-ligases by nature but have suggested redundant or compensatory functions and could therefore lower drug efficacy or cause off-target effects. Internal controls for reference genes and for detecting genomic DNA contamination were included on the plate: housekeeping genes (GAPDH, HSP90, beta-actin, Gus-B, beta-2 microglobulin), a genomic DNA contamination control, and positive PCR controls (see the manufacturer's handbook).
Mouse model of epithelial mucosa damage
Twelve-week-old C57BL/6NCrl males (Charles River Laboratories) were used for this study. To study epithelial damage, the dextran sulfate sodium (DSS) mouse model was used [23]. Three males were treated with 2% DSS (w/v) (TdB Consultancy, Sweden) for five days. On the evening of day 5, DSS was replaced with drinking water overnight to reach the peak phase of acute inflammation. Two control animals received plain drinking water only. On day 6, mice were sacrificed and intestinal tissues were processed for histology and RNA extraction. The stomach, small intestine, and colon were dissected, fixed for 24 h in 10% buffered formaldehyde (v/v) (Thermo Scientific, USA) at 4 °C, embedded in paraffin, and sectioned with an automatic rotary microtome RM2255 (Leica Biosystems, Nussloch, Germany).
RNA probe preparation and in situ hybridization
Primers for in situ hybridization were designed in Primer-BLAST (NCBI) to cover at least one exon-exon junction of the gene. The list of primers is shown in Additional file 3: Table S1. The DNA template carrying the probe sequence in a plasmid (pGEM®-T Easy Vector Systems, Promega, USA) was transcribed into an mRNA probe by in vitro transcription with T7 or SP6 RNA polymerase (Promega, USA) following the manufacturer's protocol.
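Once exon lengths along the mature transcript are known, the exon-junction requirement can be checked mechanically. The sketch below is a hypothetical helper, not part of Primer-BLAST; the function names, the 0-based half-open coordinate convention, and the `min_overlap` parameter (bases required on each side of a junction) are assumptions for illustration.

```python
from itertools import accumulate


def junction_positions(exon_lengths):
    """Transcript coordinates (0-based) of the first base of every exon after the first.

    Hypothetical helper: exon_lengths lists exon sizes in 5'->3' transcript order.
    """
    # Cumulative sums give each exon's end; the last one is the transcript end,
    # which is not a junction, so it is dropped.
    return list(accumulate(exon_lengths))[:-1]


def spans_junction(primer_start, primer_end, exon_lengths, min_overlap=1):
    """True if primer [primer_start, primer_end) covers an exon-exon junction
    with at least `min_overlap` bases on each side (assumed criterion)."""
    for j in junction_positions(exon_lengths):
        if primer_start <= j - min_overlap and primer_end >= j + min_overlap:
            return True
    return False


# Toy transcript with exons of 100, 150 and 80 nt (junctions at 100 and 250).
print(spans_junction(90, 110, [100, 150, 80]))   # True
print(spans_junction(10, 30, [100, 150, 80]))    # False
```

A larger `min_overlap` makes it less likely that the primer anneals to a single exon alone (and hence to genomic DNA); the value used here is illustrative only.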
In situ hybridization was performed on 7 μm paraffin tissue sections of the distal colon based on the protocol of Wilkinson et al. [24]. Sections were deparaffinized, permeabilized with 10 µg/ml Proteinase K (Sigma-Aldrich, Germany), post-fixed with 4% PFA, and washed. Next, the sections were acetylated with acetic anhydride (Sigma-Aldrich, Germany) and washed. Slides were then incubated for 1 h in hybridization buffer containing formamide, 20 × SSC, pH 7.0 (Thermo Scientific, USA), 50 × Denhardt's solution, 10% Tween-20, tRNA 10 mg/ml, heparin 50 mg/ml, and salmon sperm DNA 10 mg/ml (all purchased from Sigma-Aldrich, Germany). Hybridization with specific antisense mRNA probes (2 ng/μl, denatured for 3 min at 80 °C) was carried out overnight in a humidified chamber at 70 °C.
Thereafter, nonspecifically bound probe was washed off the sections with 5 × SSC/formamide, pH 7.0, and then with 2 × SSC, pH 7.0 (Thermo Scientific, USA) at 70 °C in a water bath. The slides were then washed four times with TBS. Endogenous alkaline phosphatase was blocked with Blocking reagent (Roche, Switzerland) for 1 h. Digoxigenin-labelled mRNA probes were detected with anti-digoxigenin Fab fragments conjugated to alkaline phosphatase (Roche, Switzerland), diluted 1 µl per 5 ml of TBS (1:5000), at 4 °C overnight. The antibody was washed off with TBS, and the signal was visualized with BM-Purple solution (Roche, Switzerland). Post-fixation was done with 4% buffered PFA, and slides were mounted with Aquatex mounting medium (Merck Millipore, Germany).
Statistical analysis
qPCR data were normalized to HSP90 gene expression. Missing values were replaced by the maximum value for the given gene + 2, recalculated to relative quantities, and log-transformed. ANOVA with Tukey's post-test was used to analyze differences in gene expression among GIT parts, with a significance level of p = 0.01. DSS-treated and untreated distal colon were not compared statistically because of the small sample size. As the primary criterion for selecting potentially interesting genes, an absolute difference greater than 1.25 ΔCq was required, and all values in one group had to be higher (or lower) than every value in the other group. Fisher's exact test was used to compare categorical data (distribution of ontology terms across tissues and structural groups).
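The preprocessing and the primary selection criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes Cq values are imputed before normalization, reads "maximum value + 2" as the gene's highest observed Cq plus 2, uses log10 for the log transform, and interprets the 1.25 ΔCq threshold as a difference of group means. All function names are hypothetical.

```python
import math
from statistics import mean


def impute_missing(cq_values):
    """Replace missing Cq values (None) with the gene's maximum observed Cq + 2.

    Assumes at least one value is observed; "maximum value" is read as maximum Cq.
    """
    fill = max(v for v in cq_values if v is not None) + 2
    return [fill if v is None else v for v in cq_values]


def delta_cq(gene_cq, reference_cq):
    """Normalize to the reference gene (HSP90 in the text): dCq = Cq_gene - Cq_ref."""
    return [g - r for g, r in zip(gene_cq, reference_cq)]


def log_relative_quantity(dcq_values):
    """Relative quantity 2**(-dCq), then log10 (the log base is an assumption)."""
    return [math.log10(2.0 ** (-d)) for d in dcq_values]


def passes_selection(dcq_a, dcq_b, threshold=1.25):
    """Primary selection criterion: |mean dCq difference| > threshold AND
    complete separation (every value of one group above every value of the other)."""
    large_difference = abs(mean(dcq_a) - mean(dcq_b)) > threshold
    separated = min(dcq_a) > max(dcq_b) or max(dcq_a) < min(dcq_b)
    return large_difference and separated


# Toy values for three animals per tissue (not data from this study).
gene = impute_missing([30.1, None, 31.0])      # -> [30.1, 33.0, 31.0]
hsp90 = [20.0, 20.5, 20.2]
print(delta_cq(gene, hsp90))
print(passes_selection([5.0, 5.2, 5.1], [3.5, 3.6, 3.7]))   # True
print(passes_selection([5.0, 5.2, 5.1], [3.0, 3.1, 5.05]))  # False: groups overlap
```

The ANOVA and Tukey post-test themselves are not reproduced here; in Python they are available, for example, as `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`.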
Ontology analysis and semantic analysis
The following ontologies were used in all experiments: Gene Ontology [9], Pathway Ontology [25], and the KEGG BRITE database [26]. These ontologies contain 45044, 2601, and 63263 ontological terms, respectively. Each ontology term represents one unit of biological knowledge; therefore, the size and number of ontologies included in the semantic analysis affect its ability to explain biological phenomena. In other words, ontologies with more terms have greater potential to describe a hypothesis (e.g. the processes in which genes are involved) precisely, because their hypothesis language is richer. On the other hand, larger ontologies increase the run time of the algorithms. Ontological terms are arranged hierarchically, so one term may be more general than another. For example, the term “regulation of biological process” is more general than the term “regulation of cellular process”. This hierarchical order helps to relate ontological terms and their biological meanings at different levels of specificity. To perform the semantic analysis and the subsequent semantic clustering for specific parts of the GIT, the entire gene set was split into significant and non-significant genes for each of three comparisons: small intestine vs. colon (Group A), stomach vs. colon (Group B), and stomach vs. small intestine (Group C). The enrichment score (statistical significance) of each ontological term was then calculated for the significant and non-significant genes of each group (Groups A, B, and C). For this analysis, the computeTermsEnrichment function of the sem1R algorithm [10] was used.
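The term hierarchy and the enrichment step can be illustrated with a generic over-representation test. This is a sketch, not the computeTermsEnrichment implementation, whose scoring may differ: it propagates annotations to ancestor terms (the "true-path" convention) and then computes a one-sided Fisher's exact (hypergeometric) p-value. Gene names, annotations, and the one-step hierarchy are toy assumptions.

```python
from math import comb


def propagate(annotations, parents):
    """Apply the true-path rule: a gene annotated to a term is implicitly
    annotated to all of that term's ancestors.

    annotations: gene -> set of terms; parents: term -> set of parent terms.
    """
    def ancestors(term):
        seen, stack = set(), [term]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    return {gene: set(terms).union(*(ancestors(t) for t in terms))
            for gene, terms in annotations.items()}


def enrichment_p(term, significant, background, annotations):
    """One-sided Fisher's exact (hypergeometric) p-value for over-representation
    of `term` among significant genes, with all tested genes as background."""
    N = len(background)
    K = sum(term in annotations.get(g, ()) for g in background)
    n = len(significant)
    k = sum(term in annotations.get(g, ()) for g in significant)
    # P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example: 10 tested genes, 4 annotated to the more specific term.
parents = {"regulation of cellular process": {"regulation of biological process"}}
genes = [f"gene_{i}" for i in range(1, 11)]
direct = {g: set() for g in genes}
for g in genes[:4]:
    direct[g].add("regulation of cellular process")
full = propagate(direct, parents)
p = enrichment_p("regulation of biological process", genes[:3], genes, full)
print(round(p, 4))  # 0.0333 (= 4/120)
```

Propagation matters because a gene annotated only to the specific term still counts toward the more general one, which is what lets enrichment be assessed at different levels of specificity.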
Semantic cluster analysis
For semantic cluster analysis, the sem1R algorithm induces a set of predictive rules that describe coherent biclusters using ontology terms from the input data. Here, the input data are the sets of significant and non-significant genes for each comparison (small intestine vs. colon, Group A; stomach vs. colon, Group B; stomach vs. small intestine, Group C) and a set of ontologies. These gene groups and ontologies are the same as those used in the ontology and semantic analysis. Each rule is formulated as a conjunction of ontology terms, and the genes covered by a rule must be associated with all ontology terms appearing in that rule. An example of such a rule is: cellular protein metabolic process ∧ protein phosphorylated amino acid binding.
The rule above defines the set of genes that are simultaneously associated with cellular protein metabolic process and with protein phosphorylated amino acid binding. A graphical representation of this rule is shown in Fig. 2D. Similar rules were used to compute Fig. 2B, F.
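Computing which genes a conjunctive rule covers reduces to intersecting the gene sets of its terms. The sketch below assumes annotations have already been propagated up the hierarchy; the gene names are hypothetical, and the function names are illustrative, not sem1R's API.

```python
def term_index(annotations):
    """Invert gene -> terms annotations into term -> set of genes."""
    index = {}
    for gene, terms in annotations.items():
        for term in terms:
            index.setdefault(term, set()).add(gene)
    return index


def covered_genes(rule, index):
    """Genes covered by a conjunctive rule t1 ∧ t2 ∧ ...: the intersection of
    the gene sets of all its terms (an empty rule covers nothing here)."""
    gene_sets = [index.get(term, set()) for term in rule]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical genes and annotations, for illustration only.
annotations = {
    "gene_1": {"cellular protein metabolic process",
               "protein phosphorylated amino acid binding"},
    "gene_2": {"cellular protein metabolic process"},
    "gene_3": {"protein phosphorylated amino acid binding"},
}
rule = ("cellular protein metabolic process",
        "protein phosphorylated amino acid binding")
print(covered_genes(rule, term_index(annotations)))  # {'gene_1'}
```

Each additional term in a rule can only shrink the covered set, which is why longer conjunctions describe smaller, more specific biclusters.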
The concept of semantic cluster analysis is illustrated in Additional file 5: Fig. S1. The original qPCR dataset is divided into three groups of samples, and for each group, hypotheses in the form of rule sets are induced with the sem1R algorithm for the sets of significantly and non-significantly expressed genes.
Selection rules definition
The selected groups of genes were sorted according to the t-score and the number of differences between significantly and non-significantly expressed genes (the minimum difference was arbitrarily set to 3) for each ontology level. For each group (Groups A, B, and C), we ran the sem1R algorithm restricted to finding at most the 10 best rules (groups of genes) according to an evaluation function. To obtain more diverse rules, and consequently more diverse groups of covered genes, all supported evaluation functions (ACC, AUC, and F1-score) were used in rule learning. To control the level of specificity of rules, the ‘minLevel’ parameter was set to 0, 2, 3, 4, 5, and 6 across runs of the sem1R algorithm. Defining a minimal level of specificity prevents the induction of rules that are too general or too specific, i.e. rules that cover too many or too few genes, respectively. Interesting rules, and the corresponding groups of genes, were selected from all runs across these settings.
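Two of the ingredients above, a minimal-specificity cut-off and rule evaluation by ACC and F1, can be sketched as follows. The definition of term level (shortest distance to a root), the function names, and the toy hierarchy are assumptions; sem1R's own `minLevel` semantics and scoring may differ, and AUC is not sketched because the text does not say how it is computed for a single rule.

```python
def term_level(term, parents):
    """Depth of a term: 0 for a root, else 1 + the shallowest parent's depth
    (assumed definition; sem1R's own level definition may differ)."""
    ps = parents.get(term)
    if not ps:
        return 0
    return 1 + min(term_level(p, parents) for p in ps)


def allowed_terms(terms, parents, min_level):
    """Keep only terms whose level is at least `min_level` (minimal specificity)."""
    return {t for t in terms if term_level(t, parents) >= min_level}


def evaluate_rule(covered, positives, universe):
    """Accuracy and F1 of a rule, treating covered genes as predicted positives and
    significant genes as actual positives. Sets must be subsets of `universe`."""
    tp = len(covered & positives)
    fp = len(covered - positives)
    fn = len(positives - covered)
    tn = len(universe) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"ACC": (tp + tn) / len(universe), "F1": f1}


# Toy hierarchy (not real Gene Ontology depths) and toy gene sets.
parents = {"regulation of biological process": {"biological_process"},
           "regulation of cellular process": {"regulation of biological process"}}
terms = {"biological_process", "regulation of biological process",
         "regulation of cellular process"}
print(allowed_terms(terms, parents, 2))  # {'regulation of cellular process'}

universe = {f"gene_{i}" for i in range(10)}
significant = {"gene_0", "gene_1", "gene_2", "gene_3"}
print(evaluate_rule({"gene_0", "gene_1", "gene_4"}, significant, universe))
```

Raising `min_level` removes general terms from the candidate vocabulary, so rules built from the remaining terms tend to cover fewer genes; F1 rewards rules that cover many significant genes while covering few non-significant ones.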
Discussion
To date, many published reports on E3 Ub-ligases have been based on in vitro investigations. These provide valuable data on cellular physiology and homeostasis, such as proliferation, cell growth, apoptosis, nucleic acid maintenance, metabolism, and the cell cycle, in cells with either overexpressed or absent E3 Ub-ligases [1, 6, 43]. However, such models lack contextual information about the effects of these enzymes on complex tissues, organs, and organisms, including reciprocal regulation within subpopulations of cells. Studying E3 Ub-ligases in vivo therefore gives more information about the biological roles of these enzymes and their involvement in the physiology of the entire organism. Yet in vivo models are subject to strong regulatory mechanisms relying on compensatory effects of alternative pathways.
The ability of a biological system to maintain homeostasis in the presence of mutations is termed genetic robustness. This feature is evolutionarily essential for the organism's survival in the case of gene dysfunction and can be achieved via communication between regulatory pathways [44, 45]. However, it can complicate the analysis of animal models when gene targeting does not lead to the expected overt or severe phenotype. After first being reported in Drosophila as transcriptional dosage compensation of the X chromosome [46], genetic robustness has been described in many model organisms, from yeast [47] to mammals [48]. To explain the phenomenon, researchers have proposed several mechanisms, such as functional redundancy of homologous genes [49], adaptive mutations [50], rewiring of genetic networks [51], genetic compensation, and transcriptional adaptation [44].
To gain a deeper understanding of genetic and functional compensation, we propose using semantic clustering analysis [11, 12] to statistically predict and describe semantically coherent gene biclusters in the context of functional gene classification for specific cell types in a tissue. To test this approach, we compared the expression of E3 ubiquitin ligases and ubiquitination-related genes in three main segments of the gastrointestinal tract: the stomach, small intestine, and colon. As a first outcome, all ligases appeared to be expressed at their lowest level in the small intestine. We therefore used expression in the small intestine as the reference level for the stomach and colon in the ontology analyses, which divided the expressed genes according to their functions in cells and tissues. These analyses revealed that the small intestine is characterized by genes involved in maintaining the immune system, whereas genes playing roles in catabolic processes are typical of the colon.
It has been debated whether the compensatory activity of redundant genes correlates with their sequence or structural similarity and common origin [52]. This uncertainty complicates the identification of compensatory pathways. Applying the approach described above, we were able to reveal ten groups of Ub-ligases that share the same ontologies but show GIT-specific expression patterns. Notably, genes from the same ontology combination group have not been described as redundant before, which provides an interesting lead for a detailed study of these genes in signaling pathway networks.
To test the potential parallel networks identified in a biological system, we used a mouse model of epithelial regeneration. We hypothesized that the roles of genes involved in tissue regeneration might be masked under steady-state homeostasis but become apparent when tissue function changes under challenged conditions. We therefore induced epithelial damage by treating mice with DSS. We observed that epithelial damage in the colon activated intracellular signal transduction, engaging functions of particular genes that differ from their normal roles in homeostasis. This interpretation was also supported by our approach to classifying Ub-ligases by cell specificity: we did not observe strict cell specificity, and the tested Ub-ligases were present in various cell types. This observation points to the ability of Ub-ligases to participate in the regulation of several signaling pathways in specific clusters. Yet such regulation can differ considerably depending on tissue type, developmental stage, and homeostatic condition.
Taken together, semantic clustering analysis of GIT-specific Ub-ligases and ubiquitination-related genes makes it possible to statistically define compensatory gene clusters consisting either of the same genes involved in distinct regulatory pathways or of a few different genes playing roles in functionally similar signaling pathways. Such an approach could find application, for instance, in cancer therapy development, as redundancy or substitution of certain genes has also been described during carcinogenesis. In this case, redundant genes mask the potential harmful effects of mutations, and cancer progression depends on the effective functional interplay between defective genes and their compensatory partners [52]. Members of the Socs family showed the most illustrative expression pattern among the GIT semantic ontology combinations. Besides their role in immune response regulation as suppressors of cytokine signaling, some members of the Socs family have been described to participate in tumor progression [53]. For instance, SOCS1 downregulation has been described in hepatocellular carcinoma [54] and in cervical [55], ovarian, and breast cancer [56]. Aberrant expression of SOCS1 and SOCS3 has been described in human colorectal cancer, where SOCS3 overexpression inhibits proliferation, migration, and invasiveness of tumor cells [57], whereas SOCS1 overexpression has pro-oncogenic activity [58]. It would therefore be worthwhile to further study Socs genes together with other genes from the same ontology group with respect to their compensatory potential during carcinogenesis and the progression of other GIT diseases.
Having obtained an overview of Ub-ligase clustering, it would be interesting to apply the semantic clustering approach to studying redundancy within other enzyme families, such as proteases, phosphatases, and kinases. Their important biological roles indirectly suggest a high compensatory potential. Knowledge of ontology relationships among genes will help in choosing relevant animal models for studying particular diseases and developing future therapies.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.