Background
Colorectal cancer is a commonly occurring cancer worldwide [
1]. Metastatic colorectal cancer is clinically significant, as colorectal cancer is one of the major causes of cancer-related deaths [
2]. Metastatic progression in colorectal cancer is a multistep process, beginning with the formation of adenomatous polyps, which develop into locally invasive tumors [
3]. This process involves phenotypic changes associated with the acquisition of new functions, such as cell-type transition, cell migration, and tissue invasion in the tumor cells [
3]. An improved understanding of the molecular alterations associated with metastatic progression may contribute to the development of novel and effective targeted therapies for colorectal cancer [
4].
Gene expression profiling provides a scalable molecular method for investigating genetic variation associated with ectopic gene expression in tumors. In addition, the identification of differentially expressed genes offers great potential for the discovery of clinically useful biomarkers in tumor cells. The complexity of the cancer transcriptome is attributable to differential pre-mRNA processing, including alternative promoter usage and splicing, which is involved in the production of cancer-specific transcripts and proteins [
5]. Fusion transcripts are common cancer-specific RNAs that arise from genomic rearrangements or transcription-mediated mechanisms, such as novel
cis or
trans splicing [
6]. The formation of gene fusions may lead to the disruption of tumor suppressor genes or the activation of oncogenes, thereby triggering tumorigenesis [
7]. Furthermore, fusion transcripts and proteins have been useful in cancer diagnosis, prognosis, and targeted therapy.
Massively parallel RNA sequencing (RNA-seq) is a useful method for annotation of the cancer transcriptome with great efficiency and high resolution [
8]. RNA-seq has enabled a comprehensive understanding of the complexity of the cancer transcriptome, via genome-wide expression profiling and identification of novel and fusion transcripts [
9]. Recently, RNA-seq has been used to annotate the cancer transcriptome in breast [
10], lung [
11], gastric [
12], and colorectal cancers [
13,
14]. However, despite the availability of high-throughput sequencing technology, the transcriptional differences, including fusion genes, between primary colorectal carcinomas and liver metastases are not fully understood.
In this study, we compared the transcriptomes of five sets of quadruple-matched tissues (primary carcinomas, liver metastases, normal colon, and liver). First, we found a similar gene expression pattern between primary and metastatic colorectal carcinoma. Second, we identified a novel gene fusion event specifically in primary and metastatic colorectal cancer tissue, and experimentally confirmed the fusion product. In addition, we demonstrated the cell growth-promoting effect of this fusion transcript.
Methods
Collection of specimens
Matched fresh-frozen samples of primary colorectal carcinoma, liver metastasis, normal colon, and normal liver were obtained from the Korean National Biobank of Pusan National University Hospital (PNUH) for 5 patients who underwent resection of the primary tumor at PNUH. This series of studies was reviewed and approved by the Institutional Ethics Committees of Pusan National University Hospital. The characteristics of all patients included in this study are summarized in Additional file
1: Table S1.
cDNA library preparation and high-throughput paired-end RNA sequencing
Total RNA was isolated from fresh-frozen tissues of the conditioned volunteers and patients (NC, normal colon; PC, primary colon carcinoma; LM, colon-liver metastases; NL, normal liver) using TRIzol reagent (Invitrogen, USA), and subsequently treated with RNase-free DNase I for 30 min at 37 °C to remove residual DNA. Libraries were prepared according to the standard Illumina mRNA library preparation protocol (Illumina Inc, USA). Briefly, purified mRNA was fragmented in fragmentation buffer to obtain short mRNA fragments. These short fragments served as templates to synthesize the first-strand cDNA, using random hexamer primers. The second-strand cDNA was synthesized using buffer, dNTPs, RNase H, and DNA polymerase I. Double-stranded cDNAs were purified with the QIAquick PCR extraction kit (Qiagen Inc, USA) and eluted in EB buffer. Following second-strand synthesis, end repair, and addition of a single A base, Illumina sequencing adaptors were ligated onto the short fragments.
The concentration of each library was measured by real-time PCR. An Agilent 2100 Bioanalyzer was used to estimate the insert size distribution. Constructed libraries were sequenced (90 cycles) using the Illumina HiSeq 2000 (Illumina Inc), according to the manufacturer’s instructions. HiSeq Control Software (HCS v1.1.37) with RTA (v1.7.45) was used for management and execution of the HiSeq 2000 runs.
RNA-seq data processing
Images generated by the HiSeq 2000 were converted into nucleotide sequences by a base-calling pipeline and stored in FASTQ format, and dirty raw reads were removed before the data were analyzed. Three criteria were used to filter out dirty raw reads: (1) reads containing adaptor sequences were removed; (2) reads with more than 5 % ‘N’ bases were removed; and (3) low-quality reads, in which more than 50 % of the bases had a quality value ≤ 10, were removed. All subsequent analyses were based on clean reads.
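The three filtering criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the exact-substring adaptor match, the placeholder adaptor sequence, and the Phred+33 quality encoding are all assumptions made for illustration.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding (an assumption)."""
    return [ord(ch) - 33 for ch in quality_string]


def is_clean_read(seq, quals, adaptor, max_n_frac=0.05, low_q=10, max_low_q_frac=0.5):
    """Return True if a read passes the three filters described in the text.

    seq     -- read sequence
    quals   -- per-base quality values (same length as seq)
    adaptor -- adaptor sequence to screen for (hypothetical placeholder here)
    """
    if not seq or len(seq) != len(quals):
        raise ValueError("sequence and quality list must be non-empty and equal in length")
    # Criterion 1: discard reads that contain adaptor sequence (exact match only).
    if adaptor and adaptor in seq:
        return False
    # Criterion 2: discard reads with more than 5 % 'N' bases.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # Criterion 3: discard reads in which more than 50 % of bases have quality <= 10.
    low_quality_bases = sum(1 for q in quals if q <= low_q)
    if low_quality_bases / len(quals) > max_low_q_frac:
        return False
    return True


if __name__ == "__main__":
    adaptor = "TTTTT"  # placeholder, not a real Illumina adaptor
    print(is_clean_read("ACGTACGTAC", [30] * 10, adaptor))
    print(is_clean_read("ACGTNCGTAC", [30] * 10, adaptor))
```

A production filter would normally allow mismatches and partial overlaps when screening for adaptors; the exact match above is kept only for brevity.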
Clean reads were mapped to reference
Homo sapiens transcriptome sequences from the UCSC website (hg19), using Bowtie2 and TopHat 2.0.1. Mismatches of no more than 3 bases were allowed in the alignment for each read. Reads that mapped to reference rRNA sequences were removed. To quantify gene expression, the fragments per kb per million fragments (FPKM) value of each gene was calculated, and differentially expressed genes (DEGs) were identified using these values. The formula for calculating the FPKM value is defined below:
$$ \mathrm{FPKM}=\frac{10^6C}{NL/{10}^3} $$
In this formula, C represents the number of reads uniquely mapped to the given gene, N is the number of reads uniquely mapped to all genes, and L is the total length of exons from the given gene. For genes with more than one alternative transcript, the longest transcript was selected to calculate the FPKM value.
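As a worked illustration, the sketch below computes FPKM directly from its definition as fragments per kilobase of exon per million mapped fragments, that is, C divided by N/10^6 and by L/10^3. The function names and the example numbers are hypothetical and are not taken from the study data.

```python
def fpkm(fragments, total_mapped, exon_length_bp):
    """Fragments per kilobase of exon per million mapped fragments.

    fragments      -- C, fragments uniquely mapped to the gene
    total_mapped   -- N, fragments uniquely mapped to all genes
    exon_length_bp -- L, total exon length of the gene in bases
    """
    if total_mapped <= 0 or exon_length_bp <= 0:
        raise ValueError("total_mapped and exon_length_bp must be positive")
    per_million = total_mapped / 1e6   # scale library size to millions
    per_kb = exon_length_bp / 1e3      # scale gene length to kilobases
    return fragments / per_million / per_kb


def gene_fpkm(fragments, total_mapped, transcript_lengths_bp):
    """FPKM for a gene with several transcripts, using the longest one as L."""
    return fpkm(fragments, total_mapped, max(transcript_lengths_bp))


if __name__ == "__main__":
    # Illustrative numbers only: 500 fragments, 20 million mapped, longest transcript 2 kb.
    print(gene_fpkm(500, 20_000_000, [1200, 2000, 800]))  # 12.5
```

With these example values the library scale is 20 million and the gene length is 2 kb, so 500 / 20 / 2 gives 12.5.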
Expression profiling and analysis of differential gene expression
For clustering, genes with a median FPKM < 1.0 and a coefficient of variation (CV) < 0.7 were excluded as non-informative. This resulted in a total of 7744 unique genes. Log
2 transformation and additional normalization were applied. Then, hierarchical clustering was performed using Gene Cluster 3.0 with default parameters, uncentered correlation, and complete linkage [
15]. The differential expression
P-values were adjusted for multiple testing using the Benjamini and Hochberg false discovery rate (FDR) procedure, and a cutoff of FDR < 0.05 was applied. Analyzed genes were functionally annotated in accordance with the Gene Ontology (GO) using the DAVID bioinformatics tool (
http://david.abcc.ncifcrf.gov) [
16].
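The Benjamini and Hochberg adjustment mentioned above can be sketched in a few lines of Python. This is a generic implementation of the procedure with a hypothetical function name and illustrative P-values; it is not the code used in this study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # illustrative values only
    fdr = benjamini_hochberg(raw)
    significant = [i for i, q in enumerate(fdr) if q < 0.05]
    print(fdr, significant)
```

Each raw P-value is multiplied by m/rank, and the running minimum taken from the largest rank downward keeps the adjusted values non-decreasing in rank; genes with an adjusted value below 0.05 would then pass the cutoff.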
Candidate gene fusion identification
SOAPfuse v1.26 (
http://soap.genomics.org.cn/soapfuse.html) [
17] was used to scan the transcriptome data for fusion RNAs. Briefly, GRCh37.69.gtf.gz (
Homo sapiens) was downloaded from Ensembl and used as gene annotation reference information (gtf). For cytoband information, the human genome (hg19, Reference 37) from UCSC, as well as the complete HGNC gene family dataset (HGNC), was used. The pipelines were tuned using Perl.
Validation of fusion genes
Fusion genes were validated by reverse transcription-polymerase chain reaction (RT-PCR) amplification of fusion gene breakpoints, and Sanger sequencing. The PCR reactions were carried out for 4 min at 94 °C; 35 cycles of 40 s at 94 °C, 40 s at 55–58 °C and 40 s at 72 °C, and finally 7 min at 72 °C. The primer sequences are listed in Additional file
2: Table S4. PCR products were confirmed on a 2 % agarose gel, purified, and cloned into the pGEM-T easy vector (Promega, USA). The positive clones were selected for Sanger sequencing.
GAPDH was used as a standard control.
siRNA transfection
To suppress expression of
RNF43-SUPT4H1, DLD-1 and HT29 cells were transiently transfected with siRNAs targeting the fusion transcript, or with negative control siRNA, in 6-well plates (2×10
5 cells/well). The siRNA sequences used against the
RNF43-SUPT4H1 fusion transcript variant 1 were candidate 1 at position 90 bp: 5′-CGA CAG CGC AAC AGA CUA U-3′ (sense) and 5′-AUA GUC UGU UGC GCU GUC G-3′ (antisense), and candidate 2 at position 97 bp: 5′-GCA ACA GAC UAU AGA CCA G-3′ (sense) and 5′-CUG GUC UAU AGU CUG UUG C-3′ (antisense). These siRNAs and the negative control siRNA were purchased from RNAi Co. (Bioneer, Korea). The siRNA candidates targeted the fusion junction (Additional file
3: Figure S5). Each colorectal cancer cell line was transfected with 100 nM siRNA using the RNAi MAX transfection reagent (Invitrogen), following the manufacturer’s instructions. The cells were harvested at 24, 48 and 72 h after transfection, and
RNF43-SUPT4H1 fusion transcript expression was analyzed by RT-PCR.
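As a quick consistency check on the siRNA duplexes listed above, the following Python sketch confirms that each antisense strand is the reverse complement of its sense strand. The sequences are copied from the text; the function and variable names are hypothetical.

```python
# RNA base-pairing table: A<->U and C<->G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# (sense, antisense) pairs written 5' to 3', as given in the text.
SIRNAS = {
    "candidate 1": ("CGA CAG CGC AAC AGA CUA U", "AUA GUC UGU UGC GCU GUC G"),
    "candidate 2": ("GCA ACA GAC UAU AGA CCA G", "CUG GUC UAU AGU CUG UUG C"),
}


def normalize(seq):
    """Remove spacing and upper-case an RNA sequence."""
    return seq.replace(" ", "").upper()


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence, read 5' to 3'."""
    return normalize(seq).translate(COMPLEMENT)[::-1]


def antisense_matches(sense, antisense):
    """True if the antisense strand is the exact reverse complement of the sense strand."""
    return reverse_complement_rna(sense) == normalize(antisense)


if __name__ == "__main__":
    for name, (sense, antisense) in SIRNAS.items():
        print(name, antisense_matches(sense, antisense))
```

Both duplexes pass this check, and the last 12 bases of candidate 1 equal the first 12 bases of candidate 2, which agrees with the stated 7 bp offset between positions 90 and 97.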
MTT assay
Cell viability was assessed by tetrazolium salt reduction using the MTT [3-(4, 5-dimethylthiazol-2-yl)-2, 5-diphenyl tetrazolium bromide] assay (Sigma-Aldrich, USA). After siRNA transfection, the cells were incubated for 0, 24, 48, and 72 h before the addition of MTT substrate. MTT stock solution was added at a final concentration of 0.5 mg/ml, and cells were incubated at 37 °C for 1.25 h. The resulting formazan crystals were collected and dissolved by incubation with DMSO. Absorbance was measured by spectrophotometry at a wavelength of 540 nm.
Access to data from this study
Discussion
In this study, we performed transcriptome analysis using RNA-seq, to compare the gene expression profiles of primary colorectal carcinoma and liver metastases. Our results revealed high concordance of gene expression between the primary carcinoma and liver metastases. Interestingly, we found that fusion transcripts are expressed differentially between the primary colorectal cancer and liver metastases. Our results also suggest that the fusion genes investigated may serve as potential new targets for primary colorectal carcinoma.
Recent studies have reported high genomic concordance between primary carcinoma and metastases in colorectal cancer [
18,
19]. In our study, the result of unsupervised clustering was in agreement with these previous reports. These results suggest that the primary tumor and its metastases may share molecular profiles despite being located in different regions, because cancer cells that leave the primary tumor can seed metastases in distant organs [
19,
20]. However, the clustering for each patient showed different expression patterns between primary cancers and their metastases (Additional file
9: Figure S1). In addition, we identified 14 genes that were significantly associated with liver metastases. We will further investigate the roles of these DEGs in colon cancer metastasis.
In this study, we focused on the structure of the transcriptome and analyzed cancer type-specific fusion transcripts. Gene fusion events that arise from genomic aberrations or transcription-mediated mechanisms and produce chimeric oncogenes are known to be involved in cancer development and progression. Fusion transcripts have been found in various cancers, including
EML4-ALK in lung [
21],
ETV6-NTRK3 in breast [
22], and translocation of genes in the ETS family in prostate cancer [
23]. The expression of these fusion transcripts influences cell growth, colony formation, migration, and invasion, which often results from the production of functional proteins [
7]. In colorectal cancer, however, fusion transcripts are not commonly reported [
24]. Investigating cancer type-specific gene fusion is useful for understanding the complexity of the cancer genome, and studying colorectal cancer development [
14]. In the present study, gene fusion events in primary colorectal carcinoma and liver metastasis tissues were detected using RNA-seq. A total of 30 in-frame fusion transcripts were identified in primary carcinoma and liver metastases. Among these fusion transcripts,
GTF2E2-NRG1,
TMEM66-NRG1,
TNNC2-WFDC, and
HEPHL1-PANX1 fusion transcripts were found in both primary carcinoma and liver metastases from the same patient. We consider that these fusion transcripts, with the exception of the
HEPHL1-PANX1 gene fusion, were generated due to genomic aberrations, e.g., inversion or deletion. However, common cancer type-specific fusion transcripts are generated by transcription-mediated mechanisms, including read-through and trans-splicing, allowing for high concordance between the genomes of primary tumors and metastases. The
ZMYND8-SEPT9 fusion transcript, which arises due to a fusion event involving genes on different chromosomes, is only present in primary carcinoma (Additional file
7: Figure S3A). Therefore, we suggest that the cancer type-specific fusion transcripts enable differentiation between primary carcinoma and liver metastases at the transcriptome level, regardless of genomic variation.
The Cancer Genome Atlas (TCGA) has recently reported genomic aberrations of colorectal cancer, using high-throughput sequencing [
13]. The TCGA study, which focused on translocation–mediated gene fusions, reported 18 interchromosomal translocation and in-frame events. Gene fusion events may additionally occur due to genomic rearrangements. Transcription-mediated gene fusions show high frequency, and recurrent functional gene fusions are suggested as candidate biomarkers and potential therapeutic targets. We detected not only genomic rearrangement-mediated gene fusion, but also transcription-mediated gene fusion events (Table
2). Among these fusion genes, the
SCNN1A-TNFRSF1A fusion transcript, which is translated into a fusion protein, has been reported in breast cancer [
25]. Furthermore, the
DUS4L–BCAP29 fusion transcript, which encodes a functional protein that is involved in cell proliferation, has been reported in gastric cancer [
26]. We report, for the first time, that knockdown of the
RNF43-SUPT4H1 fusion transcript reduces cell proliferation in live cells, suggesting that this fusion transcript plays a role in cancer cell growth. Therefore, we suggest that these fusion transcripts may serve as potential biomarker candidates and therapeutic targets.
The genomic loci of the
RNF43 and
SUPT4H1 genes are adjacent to each other, and the
RNF43-SUPT4H1 fusion transcript is found to occur frequently. As a result, the
RNF43-SUPT4H1 fusion transcript was categorized as a read-through chimera. This fusion transcript was detected in cancer tissues only (Fig.
3 and Additional file
8: Figure S4). We therefore hypothesized that the
RNF43-SUPT4H1 fusion transcript acts as an oncogene, and confirmed this function (Fig.
4).
RNF43 encodes the ring finger protein 43 that is involved in cell growth, and is upregulated in human colon cancer [
27].
SUPT4H1 encodes the transcription elongation factor SPT4, which regulates mRNA processing and transcription elongation [
28]. We speculate that the
RNF43-SUPT4H1 fusion transcript is activated in colorectal cancer, affecting the expression of other genes. Future studies should focus on investigating the function of cancer type-specific fusion transcripts and developing methods for distinguishing between primary carcinoma and liver metastases.
Abbreviations
CRC, colorectal cancers; CV, coefficient of variation; DEG, differentially expressed gene; FDR, false discovery rate; FPKM, fragments per kb per million fragments; GO, gene ontology; LM, colon-liver metastases; MTT, 3-(4, 5-dimethylthiazol-2-yl)-2, 5-diphenyl tetrazolium bromide; NC, normal colon; NL, normal liver; PC, primary colon carcinoma; RNA-seq, RNA sequencing; RT-PCR, reverse transcription-polymerase chain reaction; siRNA, small interfering RNA; SRA, sequence read archive; TCGA, the cancer genome atlas
Acknowledgements
The biospecimens for this study were provided by the Pusan National University Hospital, a member of the National Biobank of Korea, which is supported by the Ministry of Health, Welfare and Family Affairs. All samples derived from the National Biobank of Korea were obtained with informed consent under institutional review board-approved protocols.