Background
There were an estimated 26,400 new cases of and 3,620 deaths from laryngeal cancer (LC) in China in 2015 [
1]. As with other carcinomas of the respiratory system, carcinogen exposure via tobacco smoke causes DNA damage, and the accumulation of this damage can alter genetic and epigenetic regulatory functions and thereby transform normal cells into cancer cells [
2,
3]. This cell transformation usually takes multiple steps to complete, and it is affected by the sensitivity of the individual and the degree of damage [
4]. This process is called tumorigenesis [
5].
Tumorigenesis often presents with chromosomal and DNA abnormalities, and one common chromosomal rearrangement is gene fusion [
6]. Some specific gene fusions have become important diagnostic markers of and therapeutic targets in cancer over the past several decades [
7]. These chimeric products are often associated with neoplastic behavior [
7,
12]. For example, the
MYC gene is rearranged via the t(8;14)(q24;q32) translocation in Burkitt lymphoma cells. This rearrangement juxtaposes the gene with regulatory elements of the immunoglobulin heavy chain gene at 14q32, where the
MYC gene is constitutively activated because its expression is driven by immunoglobulin enhancers [
7,
9]. Other fusion genes, including
PRCC-TFE3 in papillary renal cell carcinoma [
10],
PAX8-PPARG in follicular thyroid carcinoma [
11],
FUS-CREB3L2 in soft tissue sarcoma [
12], and
TMPRSS2-ETS in prostate cancer [
13], have gradually been identified with various potential gene regulation mechanisms.
Analogous to the fusion of two genes at the DNA level, two adjacent genes that lie in the same orientation and are usually transcribed independently are occasionally transcribed into a single fused RNA sequence. The mechanisms involved in generating such a transcript include RNA editing, alternative splicing (AS),
trans-splicing, alternative transcription start sites, and alternative polyadenylation transcription termination sites [
14‐
17]. This single fused RNA sequence is called a transcription-induced chimera (TIC) [
14]. Unlike polycistronic transcripts in prokaryotes, in which a single transcript can be translated into several proteins, TICs usually do not produce chimeric proteins or independent transcripts. Instead, they have tumor-promoting properties as hybrid RNAs [
14]. For example, the expression of the chimeric transcript
HBx-LINE1 was associated with hepatocellular carcinoma development and correlated with poor survival [
18]. Also, the chimeric transcript
SLC45A3-ELK4, generated by
cis-splicing between the adjacent
SLC45A3 and
ELK4 genes, did not involve DNA rearrangements or
trans-splicing and could augment prostate cancer cell proliferation [
19].
In a comprehensive analysis of novel TICs in LC transcriptomes using paired-end deep RNA sequencing, we found that
COL7A1-urocortin 2 (
UCN2) is a novel TIC. We could not elucidate the intrinsic genetic and epigenetic mechanism responsible for
COL7A1-UCN2 generation; however, both the
COL7A1 and
UCN2 genes had explicit suppressor roles in tumor regulation, specifically the regulation of the epithelial-mesenchymal transition (EMT) [
20‐
22]. Therefore, we hypothesized that
COL7A1-UCN2 may down-regulate the mRNA expression of both
COL7A1 and
UCN2 in LC tissues and that such down-regulation may promote tumor invasion via EMT regulation. Furthermore, we speculated that
COL7A1-UCN2 generation can reflect the degree of DNA damage and that positivity for this TIC may be associated with LC prognosis.
Methods
Patients and tissue samples
Institutional Review Board approval for this laryngeal cancer research project (No. TRECKY 2009–33; date: January 2009) was obtained from the Beijing Tongren Hospital of Capital Medical University. A total of 23 patients who underwent surgery for pathologically confirmed LC from 2009 to 2016 were enrolled in this study. All patients provided written informed consent. These patients had archived tumor specimens and data available, with a minimum of 36 months of cancer-free or censored-death follow-up after surgery. Follow-up was completed by monitoring medical records or conducting telephone interviews. To confirm the diagnosis, the tumors’ histological classifications and differentiation were defined according to the 1999 World Health Organization histological classification standards for LC. Tumor staging was carried out using the 2009 TNM staging criteria of the Union for International Cancer Control. Clinicopathological data were available for all 23 patients (Table
1).
Table 1
Correlation of COL7A1-UCN2 expression with LC clinical characteristics
Characteristic | Total, n (%) | COL7A1-UCN2 positive, n (%) | COL7A1-UCN2 negative, n (%) | P-value |
Age (years) | | | | 0.685 |
≥ 60 | 15 (65.2) | 9 (39.1) | 6 (26.1) | |
< 60 | 8 (34.8) | 4 (17.4) | 4 (17.4) | |
Gender | | | | 1.000 |
Male | 20 (87.0) | 11 (47.8) | 9 (39.1) | |
Female | 3 (13.0) | 2 (8.7) | 1 (4.3) | |
Tumor stage | | | | 0.221 |
I–II | 8 (34.8) | 3 (13.0) | 5 (21.7) | |
III–IV | 15 (65.2) | 10 (43.5) | 5 (21.7) | |
Differentiation | | | | 1.000 |
Well | 4 (17.4) | 2 (8.7) | 2 (8.7) | |
Moderate | 16 (69.6) | 9 (39.1) | 7 (30.4) | |
Poor | 3 (13.0) | 2 (8.7) | 1 (4.3) | |
LNM | | | | 0.685 |
N0 | 10 (43.5) | 5 (21.7) | 5 (21.7) | |
N+ | 13 (56.5) | 8 (34.8) | 5 (21.7) | |
Treatment | | | | 0.685 |
S only | 8 (34.8) | 4 (17.4) | 4 (17.4) | |
S + C/X | 15 (65.2) | 9 (39.1) | 6 (26.1) | |
Smoking | | | | 0.435 |
Ever | 22 (95.7) | 13 (56.5) | 9 (39.1) | |
Never | 1 (4.3) | 0 (0.0) | 1 (4.3) | |
Alcohol | | | | 1.000 |
Ever | 21 (91.3) | 12 (52.2) | 9 (39.1) | |
Never | 2 (8.7) | 1 (4.3) | 1 (4.3) | |
All tumor samples contained more than 50% tumor cells and were stored at −80 °C until use. Paired LC and adjacent normal mucous membrane tissue samples (ANMMTs) were obtained from the 23 patients. Paired samples from three patients were prepared for transcriptomic analysis. These three patients were male, 61–63 years old, smokers, and alcohol drinkers; had T4N2aM0 disease with different degrees of differentiation (well, moderately, and poorly differentiated); and had undergone total laryngectomy with selective bilateral neck dissection without preoperative chemotherapy or radiotherapy. The paired samples from the remaining 20 patients were used to validate the TIC using real-time polymerase chain reaction (RT-PCR). Adjacent normal tissue samples were obtained at least 5 mm from the tumor margins [
23].
Pathological review
Sections of the paired frozen tumor and normal tissues were stained with hematoxylin and eosin. The slides were examined twice to ensure that the tumor tissues carried high-density cancer foci (> 75%) and that the normal tissue samples had no tumor components. All samples were reviewed independently by two pathologists, and disagreements between them were resolved through discussion.
Preparation and sequencing of cDNA library
Total RNA was isolated from the fresh tissues using TRIzol reagent (Sigma-Aldrich, St. Louis, MO, USA) according to the manufacturer’s instructions. Poly(A) mRNA was isolated from the total RNA using oligo(dT) beads and then fragmented in fragmentation buffer. Using these short mRNA fragments as templates, first-strand cDNA was synthesized with random hexamer primers. The fragmentation buffer, RNase H, and DNA polymerase I were used to synthesize the second-strand cDNA. Short double-stranded cDNA fragments were purified using a QIAquick PCR extraction kit (Qiagen, Hilden, Germany) and eluted with EB buffer for end repair and the addition of an “A” base. The short fragments were then ligated to sequencing adaptors (Illumina, San Diego, CA, USA). DNA fragments of a selected size were gel-purified and amplified using PCR. The amplified library was sequenced on an Illumina HiSeq 4000 sequencing machine.
Raw read filtering
The images generated by the Illumina HiSeq 4000 sequencing machine were converted into nucleotide sequences using a base-calling pipeline, and the raw reads were saved in FASTQ format. Dirty raw reads were removed before data analysis. Reads were discarded if they 1) contained adapter sequences, 2) contained more than 2% “N” bases, or 3) were of low quality. This ensured that only clean reads were used for the subsequent mapping to the human genome and transcriptome.
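The three filtering criteria can be illustrated with a minimal Python sketch. Only the 2% "N" limit comes from the text; the adapter sequence, the Phred cut-off, and the fraction of low-quality bases that makes a read "low quality" are hypothetical values chosen for illustration, and the function names are not part of the original pipeline.

```python
from typing import Iterable, Iterator, List, Tuple

# Only MAX_N_FRACTION is taken from the text; the other values are assumptions.
ADAPTER = "AGATCGGAAGAGC"        # hypothetical adapter sequence for illustration
MAX_N_FRACTION = 0.02            # reads with more than 2% "N" bases are removed
LOW_QUALITY_PHRED = 10           # assumed threshold for a low-quality base
MAX_LOW_QUALITY_FRACTION = 0.5   # assumed fraction that makes a read low quality


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record[0][1:], record[1], record[3]
            record = []


def is_clean(sequence: str, quality: str, phred_offset: int = 33) -> bool:
    """Return True if the read passes all three removal criteria."""
    if not sequence or len(sequence) != len(quality):
        return False
    seq = sequence.upper()
    # Criterion 1: the read contains adapter sequence.
    if ADAPTER in seq:
        return False
    # Criterion 2: more than 2% of the bases are "N".
    if seq.count("N") / len(seq) > MAX_N_FRACTION:
        return False
    # Criterion 3: too many low-quality bases (thresholds are assumptions).
    low = sum(1 for ch in quality if ord(ch) - phred_offset < LOW_QUALITY_PHRED)
    if low / len(quality) > MAX_LOW_QUALITY_FRACTION:
        return False
    return True


def filter_reads(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Keep only the clean reads from an iterable of FASTQ lines."""
    return [(n, s, q) for n, s, q in parse_fastq(lines) if is_clean(s, q)]
```

In practice a dedicated trimming tool would normally perform this step; the sketch only makes the decision rule explicit.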
Reads mapped to the human genome and transcriptome
The Burrows-Wheeler Aligner software program was used to map clean reads to the reference genome, and the Bowtie software program was used to map them to the reference gene set. The expression level of each gene was measured as the number of reads per kilobase of exon model per million mapped reads (RPKM), calculated as follows: \( \mathrm{RPKM}=\frac{10^9C}{NL} \). In this formula, C is the number of fragments specifically mapped to a given gene, N is the number of fragments specifically mapped to all genes, and L is the total exon length of the given gene in base pairs. For genes with more than one alternative transcript, the longest transcript was chosen for the calculation of the RPKM. The RPKM normalizes for differences in gene length and sequencing depth, so gene expression could be compared directly between samples using the RPKM.
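As a worked illustration of the RPKM formula, the following Python sketch computes the value for one gene. The function name and the example counts are hypothetical; only the formula itself comes from the text.

```python
def rpkm(fragments_on_gene: int, total_mapped_fragments: int, exon_length_bp: int) -> float:
    """RPKM = 10^9 * C / (N * L), with C, N, and L as defined in the text."""
    if total_mapped_fragments <= 0 or exon_length_bp <= 0:
        raise ValueError("N and L must be positive")
    return 1e9 * fragments_on_gene / (total_mapped_fragments * exon_length_bp)


# Hypothetical example: 500 fragments on a gene with 2,000 bp of exons,
# in a library of 20 million mapped fragments.
example = rpkm(500, 20_000_000, 2_000)
print(example)  # 12.5
```

Because both the library size N and the exon length L appear in the denominator, doubling either one halves the RPKM for the same raw count.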
Differentially expressed gene analysis
Differentially expressed genes were identified in the tumor and matched normal tissue samples according to two criteria: a false-discovery rate no greater than 0.001 and a log2 ratio of at least 1. This approach was chosen based on the significance of digital gene expression profiles.
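The two selection criteria can be written as a small Python sketch. It assumes that the log2 ratio is the tumor-to-normal RPKM ratio and that the threshold applies to its absolute value, so that both up- and down-regulated genes are retained; the text does not state either point explicitly. The small expression floor used to avoid division by zero is also an assumption.

```python
import math


def log2_ratio(rpkm_tumor: float, rpkm_normal: float, floor: float = 0.01) -> float:
    """log2(tumor / normal); `floor` is an assumed guard against zero expression."""
    return math.log2(max(rpkm_tumor, floor) / max(rpkm_normal, floor))


def is_differentially_expressed(log2r: float, fdr: float,
                                max_fdr: float = 0.001,
                                min_abs_log2: float = 1.0) -> bool:
    """Apply the two criteria: FDR <= 0.001 and |log2 ratio| >= 1 (absolute value assumed)."""
    return fdr <= max_fdr and abs(log2r) >= min_abs_log2


# Hypothetical gene with RPKM 8 in tumor and 2 in normal tissue, FDR 0.0005.
ratio = log2_ratio(8.0, 2.0)
print(ratio, is_differentially_expressed(ratio, 0.0005))  # 2.0 True
```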
Fusion of human gene detection
During alignment of the short RNA reads to the reference genome, some reads could be aligned only after being divided into two segments. Such two-segment alignments were analyzed using the SOAPfuse software program, which detects gene fusions using span and junction reads [
24]. This basic method includes 1) aligning the reads to the reference genome and to the annotated transcripts; 2) constructing candidate fusion-site sequences from a local genome library using an exhaustive algorithm; and 3) retaining highly credible fusion transcripts through a series of filters. The requirements for the alignment of the divided reads were as follows: a length of at least 8 bp for the shorter read segment and an intron boundary matching one of the three canonical boundary types (GT-AG, GC-AG, and AT-AC). Regardless of where the intron was derived, the boundaries always should be the same. For the DNA positive strand, for both read segment alignments, a maximum of one mismatch and an unmapped alignment was required. Based on the information on the alignments of the two segments, gene fusion sites identified from the mapping of the human genome and transcriptome were retrieved using a Perl script. A fusion was considered genuine if the fusion site was located at known exon boundaries of the two genes and was supported by at least one paired-end read [
25‐
27].
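The split-read and fusion-site requirements described above can be summarized in a short Python sketch. This is not SOAPfuse code and does not reproduce its internals; the class and field names are hypothetical, and only the thresholds (8 bp, the three canonical intron boundaries, at most one mismatch per segment, at least one supporting paired-end read) come from the text.

```python
from dataclasses import dataclass
from typing import Tuple

CANONICAL_BOUNDARIES = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
MIN_SEGMENT_LENGTH = 8        # bp, for the shorter read segment
MAX_MISMATCHES_PER_SEGMENT = 1


@dataclass
class SplitRead:
    """A read aligned as two segments (hypothetical record layout)."""
    segment_lengths: Tuple[int, int]   # lengths of the two segments in bp
    donor: str                         # first two intron bases, e.g. "GT"
    acceptor: str                      # last two intron bases, e.g. "AG"
    mismatches: Tuple[int, int]        # mismatches in each segment alignment


@dataclass
class FusionCandidate:
    """A candidate fusion site between two genes (hypothetical record layout)."""
    at_known_exon_boundaries: bool
    supporting_paired_end_reads: int


def passes_split_read_filter(read: SplitRead) -> bool:
    """Check segment length, canonical intron boundary, and mismatch limits."""
    boundary = (read.donor.upper(), read.acceptor.upper())
    return (min(read.segment_lengths) >= MIN_SEGMENT_LENGTH
            and boundary in CANONICAL_BOUNDARIES
            and all(m <= MAX_MISMATCHES_PER_SEGMENT for m in read.mismatches))


def is_reported_fusion(candidate: FusionCandidate) -> bool:
    """Fusion site at known exon boundaries with at least one paired-end read."""
    return (candidate.at_known_exon_boundaries
            and candidate.supporting_paired_end_reads >= 1)
```

For instance, a read split 40/10 bp across a GT-AG boundary with one mismatch in a single segment would pass the read-level filter, whereas the same read split 45/5 bp would not.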
Detection of alternative splicing (AS)
AS is a fundamental mechanism for generating transcript diversity. The analysis pipeline used in this study to detect AS events in the transcriptome cDNA library consisted of two major steps. 1) SOAPsplice (Version 1.1) was used to map the reads to the human reference sequence and report the splice junctions according to the junction reads of the alignments [
24]. With SOAPsplice, the default parameters were used wherever possible; up to three mismatches were allowed for intact alignments, and no more than one mismatch was allowed for spliced alignments. 2) Based on AS mechanisms, both known splice junctions [e.g., those obtained from the National Center for Biotechnology Information RefSeq database (Bethesda, MD, USA)] and the junctions derived from the mapping were used to detect the four basic AS events: exon skipping, alternative 5′ splice sites, alternative 3′ splice sites, and intron retention.
Among the four types of AS events, those that occurred in the tumors but not in the matched normal tissue were identified as tumor-specific AS events. AS events detected in both LC and ANMMT samples were filtered out. Finally, for each sample, a list of highly reliable tumor-specific AS events was generated.
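The tumor-specific filtering step amounts to a set difference between the AS events of a tumor sample and those of its matched normal sample. The Python sketch below illustrates this under the assumption that an event is identified by its gene, type, and genomic coordinates; the record layout, event-type labels, and example events are hypothetical.

```python
from typing import Iterable, NamedTuple, Set

AS_EVENT_TYPES = {"exon_skipping", "alt_5prime_site", "alt_3prime_site", "intron_retention"}


class ASEvent(NamedTuple):
    """One alternative splicing event (hypothetical record layout)."""
    gene: str
    event_type: str
    chrom: str
    start: int
    end: int


def tumor_specific_events(tumor: Iterable[ASEvent], normal: Iterable[ASEvent]) -> Set[ASEvent]:
    """Keep events of the four basic types seen in the tumor but not in the paired normal tissue."""
    tumor_events = {e for e in tumor if e.event_type in AS_EVENT_TYPES}
    return tumor_events - set(normal)


# Hypothetical events for one tumor/normal pair.
shared = ASEvent("GENE_A", "exon_skipping", "chr1", 1000, 1200)
tumor_only = ASEvent("GENE_B", "intron_retention", "chr3", 5000, 5400)
print(tumor_specific_events([shared, tumor_only], [shared]) == {tumor_only})  # True
```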
Validation of transcriptome cDNA library using RT-PCR
To determine the frequency of
COL7A1-UCN2 and
COL7A1 and
UCN2 mRNA expression, the other 20 paired LC and ANMMT samples were subjected to RT-PCR analysis. The primer sequences used for this RT-PCR are listed in Table
2.
Table 2
Primer sequences used for RT-PCR in the study
Target | Forward primer (F) | Reverse primer (R) |
COL7A1-UCN2 | 5′-CGCCAAGAGATGAGTCAGCAC-3′ | 5′-GCACTCAGATCTGATATGACCTGC-3′ |
COL7A1 | 5′-CGCCAAGAGATGAGTCAGCAC-3′ | 5′-CTCTGCAGGTAGGGCAGGGT-3′ |
UCN2 | 5′-ATGACCAGGTGTGCTCTGCTGTTGC-3′ | 5′-TCAGCAGTGGCCGACACG-3′ |
β-actin | 5′-TTGCCGACAGGATGCAGAA-3′ | 5′-GCCGATCCACACGGAGTACTT-3′ |
For COL7A1-UCN2 and COL7A1 cDNA, the PCR conditions were 10 min at 95 °C; 30 cycles of 30 s at 95 °C, 30 s at 62 °C, and 90 s at 72 °C; and a final extension of 10 min at 72 °C. For UCN2 cDNA, the PCR conditions were 10 min at 95 °C; 30 cycles of 30 s at 95 °C, 30 s at 70 °C, and 30 s at 72 °C; and a final extension of 10 min at 72 °C. β-actin was used as a loading control. The RT-PCR products were analyzed using gel electrophoresis.
Quantitative analysis of PCR products was carried out using a Rotor-Gene 3000 (Corbett Research, Sydney, Australia) and a commercially available SYBR Premix Ex Taq Perfect Real-Time Kit (Takara Biotechnology, Dalian, China), which were used according to the manufacturer’s instructions. The primer sequences used were those described above. The PCR conditions were 30 s at 95 °C, followed by 40 cycles of 5 s at 95 °C and 30 s at 60 °C. The data were analyzed using the ΔΔCt method, and values were expressed as fold differences normalized to the housekeeping gene, β-actin.
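The ΔΔCt calculation can be illustrated with a minimal Python sketch. It assumes the common 2^(−ΔΔCt) form, which presumes roughly 100% amplification efficiency for both the target and β-actin; the text does not specify the exact variant used, and the Ct values below are hypothetical.

```python
def fold_change(ct_target_sample: float, ct_reference_sample: float,
                ct_target_control: float, ct_reference_control: float) -> float:
    """Relative expression by the 2^(-ddCt) method (assumed variant).

    "sample" is the tissue of interest (e.g., tumor), "control" is the
    comparison tissue (e.g., paired normal), and "reference" is the
    housekeeping gene (beta-actin in the text).
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles later in
# tumor than in normal tissue, while beta-actin is unchanged.
print(fold_change(26.0, 18.0, 24.0, 18.0))  # 0.25
```

A fold change below 1 indicates lower expression in the sample than in the control; here, a ΔΔCt of 2 cycles corresponds to a fourfold reduction.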
Statistical analysis
Data were expressed as means ± standard deviation. Differences between the two groups were examined using Fisher’s exact test (two-sided, n < 40) or a paired or unpaired Mann-Whitney U-test. The Kaplan-Meier method and Cox regression model were used to perform the overall survival analysis of the 23 patients, who were grouped according to their positivity or negativity for COL7A1-UCN2. P-values less than 0.05 were considered statistically significant. The data were analyzed using the SPSS 20.0 statistical software program (IBM Corporation, Armonk, NY, USA).
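As an illustration of the two-sided Fisher's exact test, the following Python sketch recomputes two of the P-values in Table 1 from the reported counts. It uses only the standard library and the usual definition of the two-sided test (summing the probabilities of all tables that are no more likely than the observed one); it is not the SPSS implementation, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer weights avoid floating-point ties when comparing probabilities.
    extreme = sum(weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed)
    return extreme / comb(total, col1)


# Rows: stage I-II, stage III-IV; columns: COL7A1-UCN2 positive, negative (Table 1).
stage_p = fisher_exact_two_sided(3, 5, 10, 5)
# Rows: age >= 60, age < 60; columns: COL7A1-UCN2 positive, negative (Table 1).
age_p = fisher_exact_two_sided(9, 6, 4, 4)
print(round(stage_p, 3), round(age_p, 3))  # 0.221 0.685
```

Both values agree with the P-values listed in Table 1 for tumor stage and age.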
Discussion
High-throughput transcriptome sequencing provides sufficient information with which to identify candidate oncogenic mRNA chimeras. These chimeric isoforms are usually generated by AS, which is a fundamental mechanism of transcript diversity generation [
26‐
31]. AS generated the TIC
COL7A1-UCN2 between neighboring genes, which is referred to as a read-through event [
32]. In the present study, we found
COL7A1-UCN2 positivity in 13 of 23 LC samples, whereas all 23 paired ANMMT samples were negative. This TIC was generated via alternative splicing in the cells of LC tissues. Furthermore, those LC tissues with
COL7A1-UCN2 positivity had lower levels of
COL7A1 and
UCN2 mRNAs than did negative LC tissues. Therefore, this TIC potentially down-regulated the expression of the
COL7A1 and
UCN2 genes during and after chimera formation, and it may thereby be associated with poor clinical prognosis because both
COL7A1 and
UCN2 have explicit suppressor roles in tumor EMT regulation.
In a previous study, low or nonexistent
COL7A1 expression was associated with the loss of the basement membrane, a specific extracellular matrix (ECM) component, and the promotion of the EMT process in cutaneous squamous cell carcinomas (CSCCs) [
33].
COL7A1-produced type VII collagen (ColVII) is the primary component of anchoring fibrils, which help construct the basement membrane that separates the epithelium from the stroma in epithelial and mucosal tissues. Invasive epithelial-mucous tumors can be distinguished from benign and pre-invasive lesions by the consistent loss of the surrounding linear basement membrane in a wide variety of tissues [
33‐
39]. The breakdown of the basement membrane is a critical early step in EMT, in which oncogenic derivatives of epithelial stem cells are thought to act as intrinsic cancer stem cells that disrupt the basement membrane via the secretion of matrix metalloproteinases (MMPs) [
33]. In CSCCs, tumor cells with
COL7A1 knockdown manifested increased migration and higher invasiveness, accompanied by the alteration of EMT marker expression (the decreased expression of E-cadherin and the increased expression of MMP2 and vimentin). Furthermore, ColVII knockdown can decrease epithelial cancer cell differentiation and increase the expression of the chemokine ligand receptors CXCL10-CXCR3 and PLC-β4, which can further facilitate EMT and increase tumor invasion through an autocrine forward loop [
22].
In our present study,
COL7A1 mRNA levels were down-regulated in cancer tissues, and the
COL7A1-UCN2 chimera generation mechanism may have circumvented TGF-β1’s tumor-suppressive effects and thereby promoted tumor invasion and proliferation. TGF-β1 maintains normal tissue homeostasis and can both suppress and promote tumor proliferation in a time- and concentration-dependent manner [
20,
40,
41]. Within this homeostasis, TGF-β1 broadly controlled the ECM, providing transcription regulation for the following genes:
COL1A1, COL1A2, COL3A1, COL5A2, COL6A1, COL6A3, COL7A1, etc. The ECM is a dense latticework of collagen and elastin that serves as a selective macromolecular filter and plays a role in mitogenesis and differentiation [
42,
43]. Therefore, abnormal ECM homeostasis is a hallmark of cancer. It may be associated with the dysregulation of various collagens and increased tumor invasion because
COL7A1-produced collagen VII is an essential component of various collagens [
20,
43]. TGF-β1 can up-regulate collagen VII in tissues given normal homeostasis, a high concentration, and long-term exposure to TGF-β1 [
42]. Collagen VII was found to be down-regulated in cancer tissues, and homeostasis was lost through epigenetic transcription regulation [
44], canonical pathway inactivation in TGF-β1 (i.e., TGFR mutation) in cancer cells [
45], or ECM alteration in the tumor microenvironment [
46]. In our study, we found that cancer tissues had significantly decreased
COL7A1 mRNA levels as compared to paired normal tissues, and we also found that cancer tissues with
COL7A1-UCN2 chimera positivity had significantly lower
COL7A1 mRNA levels than the cancer tissues with
COL7A1-UCN2 chimera negativity. These results suggest that the
COL7A1-UCN2 chimera generation mechanism may be associated with the down-regulation of
COL7A1 mRNA, which may reflect the degree of invasiveness of the tumor cells.
The activation of the UCN2/corticotropin-releasing factor receptor 2 (CRFR2) axis signaling can inhibit tumor vascularization, cell proliferation and invasion, and EMT [
21,
47], whereas the mechanism of
COL7A1-UCN2 chimera generation can potentially down-regulate
UCN2 mRNA and thereby cause the loss of its tumor suppressor role. Both UCN2 and CRFR2 belong to the CRH family, which is known to contain the principal neuroendocrine regulators of stress response in the central nervous system [
21,
47,
48]. However, previous studies found that the dysregulation of UCN2/CRFR2 signaling was associated with prostate cancer [
49], non-small cell lung carcinoma [
50], colorectal cancer (CRC) [
21], Lewis lung carcinoma (LLC) [
47], and human adrenal and ovarian tumors [
51]. Specifically, in vivo and in vitro studies found that UCN2/CRFR2 activation inhibited tumor vascularization and cell proliferation and invasion [
21,
47]. Furthermore, in CRC cell lines, blockade of the UCN2/CRFR2 axis promoted EMT (altered expression of EMT markers, decreased vimentin, and increased E-cadherin and glycogen synthase kinase 3β expression) via persistent interleukin-6/Stat3 signaling (colonic inflammation regulation) [
21].
The coding regions of both COL7A1 and UCN2 were disrupted or destroyed in
COL7A1-UCN2, and this TIC did not encode a fusion protein. COL7A1 protein includes a Kunitz domain, the deactivation of which induces tumorigenesis [
52]. In the rh
COL7A1 coding region, the Kunitz domain is the first 49 residues in the predicted amino acid sequence of
COL7A1-UCN2, whereas the remaining 96 residues of the Kunitz domain may be disrupted by
UCN2 sequence insertion. In the rh
UCN2 coding region,
UCN2 was frame-shifted, and a discontinuous sequence in the coding region may also disrupt normal UCN2 expression, although
COL7A1-UCN2 includes the complete nucleotides for encoding UCN2 (13–351 nt; 112 amino acids) (Figs.
5 and
6). Therefore, in line with the results of a previous study [
14],
COL7A1-UCN2 produced no fusion proteins or independent transcripts.
The presence of COL7A1-UCN2 in LCs was not the result of stochastic processes. Instead, it likely reflected severe DNA damage, and thus it may be associated with poor prognosis. First, we found COL7A1-UCN2 positivity in 13 of 23 LC samples, whereas all 23 paired ANMMT samples were negative. Second, we found consistent, precise RNA junctions in every repeated validation in all COL7A1-UCN2-positive patient samples. Third, highly expressed genes did not generate TICs randomly. Fourth, a Kaplan-Meier analysis revealed that patients who were positive for COL7A1-UCN2 had significantly worse overall survival than did those who were negative.
This study had certain limitations. Validating chromosomal DNA rearrangements with a standard fluorescence in situ hybridization (FISH) assay requires a minimum distance between the two fused genes (100–150 kb) [
53], but the distance between the adjacent ends of
COL7A1 and
UCN2 is less than 20 kb. Thus, we only used long-range RT-PCR to detect the occurrence of
COL7A1-UCN2 cDNA expression in the LC samples. Also, in the AS events, whether the intrinsic TIC-generation mechanism occurs via
cis-splicing or
trans-splicing remains unknown [
29]. Determining whether TICs function as noncoding RNAs or regulatory RNAs in cancer cell lines without protein participation requires further in vitro evidence. Finally, although our patient sample size was small and potential selection bias could exist, our findings on
COL7A1-UCN2 TIC may provide novel information that helps generate new hypotheses for future studies.