Discussion
The aim of this study was to compare transcriptional profiling data generated from colorectal cancer cell lines following treatment with 5-FU using either a leading generic genomic-based microarray (Plus2.0 array) or a disease-specific transcriptomic-based microarray (Colorectal DSA). The Colorectal DSA was developed based on the colorectal transcriptome, which was generated from large-scale in-house sequencing, public data mining and experimental investigation [
32]. The Colorectal DSA is a transcriptome-based array, whereas the Plus2.0 is a genomic-based array. Given the greater complexity of the transcriptome in comparison to the genome, it would be expected that an array of this type would detect a greater number of transcripts. Compared with the Plus2.0 array, the Colorectal DSA contains 37.5% unique information (23,089 probesets) that is not present on the Plus2.0 array, and the aim of the current study was to assess how important this unique information is. One of the benefits of the Colorectal DSA is that it is also based on the Affymetrix GeneChip technology, meaning that cross-platform comparisons are possible.
The same experimental design was used for each microarray study, consisting of parental or 5-FU-resistant HCT116 cells either untreated or treated with 5-FU for 24 h. The expression profile generated from the parental cells following treatment with 5-FU was termed the sensitive experiment, while the expression profile generated from the resistant cells following 5-FU treatment was termed the resistant experiment. To assess the performance of each microarray platform, we compared the complete content (all probesets) of the arrays based on detection (Flags, present or marginal) and on detection plus differential expression.
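The two-stage filtering described above (detection, then detection plus differential expression) can be illustrated with a minimal sketch. The rule that a probeset must be flagged present or marginal in every replicate, the log2 fold-change cut-off of 1.0, and all probeset identifiers and intensities are assumptions made for illustration only; they are not the thresholds or data used in this study.

```python
import math

def is_detected(flags):
    """Assumed rule: every replicate call is Present ('P') or Marginal ('M')."""
    return all(flag in ("P", "M") for flag in flags)

def log2_fold_change(treated, control):
    """log2 ratio of mean treated intensity to mean control intensity."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2(mean_treated / mean_control)

def filter_probesets(probesets, min_abs_log2fc=1.0):
    """Return (detected, detected_and_changed) lists of probeset ids.

    min_abs_log2fc is a hypothetical cut-off chosen for this example.
    """
    detected = [p for p in probesets if is_detected(p["flags"])]
    changed = [
        p for p in detected
        if abs(log2_fold_change(p["treated"], p["control"])) >= min_abs_log2fc
    ]
    return [p["id"] for p in detected], [p["id"] for p in changed]

# Hypothetical probesets with made-up flags and intensities.
example = [
    {"id": "ps_a", "flags": ["P", "P", "M"],
     "control": [100, 110, 90], "treated": [400, 420, 380]},
    {"id": "ps_b", "flags": ["P", "A", "P"],
     "control": [100, 100, 100], "treated": [500, 500, 500]},
    {"id": "ps_c", "flags": ["P", "P", "P"],
     "control": [200, 200, 200], "treated": [220, 210, 230]},
]
detected_ids, changed_ids = filter_probesets(example)
```

Here ps_b fails the detection filter because of its absent call, and ps_c passes detection but does not reach the assumed fold-change cut-off.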
Following analysis of the complete content of the microarrays, the Colorectal DSA outperformed the Plus2.0 array in terms of probesets detected and probesets detected plus differentially expressed, and also displayed a lower variance between sample replicates. In addition, the Colorectal DSA identified more pathways than the Plus2.0 array in both the sensitive and the resistant experiments, and also identified common pathways important for both drug response and drug resistance: cell cycle, insulin signaling, purine metabolism and pyrimidine metabolism. Indeed, it is not surprising that cell cycle, purine and pyrimidine metabolism pathways were altered following 5-FU treatment in sensitive and 5-FU-resistant cells, given the mechanism of action of the drug. Interestingly, insulin signaling was also altered following 5-FU treatment in both sensitive and resistant settings. Previous studies have demonstrated that insulin signaling has an important role in colorectal cancer progression [
33,
34]. Dallas
et al demonstrated that colorectal cancer cells that are resistant to 5-FU and oxaliplatin, by repeated exposure to drug, are more responsive to IGF-1R inhibition than the parental cells [
35], suggesting that insulin signaling is deregulated during the process of acquiring drug resistance. A number of reasons can account for the observed differences in pathway identification between the two platforms. Firstly, in terms of the 'complete' probeset analysis, the Colorectal DSA detected more probesets and also more differentially expressed probesets than the Plus2.0 array. More importantly, in terms of those probesets that are unique to each array platform, our analysis suggested that the Plus2.0 array detected more probesets than the Colorectal DSA. In terms of pathway analysis we are interested in specific genes, so when we assessed the percentage of probesets that coded for a single gene name, we found that the Colorectal DSA identified many more individual genes than the Plus2.0 array, which identified multiple probesets that coded for the same gene name. Overall, this suggests that the Colorectal DSA was identifying more differentially expressed 'unique' genes than the Plus2.0 array, and this accounts for the observed differences in pathway identification between the two array platforms.
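The distinction drawn above between counting probesets and counting the individual genes they represent can be made concrete with a short sketch. The function name, probeset identifiers and gene names below are hypothetical placeholders, not values from either array.

```python
def gene_level_summary(probeset_to_gene):
    """Summarise how many distinct genes a set of probesets represents.

    Returns (n_probesets, n_unique_genes, percent_unique), where
    percent_unique is the number of distinct gene names expressed as a
    percentage of the number of probesets.
    """
    n_probesets = len(probeset_to_gene)
    if n_probesets == 0:
        raise ValueError("no probesets supplied")
    n_genes = len(set(probeset_to_gene.values()))
    return n_probesets, n_genes, 100.0 * n_genes / n_probesets

# Hypothetical annotation: two probesets map to the same gene name.
annotation = {
    "ps_1": "GENE_A",
    "ps_2": "GENE_A",
    "ps_3": "GENE_B",
    "ps_4": "GENE_C",
}
summary = gene_level_summary(annotation)
```

A platform whose differentially expressed probesets collapse onto few gene names contributes fewer distinct genes to a pathway analysis than its raw probeset count would suggest.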
We also wanted to examine the microarray-specific content of the Colorectal DSA, which was not present on the Plus2.0 array. We found that approximately 50% of the Colorectal DSA-specific probesets are in the antisense orientation, which is much higher than expected. Upon further examination of the microarray-specific probesets, we demonstrated that some are expressed in either the sense or antisense orientation only, while a portion (up to 8.9%) are detected in sense:antisense (SAS) pairs. Recently, the publication of the ENCODE pilot project, which aimed to provide a detailed characterization of 1% of the human genome, demonstrated that there is a much higher level of transcription than originally thought and that this includes the generation of a high number of non-protein-encoding transcripts [
36]. In addition, the literature suggests that approximately 20% of human protein-encoding genes have an associated natural antisense transcript (NAT); however, recent studies suggest that this figure could be much higher [
23,
37‐
40]. NATs can be either cis-acting or trans-acting in nature [
41]. Cis-acting NATs are transcribed from the opposing DNA strand at the same genomic locus, while trans-acting NATs are transcribed from separate loci. The cis-NATs can also be further categorized according to their relative orientation and degree of overlap, either 5' to 5' (head to head), 3' to 3' (tail to tail) or fully overlapping [
37,
41]. NATs have been proposed to regulate the expression of their target genes at several levels, but as yet no experimental data has been provided to assign a definite function to NATs. However, some studies using RT-PCR, northern blotting or microarray profiling have validated the expression of antisense transcripts [
23,
38,
39,
42]. Interestingly, some SAS pairs are flanked by the same transcription factor binding sites, suggesting that the SAS pairs may be co-regulated [
41]. Analysis has demonstrated that SAS pairs can display concordant expression patterns, or discordant expression patterns [
37]. In addition, studies have demonstrated that targeting an antisense transcript using a siRNA approach can alter the levels of the sense transcript, by either up-regulating sense transcription or down-regulating sense transcription [
40,
43], so the results are not always as expected. However, the same studies have demonstrated that alteration of the sense transcript does not affect the antisense expression levels [
40,
43].
As previously described, the functional role of these antisense transcripts is currently unknown, but they have been implicated in transcriptional and translational interference, RNA masking, dsRNA-dependent mechanisms, alternative splicing, stability, cellular transport and chromatin remodeling [
37,
40,
41,
44]. However, the functional relevance of antisense transcripts is something that is now commonly accepted [
45‐
47]. Studies have demonstrated that long antisense transcripts function as epigenetic regulators of transcription in human cells [
46]. In addition, studies that have validated the functional relevance of antisense transcripts suggest that they are not a uniform group of regulatory RNAs, but rather that they carry out a wide variety of biological roles [
47]. The utility of a transcriptome-based approach has been demonstrated in the detection of these non-coding antisense transcripts, as this information could be important when examining pathway regulation. Further examination of these NATs may answer a number of important questions, such as why downstream mediators are not up-regulated when an upstream regulator of a pathway is highly up-regulated at the mRNA level, or why the changes observed at the RNA level do not always correlate with protein expression. Obviously, a great deal of experimental work would need to take place to assess whether NATs do play a role in gene regulation, but if, as we suspect, at least some do, we need to examine not only the sense transcripts but also the antisense transcripts at the same time to get a true view of what is happening in the cell, for example, following drug treatment.
We further examined the 45 SAS pairs that were detected as either present or marginal in the 5-FU sensitive experiment; we decided not to include a fold change filter at this stage, as it is not necessary for both the sense and the antisense transcript to be altered to a certain level to see a functional effect. For example, the antisense may be up-regulated, which leads to the suppression of the sense, resulting in no change in the sense probeset. Overall, when we examined the intensities/expression of the probesets contained within the SAS pairs, ~50% displayed similar intensities, i.e. no differential intensity between sense and antisense probesets. However, ~50% displayed discordant or differential intensities; this group of SAS pairs may therefore be the most functionally relevant, although this will require more experimental testing. Gene ontology analysis demonstrated that these SAS pairs were involved in diverse biological processes, with the most statistically robust involved in oxidative phosphorylation, JAK-STAT signaling, phosphorylation, metabolism, cell death and splicing. We further chose two SAS pairs to examine at the sequence level:
SOCS6 and
IGF2BP2. Sequence alignment demonstrated that the full length
SOCS6 transcript aligned exactly with the
SOCS6 gene on the forward strand of chromosome 18. In addition, the full length antisense transcript aligned to the reverse strand of chromosome 18 and demonstrated good tail to tail sequence overlap with the full length sense sequence and the
SOCS6 gene. In terms of
IGF2BP2, the full length sense sequence aligned completely with the
IGF2BP2 gene on the reverse strand of chromosome 3. The full length antisense sequence aligned to the forward strand of chromosome 3 and again demonstrated good tail to tail overlap with the full length sense sequence and the
IGF2BP2 gene. The sequence alignment results demonstrate that the SAS pairs show good overlap in sequence and appear to be cis-NATs that are transcribed from the opposing DNA strand at the same genomic locus. Numerous novel SAS pairs have previously been identified on DSA microarrays and their existence validated with alternative technologies including strand-specific RT-PCR. Functional relevance has also been suggested through analysis of SAS pair expression patterns [
48]. Full characterization of the IGF2BP2 and SOCS6 antisense transcripts will require further work, which forms the basis of future studies; however, inspection of the sequences with the Ensembl Human Genome Browser supports their existence. Extensive EST evidence exists and appears to suggest a regular exonic structure. Numerous currently unclassified regulatory elements also occur in the region surrounding the sequences. Since both the EST sequencing used in DSA design and the experimental labelling process are polyA-based, it would suggest that the transcripts are polyadenylated, but since the ESTs represent only a fragment of the full transcript, analysis of precise polyA signal location and constitution (i.e. canonical or non-canonical) is difficult.
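The cis-NAT categories referred to in this Discussion (head to head, tail to tail, fully overlapping) follow directly from the genomic coordinates and strands of the two transcripts. The sketch below shows one way this classification could be expressed, assuming inclusive coordinates on a single chromosome; the Transcript type, the function name and all coordinates are hypothetical and are not the SOCS6 or IGF2BP2 alignments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transcript:
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" (forward) or "-" (reverse)

def classify_cis_nat(a, b):
    """Classify the overlap of two transcripts on opposite strands."""
    if {a.strand, b.strand} != {"+", "-"}:
        raise ValueError("transcripts must lie on opposite strands")
    if max(a.start, b.start) > min(a.end, b.end):
        return "non-overlapping"
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    if a_in_b or b_in_a:
        return "fully overlapping"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # A forward-strand transcript has its 3' end at its right edge and a
    # reverse-strand transcript has its 3' end at its left edge, so a
    # partial overlap with the forward transcript on the left joins the
    # two 3' ends (tail to tail); otherwise it joins the two 5' ends.
    return "tail to tail" if plus.start < minus.start else "head to head"

# Hypothetical coordinates for illustration only.
sense = Transcript(start=100, end=500, strand="+")
antisense = Transcript(start=400, end=900, strand="-")
overlap_type = classify_cis_nat(sense, antisense)
```

With these placeholder coordinates the forward-strand transcript lies to the left of the reverse-strand transcript, so the overlap involves both 3' ends and is classified as tail to tail.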
To investigate the clinical relevance of SAS pairs we utilized microarray data generated from pre-treatment (irinotecan/5-FU) metastatic colorectal biopsies with full response data. Following detection filtering we demonstrated that 8 SAS pairs existed (4.8% of total antisense and 3% of total sense probesets). In addition, we demonstrated that 3 SAS pairs existed following detection plus differential expression filtering (4.5% total antisense and 3.4% total sense probesets). Upon examination of the probesets in the sense orientation, antisense orientation and those existing in SAS pairs between
in vitro experiments and clinical experiments, the results demonstrate that a high percentage of sense probesets, antisense probesets and SAS pairs are shared between
in vitro and clinical samples. The clinical experiments generated fewer sense, antisense and SAS pairs than the
in vitro experiments; however, a high percentage of those detected in the clinical experiment were also detected in the
in vitro experiments. Taken together, these results suggest that
in vitro experiments do highlight potentially clinically relevant information; however, these types of analysis would require further independent validation. These
in vitro and clinical analyses demonstrate in this disease setting that potentially up to 8.9% of all probesets could exist in SAS pairs; currently there is little investigation into the functional role that these SAS pairs may play. Interestingly, one SAS pair,
IGF2BP2, was found to be common between the
in vitro and the clinical analysis. IGF2BP2 has been demonstrated to regulate translation of IGF2 by binding to its 5'UTR [
49]. In addition, IGF2 is known to be overexpressed in cancer [
50,
51] and specifically, insulin signaling has been demonstrated to play a role in colorectal cancer [
35,
52‐
55]. Given the results from the pathway analysis also identifying the significance of insulin signaling, further experimental investigation into the identified SAS pairs, in particular
IGF2BP2, should reveal whether some or all have functional relevance in this disease setting and whether they are disease-specific or have more widespread effects. Future studies examining the SAS pairs identified from this study will also address questions such as what their exact function is within the cell, and whether they all function in the same way in this disease setting or whether function depends on the specific SAS pair.
One of the limitations of this analysis is that we compared the power of the two microarray platforms using data generated from a single 5-FU-sensitive and -resistant cell line model. The main focus of the study was to directly compare the data generated from the two microarray platforms based on detected transcripts and pathways, and for this a single model cell line is appropriate; however, a secondary aim was to assess the biological relevance of the colorectal transcriptome and compare this to a generic genomic approach. In this respect, the use of a number of CRC cell line models would have given greater insight into the power of such an approach, as the problem of tissue homogeneity would have been addressed to some degree. It is widely accepted that cell line models are not very representative of the primary tumour, and to address this issue in part we identified the unique biological information, SAS pairs, that was generated using the colorectal transcriptome-based approach and assessed whether these occurred in metastatic (liver) CRC patient biopsies. The cell line models identified 45 SAS pairs, whereas fewer SAS pairs were found in the data generated from the clinical biopsies, with 8 detected in total. When we compared the SAS pairs from the cell lines and patient biopsies we found that 7 were in common; therefore ~87% of the clinical SAS pairs were also contained within the cell line SAS pairs list. This would suggest that many of the cell line SAS pairs are lost in the clinical samples, probably due to the homogeneity of the cell line model, and that those that occur in the clinical samples may be the most biologically relevant; however, further analysis of these SAS pairs would be required.
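The overlap figure quoted above (7 of the 8 clinical SAS pairs also present among the 45 cell line SAS pairs) is a simple set intersection. The sketch below reproduces the arithmetic using placeholder identifiers; the pair names and the function name are hypothetical and do not correspond to the actual SAS pairs.

```python
def shared_with_reference(query_pairs, reference_pairs):
    """Return (n_shared, percent_of_query) for two collections of SAS pair ids."""
    query = set(query_pairs)
    if not query:
        raise ValueError("query list is empty")
    shared = query & set(reference_pairs)
    return len(shared), 100.0 * len(shared) / len(query)

# Placeholder identifiers: 45 cell line pairs, 8 clinical pairs,
# 7 of which are also in the cell line list.
cell_line_pairs = {f"pair_{i}" for i in range(1, 46)}
clinical_pairs = {f"pair_{i}" for i in range(1, 8)} | {"pair_clinical_only"}

n_shared, percent_shared = shared_with_reference(clinical_pairs, cell_line_pairs)
print(n_shared, percent_shared)  # 7 87.5
```

The percentage is expressed relative to the clinical list; relative to the 45 cell line pairs, the same 7 shared pairs represent a much smaller fraction, which is the basis for the statement that many cell line SAS pairs are lost in the clinical samples.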
Competing interests
Professor Patrick Johnston and Prof Paul Harkin are the Founders and Directors of Almac Diagnostics, Craigavon, UK. Gavin Oliver and Vitali Proutski are employees of Almac Diagnostics, Craigavon, UK.
Authors' contributions
WLA was involved in the conception and design of the study, the acquisition, analysis and interpretation of the data and drafted the manuscript; PVJ was involved in the conception and design of the study, the analysis and interpretation of the data and helped draft the manuscript; GRO carried out the sequence alignments and revised the manuscript critically for important intellectual content; IP carried out the microarray QPCR validations and revised the manuscript critically for important intellectual content; DBL revised the manuscript critically for important intellectual content; HJL revised the manuscript critically for important intellectual content; VP was involved in the conception and design of the study and revised the manuscript critically for important intellectual content; DPH revised the manuscript critically for important intellectual content; and PGJ conceived the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.