Background
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer death worldwide. Studies have shown that countries with a medium or high human development index (HDI) are likely to show a rise in the incidence of CRC by 2030 [1-3]. While most sporadic CRC arises through the adenoma-carcinoma sequence, ulcerative colitis-associated CRC (UC-CRC) arises through an inflammation-associated dysplasia-carcinoma sequence. In either situation, the cancer develops by acquiring hallmark genetic changes in the epithelium of the colon. The genetic alterations that might lead to the development of CRC in either pathway have traditionally been categorized largely into chromosomal instability (CIN) and microsatellite instability (MSI) [4-6].
Copy number variations (CNVs) in the cancer cell genome are one of the common mechanisms under CIN by which the expression of genes that contribute to cancer development is regulated, and studying them can help in identifying tumor suppressor genes and oncogenes. CNVs are also found frequently in the healthy population (common CNVs), but some of the CNVs associated with malignancy are known to harbor bona fide cancer-related genes [7-12]. Although genomically altered regions are very common in human cancer, it is often difficult to identify the true cancer gene in such amplicons because of the multiplicity of genes affected [13-15]. Genome-wide studies in different types of cancer, including CRC, have highlighted several important regions and genes involved in human cancer development that are significantly altered by amplification or overexpression [16-19]. Therefore, the comparative identification of such altered regions, the genes within them and their role in cancer is essential for a better understanding of the pathogenesis of cancer and for clinical translation.
The incidence of and deaths from CRC can be reduced by the early detection and removal of treatable neoplasia, but established markers specific for both established cancer and precancerous lesions are lacking [20]. Molecular stratification, combined with other strategies, may be suitable to distinguish those with preneoplastic changes from those with early neoplastic changes. Our previous study has shown that CNVs are progressively associated with the development and progression of UC to CRC [21]. With this background, we analyzed the CNVs involved in UC-progressors and sporadic CRC (S-CRC) as compared to those in nonprogressors, and validated their role in a subset of samples by qRT-PCR and IHC techniques for the identification of neoplasia in the two CRC pathways.
Discussion
From our previous study we observed that CNVs are progressively associated with the development and progression of different stages from UC to CRC [21]. The present study has identified genome-wide altered CNV regions in tissues of UC-progressors, in comparison with S-CRC. An attempt was made to create a panel of markers, including two genes (C-MYC and CCND2) common to both pathways, along with other correlated genes, which was evaluated in a larger cohort of either condition for its usefulness in the detection of neoplasia in both CRC conditions. The four noteworthy genes from the above qRT-PCR study were combined with four reported markers in CRC as a complementary set, and together were analyzed for their expression in a subset of both sporadic and UC neoplasia samples. The current study provides an overview of the genomic aberrations present in UC-associated and sporadic neoplasia and of possible markers of importance for the disease and its molecular pathophysiology. These results may help to better understand the CNVs and the genes involved in the adenoma-carcinoma and dysplasia-carcinoma progression.
The current study is from a region known for its lower prevalence of both UC and CRC, which has nevertheless shown an increasing trend in recent times, although the exact prevalence of these diseases is contentious [22-25]. A recent estimate highlighted an increase of CRC by 2.7 % in developing countries like India [1-3]. However, clinical and molecular reports on S-CRN and UC-CRN are scarce from this region. The present study is one of the first of its kind to integrate aCGH, qRT-PCR and IHC analyses of neoplastic changes in both colitis-associated and sporadic neoplasms to identify major genomic alterations across the two pathways of CRC development. The bioinformatics-based enrichment analysis, along with the comparison with TCGA data, showed many overlapping CNVs, reinforcing the importance of these altered regions and the genes associated with them.
Reports on the use of advanced microarray techniques for UC-CRC are uncommon, and studies comparing CNVs in UC and S-CRC are lacking. Using aCGH, the present study has demonstrated important unique and common CNVs associated with neoplasia progression in both the UC and sporadic neoplastic pathways. A comparative study by Aust and colleagues (2000) on UC and S-CRC using chromosomal CGH highlighted differences in the frequency and timing of individual alterations, suggesting that different pathways operate in the two groups [26]. Earlier studies found that losses in 8p, 15q and 18q and gains in 8q, 13q and 20q were the most common copy number alterations associated with the progression of colorectal adenoma to carcinoma [26-30]. In the current analysis, we found 13q and 20q amplifications in S-CRC alone, but 8q amplifications were present in both UC-P and S-CRC samples. Compared with S-CRC, UC-P had noticeably smaller CNV regions, with more gains (for example, on chromosomes 7, 8, 12 and 22). Interestingly, the 15q CNV was one of the CNVs common to the 3 sample groups; it was amplified in UC samples but deleted in S-CRC. The common CNV regions and genes that emerged from the integrated analysis of UC-P and S-CRC suggest that a common molecular function is regulated in neoplastic epithelial cells. The chromosomal 8q and 12p regions comprise important functional genes such as the C-MYC and CCND2 oncogenes and may drive sporadic as well as inflammation-associated carcinogenesis. Bioinformatics analysis and other studies have also highlighted the importance of these CNVs and genes [11, 31]. Thus, these results may help broaden our understanding of the inter-related molecular pathways in the two conditions.
Studies on whole-genome aberrations have attempted to identify and test potential markers for translation, since few markers are currently recommended for use in clinical practice [32]. The Cancer Genome Atlas (TCGA) project is among the major initiatives in this respect and has reported a comprehensive genome-scale analysis of genetic variations across 276 CRC samples [29]. Overlap analysis of our aCGH-based CNV results with TCGA data showed many similar CNV regions, and these CNVs can be tested across populations.
Much effort has also been devoted to the development of marker panels based on genetic and epigenetic alterations in different cancers [33, 34]. We attempted to establish a panel of markers from the CNV regions and validated it in our patient cohort using qRT-PCR. To this end, a 6-gene genomic instability signature for neoplastic changes was designed and validated in both colorectal cancer types. Three genes (C-MYC, CCND2 and FNDC3A) were selected from our data and, together with the previously published genes (MYCN, CCND1 and EGFR), formed a panel of 6 genes for validation. Functional pathway enrichment analysis was carried out on curated databases using Ingenuity Pathway Analysis and cBioPortal with TCGA-CRC data. The current panel, considering alterations in at least one marker, detected neoplastic changes in more than 50 % of the S-CRC samples but was comparatively less efficient in UC-neoplastic samples. Combining MSI with the qRT-PCR panel did not significantly improve the sensitivity of detection. In correlation analysis, we found that EGFR and CCND1 raw copy number values were positively correlated with neoplastic changes in both UC and S-CRN samples.
There are several reports of gene amplifications in CRC that have been correlated with gene expression [13, 14, 35-37]. We tested 8 markers by IHC, a combination of previously reported markers and markers from our qRT-PCR study. The results show that p53, CCND1, EGFR, C-MYC and FNDC3A were overexpressed in more than 50 % of S-CRN samples. Interestingly, in UC-HR samples, p53 and CCND1 were expressed at a significantly higher frequency than in tissues from preneoplastic stages, while C-MYC and ERBB2 were expressed at very low frequency. EGFR and AMACR expression was more specific to neoplastic changes and showed a linear relationship with increasing disease frequency.
The fibronectin type III domain containing 3A (FNDC3A) gene has been shown to be involved in cell-cell adhesion and is one of the genes from the widely reported 13q CNV region in S-CRC. However, very little is known about the role of this gene in cancer. FNDC3A showed amplified copy number status in both aCGH and qRT-PCR and was overexpressed in tissue samples of S-CRC. The functional significance of FNDC3A warrants further study in adenocarcinoma. In accordance with our previous findings from p53 mutational analysis, the current IHC results suggest that alteration of the p53 pathway is perhaps an early event and that Wnt pathway-regulated changes in C-MYC occur in the later phase of colitis-associated carcinogenesis [38]. In clinical practice, assessment of the expression of these markers may help to identify patients at risk of neoplasia, thereby supporting surveillance strategies and therapy.
Pooled sample-based analysis has been recognized as a cost-effective alternative approach for filtering genetic variation of higher significance, though less frequent CNVs may be missed [39, 40]. The success of sample pooling-based arrays depends on reducing the overall pooling error; however, errors due to array-specific variability remain. The important and major CNV regions (e.g. 8q, 13q, 20q amplifications) reported in this study across the CRC genome were retained even after pooling. Sampling biases due to tissue heterogeneity and multifocality of the epithelium have been limiting factors in CRC molecular analysis [40]. MSI and CIN analysis by qRT-PCR could have been affected by these factors. Another limitation of these assays is that their detection thresholds usually require clonal expansion and broad field effects in the targeted cell population being tested [41]. The number of patients in each group was relatively low, which requires careful interpretation of the results. Similarly, in the IHC study, the immunoreactivity of each antibody may frequently be heterogeneously distributed throughout the tissue sample [42]. To avoid selection bias during scoring, we selected the area with the strongest immunoreactivity in each tissue sample [42, 43]. To predict prognosis and therapeutic outcome, a series of studies have established biomarker panels for S-CRC. However, consensus on suitable biomarkers for early diagnosis remains to be established [14, 44]. In the current study, we have attempted to simultaneously analyze two CRC-related pathways using a panel of markers to aid further understanding of molecular pathogenesis. The study has integrated some of the well-known marker genes with relatively new loci from the current study, analyzed them as a group and highlighted their importance in the early phases of cancer development and detection. These may help in understanding and targeting the different stages of CRC development in UC patients who are on continuous follow-up for disease evaluation. The surveillance program remains cumbersome, and adding these markers to clinical follow-up to increase the efficiency of neoplasia detection could lead to better and more successful screening strategies. Of significance, this is the only report from India, and among very few elsewhere, to have comparatively analyzed and validated CNVs and the associated genes together with the expression patterns of markers in both UC and sporadic colorectal neoplasia.
Methods
Experimental design
The study was approved by the Kasturba Hospital Ethics Committee (KHEC No.159/07), Manipal. All patients provided written informed consent before participation. Tissue samples were obtained from patient biopsies and divided into the following groups: UC-nonprogressors (UC-NP), 20 UC patients at high risk but without any dysplasia; UC-progressors (UC-P), 8 patients with dysplasia or cancer; and sporadic colorectal cancer (S-CRC), 20 patients. A pool of DNA from 20 (10 male and 10 female) endoscopically and histopathologically normal colon samples was used as the control for all arrays. For all DNA-based assays, DNA was isolated from the tissue using the phenol-chloroform method. To search for genetic variations, the experimental design comprised hybridization of tissue DNA samples from the above-mentioned patient groups against a control pool of non-tumor tissue DNA.
For validation by qRT-PCR, the UC-HR group comprised thirty-one patients with UC at risk of associated colorectal neoplasia (≥7 years of extensive colitis or ≥10 years of left-sided colitis). These samples were further classified as UC progressors (n = 14) and UC non-progressors (n = 17) based on the presence or absence, respectively, of neoplastic changes. The sporadic colorectal neoplasia samples were collected through colonoscopy from 98 patients, of which 80 were adenocarcinomas and 18 were adenomas. The control group consisted of DNA extracted from 15 men and 15 women with no organic colonic disease (confirmed colonoscopically and histopathologically) (Table 5).
Table 5
Clinical details of the samples in the quantitative real-time PCR validation study
Group | Subgroup | n | Male:Female | Age in years (range)
Controls | Normal | 30 | 15:15 | 52.5 (18–72)
Sporadic colorectal neoplasia (S-CRN) | Adenoma | 18 | 16:2 | 56.5 (19–79)
Sporadic colorectal neoplasia (S-CRN) | Adenocarcinoma | 80 | 51:29 | 55 (7–88)
Ulcerative colitis associated colorectal neoplasia (UC-CRN) | UC-Nonprogressor | 17 | 10:7 | 54 (18–68)
Ulcerative colitis associated colorectal neoplasia (UC-CRN) | UC-Progressor | 14 | 8:6 | 49.5 (20–69)
For IHC-based expression analysis, the UC-HR group comprised 38 samples. Of these, 18 were progressors, among which low-grade dysplasia (LGD) was found in 5, high-grade dysplasia (HGD) in 9 and UC-associated CRC in 4 samples. The comparative S-CRN group comprised 14 patient samples, of which 4 were primary colorectal cancers and 10 were adenomas. For the IHC experiments, the grade of each sample was first confirmed by Hematoxylin and Eosin (H&E) staining.
Those with S-CRN underwent endoscopic biopsies from affected and normal areas for histology and molecular analysis. The diagnosis of both UC and CRN was made according to established criteria, including clinical symptoms, colonoscopy and histopathology. Human colorectal cancer cell lines CACO-2, COLO-205, HT-29 and HCT-15 were obtained from the National Centre for Cell Science (NCCS, India), and DNA extracted from them was used in the initial analysis. The overall study design is shown in Additional file 2: Figure S1. Briefly, to identify genome-wide CNVs contributing to both UC-associated neoplasia and sporadic CRC development, we performed a 244 K aCGH experiment. The aCGH results were analysed for CNVs common and unique to the two sample groups, and the CNVs were functionally annotated by enrichment analysis using bioinformatics tools, with overlap against TCGA data and the literature. Three genes (C-MYC, CCND2 and FNDC3A) were selected from our data and, together with previously reported genes (MYCN, CCND1 and EGFR), were validated using a TaqMan CNV-based qRT-PCR assay on UC-high risk and sporadic colorectal neoplasia samples and compared against control samples. Subsequently, four genes (C-MYC, CCND1, EGFR and FNDC3A) from the above qRT-PCR study were assessed along with four previously reported markers (p53, AMACR, ERBB2 and Ki67) for their expression by IHC in both UC and sporadic CRC samples.
aCGH was performed using Agilent Human Genome Microarray Kit microarrays (Agilent Technologies, Santa Clara, CA). The array contained 236,381 distinct biological 60-mer oligonucleotide probes, with 1,000 biological triplicates and 5,045 controls, spanning coding and non-coding genomic sequences with median probe spacing of 7.4 and 16.5 kb, respectively. The average probe spacing of 6.4 kb was calculated by dividing the total repeat-masked genome size by the total number of microarray features. The probe sequences and gene annotations were based on NCBI Build 36.1 of the human genome and UCSC version hg18, released in May 2006.
Microarray analysis
Copy number variation (CNV) analysis of UC-nonprogressor, UC-progressor and sporadic CRC samples was performed using the Agilent high-density 244 K microarray. Briefly, DNA samples were sheared using cycles of 15 s ‘on’ and 15 s ‘off’ for 15 min in an ultrasonic processor (Thomas Scientific, NJ, USA) with a 2 mm probe and the amplitude set at 40. The purified sheared DNA was differentially labeled: test sample DNA (test genome) with fluorescent Cy5 dye and the pooled normal reference DNA (control genome) with Cy3 dye. Hybridization, washing and scanning of the arrays were performed according to the manufacturer’s protocol. Feature-extracted data were analyzed with Genomic Workbench v5.0 software (Agilent Technologies, CA, USA) using the ADM-2 aberration detection algorithm (threshold 5.0) and visual inspection of the log2 ratios (±0.25) [45]. Gene enrichment, gene ontology and pathway analysis were carried out using the GSEA, DAVID, PANTHER, cBioPortal and KEGG bioinformatics tools.
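The log2-ratio cutoff mentioned above can be illustrated with a minimal sketch. It assumes that a probe's ratio is the test (Cy5) signal divided by the reference (Cy3) signal and that a probe at or beyond ±0.25 counts as a gain or loss; the function names and the inclusive boundary are illustrative assumptions. It does not reproduce the ADM-2 algorithm, which scores whole intervals rather than single probes.

```python
import math

# Cutoff quoted in the text for inspecting log2 ratios.
LOG2_CUTOFF = 0.25


def log2_ratio(test_signal, reference_signal):
    """log2 of test (Cy5) over reference (Cy3) signal for one probe."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    return math.log2(test_signal / reference_signal)


def call_probe(ratio, cutoff=LOG2_CUTOFF):
    """Label a single probe; the inclusive boundary is an assumption."""
    if ratio >= cutoff:
        return "gain"
    if ratio <= -cutoff:
        return "loss"
    return "neutral"


# Hypothetical signals, not study data: a single-copy gain and loss on a diploid background.
print(call_probe(log2_ratio(3.0, 2.0)))  # ratio is about 0.585
print(call_probe(log2_ratio(1.0, 2.0)))  # ratio is -1.0
```

A per-probe call of this kind is only a first look; interval-based algorithms such as ADM-2 combine neighbouring probes before reporting an aberration.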
Multiplex PCR based Microsatellite Instability (MSI) Analyses
Microsatellite instability (MSI) status was examined using 5 microsatellite markers (National Cancer Institute, Bethesda Panel). The assay was carried out using the primer sequences, the corresponding fluorescent dyes and the PCR conditions described elsewhere [46]. In brief, multiplex PCR was performed in a Veriti thermocycler (Applied Biosystems, Foster City, CA) using the following cycling conditions: 95 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s, with a final extension at 60 °C for 45 min to aid non-template adenine addition. The PCR products were analyzed using an ABI 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA) along with the GS500LIZ size standard according to the manufacturer’s instructions. The generated data were analyzed using Genemapper v.4.0 (Applied Biosystems, Foster City, CA). Each sample was compared with the normal control DNA from the same patient: it was graded as microsatellite stable (MSS) if no peak shift or abnormal allele was present at any of the loci tested, and as microsatellite instable (MSI) if a peak shift or abnormal alleles were present at one or more loci.
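The grading rule described above (no shifted locus gives MSS, one or more gives MSI) can be written as a short sketch. The marker names below are placeholders, not the actual panel loci, and the per-locus boolean is assumed to come from a manual or software comparison of tumour and matched normal peaks.

```python
def classify_msi(locus_shifted):
    """Grade one sample from per-locus results.

    locus_shifted maps a marker name to True when a peak shift or abnormal
    allele was seen relative to the matched normal DNA.
    """
    if not locus_shifted:
        raise ValueError("at least one locus result is required")
    unstable = sum(1 for shifted in locus_shifted.values() if shifted)
    return "MSI" if unstable >= 1 else "MSS"


# Hypothetical five-marker results with placeholder marker names.
stable_sample = {f"marker_{i}": False for i in range(1, 6)}
unstable_sample = dict(stable_sample, marker_3=True)

print(classify_msi(stable_sample))
print(classify_msi(unstable_sample))
```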
Copy number determination by quantitative real-time PCR (qRT-PCR)
The copy numbers of the C-MYC, CCND2 and FNDC3A genes selected from our data, together with those of their correlated interacting partners MYCN, CCND1 and EGFR (these genes were found to be within the cut-off log2 ratios in our aCGH data), were determined in tumor cell lines and in tumor tissue samples from cancer patients by quantitative real-time polymerase chain reaction (qRT-PCR). TaqMan® copy number assays (Applied Biosystems, Foster City, CA) were used, and the details of the genes are listed in Additional file 1: Table S12. The assays were performed on the 7500 Fast Real Time PCR system with Sequence Detection System v2.4 software (Applied Biosystems, Foster City, CA, USA). Amplification reaction mixtures (10 µl) for each target gene contained template DNA (10 ng), TaqMan® universal master mix at a final 1x concentration, TaqMan® copy number assay reagent and TaqMan® copy number reference assay (RNase P) in a 96-well plate. The cycling conditions were 10 min at 95 °C, followed by 40 cycles of 15 s at 95 °C and 60 s at 60 °C. Each experiment was run in triplicate, and data files containing the sample replicate Ct values for each reporter dye were exported from the real-time PCR instrument software into CopyCaller software v.1, which calculates the copy number of each sample by relative quantitation (comparative Ct method).
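The comparative Ct calculation mentioned above can be sketched as follows. This is the textbook ΔΔCt formula, not a reproduction of the CopyCaller software: ΔCt is the target Ct minus the reference (RNase P) Ct in the same well, ΔΔCt is the sample ΔCt minus the calibrator ΔCt, and the copy number is the calibrator copy number times 2^(−ΔΔCt). The assumption of a two-copy calibrator, the function names and the Ct values are illustrative.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean of (target Ct - reference Ct) over paired replicate wells."""
    if len(target_cts) != len(reference_cts) or not target_cts:
        raise ValueError("replicate lists must be non-empty and of equal length")
    return mean(t - r for t, r in zip(target_cts, reference_cts))


def copy_number(sample_target, sample_ref, calib_target, calib_ref,
                calibrator_copies=2):
    """Relative copy number by the comparative Ct (delta-delta Ct) method."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calib_target, calib_ref)
    return calibrator_copies * 2 ** (-ddct)


# Hypothetical triplicate Ct values: the target amplifies one cycle earlier
# in the sample than in the two-copy calibrator, relative to RNase P.
estimate = copy_number(
    sample_target=[24.0, 24.1, 23.9], sample_ref=[25.0, 25.1, 24.9],
    calib_target=[25.0, 25.0, 25.0], calib_ref=[25.0, 25.0, 25.0],
)
print(round(estimate, 2))
```

A one-cycle shift corresponds to a doubling, so this hypothetical sample would be estimated at about four copies.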
Immunohistochemistry (IHC) analysis
The four noteworthy genes C-MYC, CCND1, EGFR and FNDC3A from the above qRT-PCR study were combined with four reported markers in CRC, p53, AMACR, ERBB2 and Ki67 (these genes were found to be within the cut-off log2 ratios in our aCGH data) (Additional file 1: Table S13). Sections (5–7 micron thick) from formalin-fixed, paraffin-embedded tissue samples were mounted on poly-L-lysine coated slides. The sections were dewaxed in xylene and rehydrated. After antigen retrieval by microwaving, immunostaining was performed using the biotin–streptavidin–peroxidase method. Counterstaining was carried out with hematoxylin. Immunostaining for all antibodies was assessed by staining intensity and divided into four categories: negative (-), weak (+), moderate (++) or strong (+++), with moderate or strong staining regarded as positive. For staining frequency, the number of positive (moderate or strong) cells was expressed as a percentage of the total number of cells per high-power field and categorized as 5 %–25 %, 25 %–50 %, 50 %–75 % and >75 %.
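The scoring scheme above can be summarised in a small sketch. The text does not say which bin a value exactly on a boundary (for example 25 %) belongs to, nor how fields with fewer than 5 % positive cells are labelled; the upper-inclusive bins and the "<5%" label used here are assumptions, as are the function names.

```python
INTENSITIES = ("-", "+", "++", "+++")   # negative, weak, moderate, strong
POSITIVE = {"++", "+++"}                # moderate or strong counts as positive


def is_positive(intensity):
    """True when staining intensity is moderate or strong."""
    if intensity not in INTENSITIES:
        raise ValueError(f"unknown intensity: {intensity!r}")
    return intensity in POSITIVE


def frequency_category(positive_cells, total_cells):
    """Bin the percentage of positive cells in one high-power field."""
    if total_cells <= 0 or not 0 <= positive_cells <= total_cells:
        raise ValueError("invalid cell counts")
    pct = 100.0 * positive_cells / total_cells
    if pct < 5:
        return "<5%"      # assumed label; not defined in the text
    if pct <= 25:
        return "5-25%"
    if pct <= 50:
        return "25-50%"
    if pct <= 75:
        return "50-75%"
    return ">75%"


# Hypothetical field: 120 of 200 cells with moderate or strong staining.
print(is_positive("++"), frequency_category(120, 200))
```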
Statistical analysis
Statistical significance was defined by P-values of ≤ 0.05. Correlations between copy numbers of the six amplified genes were calculated using Spearman’s rank correlation coefficient (r). Expression patterns of the individual IHC markers were compared between patients with progression to advanced neoplasia and those without progression and other subgroups using Fisher’s exact test or chi-square test, as appropriate. Statistical analyses were carried out using SPSS 15.0 (IBM) and GraphPAD InStat (California, USA) software.
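For readers who want to see what the two named tests compute, the following is a minimal pure-Python sketch of Spearman's rank correlation (with average ranks for ties) and a two-sided Fisher's exact test for a 2×2 table. It is not the SPSS or InStat implementation used in the study, the function names are illustrative, and the example numbers are hypothetical rather than study data.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical copy-number values for two genes across five samples.
print(spearman_r([2.0, 2.4, 3.1, 3.9, 5.2], [1.9, 2.2, 2.8, 4.1, 4.0]))
# Hypothetical marker-positive/negative counts in progressors vs non-progressors.
print(fisher_exact_two_sided(3, 0, 0, 3))
```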
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
Conception and design: BMS, CGP and KS. Development of methodology: BMS, CGP and KS. Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): BMS, HR, SC, LR, VG, BVT, HK, RD, CGP and KS. Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): BMS, HR and SC. Writing, review, and/or revision of the manuscript: BMS, HR, SC, LR, VG, BVT, HK, RD, CGP and KS. Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): BMS, HR, VS, LR, CGP and KS. Study supervision: CGP, LR and KS. Contributed clinical information and patients: BMS, LR, VG, BVT, HK, RD and CGP. All authors read and approved the final manuscript.