Background
Cancer is a complex disease caused by mechanisms that disrupt cell homeostasis in many levels. Such mechanisms include aberrations affecting gene copy numbers, leading to altered gene expression and deregulation of critical signalling pathways. Neuroblastoma is an early childhood malignancy arising from undifferentiated neuroectodermal cells derived from the neural crest. These neural crest precursor cells are committed to differentiate into cells that make up sympathetic ganglia or the adrenal medulla. The most well known genetic alteration in neuroblastoma is the amplification of the
MYC-related oncogene (
MYCN) [
1,
2], which is still the only prognostically significant oncogene amplification in neuroblastoma [
3,
4]. Despite numerous other genetic alterations in neuroblastoma, such as deletions/losses/gains of 1p36, 1q, 2p13-p14, 3p21, 3p26, 3q24-p26, 4q33-q35, 6p11-p22, 11q23, 12q, 14q32, 17q and 19q [
5‐
17], none of these alterations has been consistently shown to have a definite independent value in treatment stratifications. Unfortunately, the main genetic alteration,
MYCN amplification, does not explain the poor outcome of all neuroblastoma patients, suggesting that additional biomarkers of disease progression are still needed.
Here, we present a systematic and rapid genomics data analysis framework, which integrates DNA, RNA and tissue data to identify clinically relevant biomarkers for neuroblastoma. In more detail, high-resolution aCGH was utilized to identify novel genetic alterations in two neuroblastoma cell lines, NGP and IMR-32. Through the integration of gene copy number and gene expression data, the impact of copy number changes on expression levels was determined. Fluorescence in situ hybridization (FISH) on a tissue microarray (TMA) format was used to assess the clinical significance of the identified copy number increase at 12q24.31 in neuroblastoma patients. Finally, we used in silico data mining of publicly available transcriptomics data, to evaluate the transcriptional consequences of the detected 12q24.31 alteration and to identify subsequently activated gene(s).
Methods
Neuroblastoma cell cultures and sample preparation
NGP and IMR-32 neuroblastoma cells were cultured in RPMI 1640 medium supplemented with 10% fetal bovine serum (FBS) and 2 mM L-glutamine, and Minimum Essential Medium supplemented with 10% FBS, 2 mM L-glutamine, 1% non-essential amino acids and 1% sodium pyruvate, respectively. mRNA was isolated from the samples using FastTrack 2.0 mRNA isolation kit (Invitrogen, Carlsbad, CA). Genomic DNAs were obtained from the same samples by swirling a glass rod in the cell lysate, followed by standard phenol-chloroform purification.
Oligonucleotide array-based comparative genomic hybridization
A 95K high-resolution oligonucleotide array (Agilent Technologies, Palo Alto, CA) was used for the detection of copy number changes in NGP and IMR-32 cell lines. Normal male DNA was used as a reference for both cell lines (Cat. # G1471, Promega, Madison, WI). Sample processing and hybridization was performed according to the August 2005 (version 2) protocol (Agilent Technologies), with minor modifications. Briefly, 10 μg of genomic DNA was digested overnight with AluI and RsaI (Life Technologies, Inc., Rockville, MD). Digested DNA samples were subjected to standard phenol-chloroform purification. 4 μg of digested tumor DNA and reference DNA were labelled with Cy5-dUTP and Cy3-dUTP (Perkin-Elmer, Wellesley, MA), respectively, in a random priming reaction using a BioPrime Array CGH Genomic Labelling Module (Invitrogen, Carlsbad, CA.). After labelling, tumor DNA and reference DNA samples were pooled, cleaned and hybridization cocktails were prepared according to the protocol. Hybridization and washes were also performed according to the protocol. A laser confocal scanner (Agilent Technologies) was used to obtain signal intensities from targets and Feature Extraction software (version 8.1.1.1, Agilent Technologies) was used in image analysis with settings recommended by the manufacturer (44K_CGH_0605). The CGH Analytics software was used for analysis of aCGH data (version 3.2.32, Agilent Technologies). Control hybridizations (male vs. male, male vs. female) were used to estimate the baseline variation in the hybridizations, and hybridization quality metrics provided by CGH Analytics were evaluated to ensure good data quality.
Amplifications and losses were defined at loci with log2 copy number ratios >2 (amplification) or <0.5 (loss). Loci with log2 copy number ratio > 3.5 were considered as high-level amplifications.
Gene expression analyses
Gene expression levels from NGP and IMR-32 cell lines were measured using Affymetrix GeneChip Human Genome U133 Plus 2.0 arrays (Affymetrix, Santa Clara, CA). In addition, a pooled sample consisting of 16 different cancer cell lines was analyzed as a reference sample. Sample processing and labelling were performed according to the protocol provided by the Affymetrix. Briefly, 3 μg of mRNA was used for one-cycle cDNA synthesis using a T7-oligo(dT) promoter primer, followed by RNase H-mediated second-strand cDNA synthesis and
in vitro transcription reaction with biotinylated ribonucleotide analogs. Biotin-labelled target cRNAs were fragmented and the quality of labelling procedures was assessed with GeneChip Test3 arrays. Hybridizations to U133 Plus 2.0 arrays were performed for 16 hours at 45°C, followed by automated array washing and staining procedures. Arrays were scanned immediately after staining using a GeneChip scanner (Affymetrix). Basic GeneChip array analysis was performed for the obtained image file. Data preprocessing was done using the MAS5 algorithm implemented in the Bioconductor package "affy" [
18].
Data from both aCGH and gene expression analysis has been deposited in NCBI's Gene Expression Omnibus (GEO) and are accessible through Series record GSE18144.
Integration of the aCGH and gene expression data
Expression ratios for the NGP and IMR32 cell lines were calculated as the log2 ratio of the cell line divided by the reference pool hybridization. A gene was considered over- or underexpressed if its expression ratio was within the top or bottom 7% of ratios in that sample, respectively. Expression and copy number data was integrated as follows: Affymetrix probe sets were mapped to Ensembl gene IDs or the base pair position given by Affymetrix, if no matching Ensembl gene id was found. A copy number ratio for each probe set was then calculated as the median of CGH array oligos located within 50 kb of the probe set's genomic position. In order to investigate the impact of copy number on expression ratios, genes were divided into bins based on their copy number and the frequency of over- and underexpressed genes in each bin was calculated.
Hotspot FISH on a neuroblastoma TMA
For the TMA construction, a total of 37 archival paraffin-embedded neuroblastomas were obtained from the Turku University Hospital and the Tampere University Hospital, Finland. These tumor samples represented a vast proportion of all neuroblastomas diagnosed in these two hospitals during 1967-2001. All the tumors were immunohistochemically stained and microscopically re-evaluated by two experienced pathologists. For histological typing and grading (excluding one intracranial tumor and two cytotoxically pretreated tumors), the INPC definition was applied [
19]. The 34 classified tumors (31 primary tumors and three metastases) comprised 31 neuroblastomas (14 undifferentiated, eight poorly differentiated, and nine differentiating), two ganglioneuroblastomas (one nodular, one intermixed) and one ganglioneuroma. Of these 34 cases, nine (26%) were with favourable, and 25 (74%) with unfavourable histology.
The most proliferative areas of the tumors,
i.e. hotspots, were selected to be sampled for the TMA using core biopsies with a diameter of 2 mm. More detailed TMA characteristics and data on the construction of hotspot neuroblastoma TMA have been described previously [
20]. Twelve bacterial artificial chromosome (BAC) clones from 12q13-q14 (RP11-846E20, RP11-66N19), 12q15 (RP11-1024C4, RP11-77H17), and 12q24.31 (RP11-44F24, RP11-87C12, RP11-94C5, RP11-512 M8, RP11-152E17, RP11-679G17, RP11-1059L20, and RP11-486O12) were labelled with digoxigenin-11-dUTP (Roche Applied Science, Basel, Switzerland) using random priming. A spectrum Green labelled chromosome 12 specific centromere probe (Vysis Inc. Downers Grove, IL) was used as a reference. FISH on the neuroblastoma TMA containing 37 samples was performed as described [
21], except that the fixation of the slides was performed using 7% formalin for 10 min. The BAC probes were detected with anti-digoxigenin-Rhodamine (Roche Applied Science), and nuclei counterstained with 0.1 M 4',6-diamidino-2-phenylindole. The fluorescence signals were scored from non-overlapping nuclei using Olympus BX50 epifluorescence microscope (Tokyo, Japan). The entire tissue area was evaluated and 20-60 representative non-overlapping nuclei were scored. A 1.5-fold increase in the test probe copy number relative to the chromosome 12 centromere was considered as gain in copy number.
MYCN amplification status was assessed using the chromogenic
in situ hybridization technique as described [
20].
Differences between groups in categorical data were analyzed by means of the Pearson chi-square test. Overall survival analysis was computed by means of the Kaplan-Meier method, and the difference between the curves was compared with the log-rank test. A Cox multivariate regression analysis was performed to assess whether age, stage and INPC could confound the results. All statistical analyses were performed with SPSS 16.0.2 for Windows and P values of < 0.05 were considered statistically significant.
In silicoscreening for deregulated genes at 12q24
In order to evaluate the
in vivo effect of the 12q24.31 gain in gene regulation, we performed
in silico data mining of a large collection of data from the integrated microarray data resource GeneSapiens, studying its neuroblastoma and healthy samples. Briefly, GeneSapiens
http://www.genesapiens.org/ is a collection of 9873 Affymetrix microarray experiments. All data is re-annotated and normalized with a custom algorithm so that all data in the database are directly comparable, with extensive biological and clinical annotation. The data is collected from various publicly available sources such as Gene Expression Omnibus and ArrayExpress. The data consists of five different Affymetrix microarray generations and covers 175 different cancers and tissue types. For a more in-depth description, see [
22,
23].
Gene expression levels across the data, including 308-445 healthy nervous system samples and 22-123 neuroblastoma samples, were examined. The exact number of samples available for gene expression data enquiry was dependent on the array generation that was used in the original study. We searched for a gene expression pattern showing higher expression in the neuroblastoma samples compared to the healthy nervous system samples, healthy peripheral nervous system samples, as well as healthy samples in general, indicating gene activation specifically in neuroblastoma. This was done using the Mann-Whitney-Wilcoxon rank sum method (MWW), which estimates the likelihood that the two sets of data come from the same origin. The expression patterns of the known oncogenes
MYCN,
MEIS1 and
ALK were used as positive controls. The data repository accession numbers and references to the corresponding publications have been indicated in Table
1.
Table 1
Origin of the gene expression data used for the in silico analysis.
Neuroblastoma samples
| E-MEXP-83 | 10 | |
| | GSE3960 | 22 | |
Healthy peripheral nervous system samples
| GSE1133 | 10 | |
| | GSE3526 | 8 | NA |
| | GSE96 | 2 | |
Healthy central nervous system samples
| E-AFMX-2 | 27 | |
| | E-TABM-20 | 8 | |
| | GSE1133 | 38 | |
| | GSE1297 | 9 | |
| | GSE2164 | 87 | |
| | GSE2361 | 9 | |
| | GSE3526 | 184 | NA |
| | GSE3960 | 1 | |
| | GSE5388 | 31 | |
| | GSE5389 | 11 | |
| | GSE6955 | 3 | |
| | GSE96 | 17 | |
Discussion
Our integrative microarray-based analysis of gene copy number and gene expression in two neuroblastoma cell lines characterized many previously known genetic aberrations, and revealed a previously poorly characterized amplification of 12q24.31 in neuroblastoma. The significance of the detected 12q24.31 amplicon in clinical samples was further validated using the hotspot neuroblastoma TMA format, which revealed low-level gain of 12q24.31 in 42% of neuroblastoma patients. Patients with 12q24.31 gain had an intermediate prognosis in comparison to patients with MYCN amplification (poor prognosis) and patients having neither of the two alterations (best prognosis) (P = 0.001). Moreover, although the NGP cell line was shown to contain both MYCN and 12q24.31 amplifications, we found only one neuroblastoma in the tissue array analysis with both genetic modifications. Whether the effects of the 12q24.31 gain and MYCN amplification represent actually two alternative routes in neuroblastoma progression, remains, however, to be investigated further in separate studies.
As neuroblastomas display vast intratumoral heterogeneity of various biological variables, including
MYCN copy number differences between cells [
24], an overall estimation of the prognostic impact of copy number changes and other genetic alterations in one randomly selected area of the tumor may be erroneous. Especially low-level copy number increases of prognostically significant genes or genomic regions are difficult to detect. Moreover, since cancer cell lines often originate from a single clone, the results of
in vitro studies with cancer cells lines are not necessarily applicable to intratumorally heterogeneous tumors, such as neuroblastoma. Therefore, on the basis that accelerated cell proliferation is associated with progression-related chromosomal changes, we measured the copy number of the 12q24.31 region in the most proliferative regions of neuroblastoma samples,
i.e. hotspots [
20]. With this sensitive methodology, low-level gain of 12q24.31 was shown to occur frequently and define a biologically distinct subpopulation of patients with intermediate outcome.
Supportive evidence of the 12q24 gain in neuroblastoma has recently been reported by Mosse and co-authors [
25]. In their BAC-array-based CGH study the authors report 32% of the analyzed 82 primary tumors to show low-level gain of 12q24 with minimal common region between 110.8-132.4 Mb from the p-telomere. The reported frequency resembles closely to that detected in our hotspot FISH analysis (42%). Moreover, Mosse and co-authors identified the low-level gain of 12q24 to correlate with aggressive clinical phenotype in a subset of tumors without MYCN amplification. These data are in excellent concordance with the data presented here.
In order to define activated genes within the detected subtle copy number increase, we performed an
in silico analysis of vast amount of existing gene expression data [
22], including a considerable collection of normal peripheral nervous system and neuroblastoma samples. Using this approach we identified elevated expression of five genes located at the 12q24.31 amplicon;
DIABLO, ZCCHC8, RSRC2, KNTC1 and
MPHOSPH9. Among these five genes,
DIABLO showed substantially elevated expression levels in a subset of the neuroblastoma samples, suggesting that it may have an oncogenic role in these tumors. DIABLO has been reported to have proapoptotic effects in cells and therefore overexpression of
DIABLO is believed to cause antitumoral activity via cancer cell sensitization to apoptotic cell death [
26,
27]. Even though the exact role of
DIABLO in neuroblastoma remains to be elucidated, previous contradictory results suggest that
DIABLO has both a direct and inverse correlation to prognosis in various cancers. In cervical and gastric cancer,
DIABLO activation has been shown to associate with unfavourable prognostic parameters [
28‐
30]. On the contrary, in renal cell carcinoma, lung cancer and hepatocellular carcinoma,
DIABLO expression associates with favourable prognostic factors [
31‐
33]. In this sense, the carcinogenetic role of
DIABLO is ambiguous, and needs to be examined more thoroughly, also in neuroblastoma.
In addition to DIABLO, also ZCCHC8, RSRC2, KNTC1 and MPHOSPH9 showed statistically increased expression in neuroblastoma samples when compared to the groups of healthy samples. Although the increase in expression levels of these genes were marginal, which could be argued to be due to impact of the detected low-level gain, and the activation was not neuroblastoma specific, these genes may also have a role in neuroblastoma. Moreover, as only 63% of the genes within the detected amplicon were informative in the in silico analysis, we cannot rule out the possibility that additional oncogenic genes may reside at 12q24.31.
Data integration at the DNA and RNA level provided evidence that several previously known oncogenes were activated due to DNA amplification at corresponding sites. These previously known oncogenes included
MYCN (in the IMR-32 and NGP cell lines), as well as
ALK, MEIS1 and
TEM8 (in the IMR-32 cell line) [
34‐
37], which all located at 2p. In addition to oncogene activations at 2p, distinct high-level amplification sites were mapped at high-resolution along chromosome 12 (at q14, q15, q24.11 and q24.31) in the NGP cell line. In addition to the genes previously implicated in the 12q14 and q15 amplicons, such as
SAS and
MDM2, we found amplification and overexpression of several other genes, including
CENTG1 and
AVIL at 12q14, and
FRS2, CPM and
CPSF6 at 12q15. In addition to
DIABLO, our data revealed several unreported, amplified and overexpressed genes in the 12q24.31 region, locus from which only restin (
CLIP1/RSN) has been previously shown to be amplified in NGP cells [
38]. Within the limits of this study, the detected gene amplifications may serve as additional data in future research on prognostically significant genetic alterations in neuroblastoma.
Recently, few large studies with high-resolution genome-wide copy number analysis have been published [
25,
39,
40]. Some of these studies include, similarly to this study, parallel gene expression analysis allowing the identification of concomitant alterations in copy number and expression level [
25,
39]. The obvious strength of these studies is the vast number of samples included in the analyses. Although the study presented here is limited by the sample size in the initial aberration discovery phase, we show that the utilization of only a few representative model systems can, nevertheless, be used to identify clinically relevant data. Furthermore, the
in silico data mining approach used here for evaluating transcriptional consequences of the detected aberration shows the power of utilization of previously published gene expression data providing a rapid and cost-effective tool for the discovery of new biomarkers.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
MW participated in the design of the study, carried out the aCGH and gene expression profiling, participated in the in silico data analysis and wrote the manuscript. MK carried out the statistical analysis of the FISH data, provided the MYCN status of the tissue samples, and revised the manuscript. RK carried out the FISH analysis. HE performed normalization of the gene expression data and integration of expression and aCGH data. SK and KO collected the in silico gene expression data used for this study and performed the in silico data analysis. AK participated in the design of the study and revised the manuscript. SM participated in the design of the study and revised the manuscript. HH carried out histopathological analysis of the neuroblastoma samples. All authors read and approved the final manuscript.