Background
Clear cell renal cell carcinoma (ccRCC) is the most common histological subtype of renal cell carcinoma (RCC), accounting for 80–90% of all RCCs. Clear cell RCC is associated with few clinical symptoms, i.e. flank pain, hematuria or a palpable abdominal mass, but is nowadays mostly discovered incidentally due to extensive use of computed tomography (CT), ultrasound, and magnetic resonance tomography (MRT) [1]. This has led to earlier discovery of tumors, and the proportion of patients with metastases at diagnosis has decreased to 18% in Sweden [2]. If metastases are present at diagnosis, the probability of 5-year survival (pOS5yr) may be as low as 10–15% [3]. Among patients with local disease, the prognosis is better (pOS5yr up to 90% with modern protocols), but still 20–30% of patients with local disease at diagnosis will develop metastasis after nephrectomy [3]. In addition to TNM stage and morphological grade [4, 5], there is a need for new molecular markers to identify patients with high risk of progression.
Genetic alterations in ccRCC show considerable heterogeneity. Loss of chromosome 3p and inactivation of the
VHL (von Hippel–Lindau) gene are frequently observed [
6]. Gain of chromosomes 1q, 3q, 7q, 8q, 20q; loss of chromosomes 1p, 4p, 4q, 9p, 9q, 13q, 14q and loss of whole chromosomes 4, 9, 19, 20 and 22, have all been reported in ccRCC [
7‐
12].
Several studies have aimed at identifying molecular markers that predict survival in ccRCC. Gene expression alterations have been associated with prognosis [
13‐
26], but none of these genes are currently clinically used.
DNA methylation has emerged as an important regulator of gene expression, and has been implicated in both cancer development and progression. DNA methylation on Cytosine-phosphate-Guanine (CpG) sites in promoter regions may alter the affinity of transcription factors for their binding sites, and may also, in combination with chromatin modifications, contribute to silencing of genomic regions [
27]. Altered DNA methylation has been identified as a prognostic marker, as well as a potential target for therapy, in several malignancies [
27,
28]. De novo methylated CpGs in ccRCC assumed to be of relevance for RCC tumorigenesis have been identified, but their clinical value requires further validation [
29,
30]. Arai et al., (2012) and Tian et al., (2014) identified CpG island methylator phenotype (CIMP) panels using the Infinium HumanMethylation27K array and MassARRAY, respectively, that predicted cancer-free survival and overall survival [
31,
32]. In 2015, Wei et al. presented a CpG-methylation-based assay using the Illumina HumanMethylation450K array, calculating a risk score that predicted overall survival independently of clinicopathological parameters in ccRCC [
33].
We have previously shown that genome-wide promoter methylation status can predict survival in ccRCC [
34]. Using Illumina HumanMethylation27K arrays, we found a stepwise increase in methylation with TNM stage and morphological grade. In the present study, we increased the number of patients and performed a detailed analysis of promoter associated CpGs using Illumina HumanMethylation450K arrays. Thereby, we further investigated the prognostic value of alterations associated with tumor progression. Identifying methylation patterns at diagnosis unique to non-metastatic patients with high risk of later progress is important since these patients may need adjuvant treatment and/or more frequent follow-up to improve survival.
Methods
The aim of this study was to evaluate the prognostic relevance of DNA methylation in relation to clinical characteristics in ccRCC, with special focus on patients who were non-metastatic at diagnosis.
Patients and tissue samples
The study cohort consisted of 115 ccRCC patients, primarily treated with radical or partial nephrectomy between 2001 and 2009, and diagnosed at the University Hospital in Umeå, Sweden. None of the patients received neoadjuvant or adjuvant therapy. Eighty-seven patients were metastasis-free (M0), while 28 had metastases (M1) at diagnosis. Tumor-free (TF) tissue samples were obtained from 12 surgically removed tumor-bearing kidneys and were considered histologically normal by a pathologist. The tumor and TF tissue samples obtained were snap-frozen in liquid nitrogen and stored at −80 °C until analysis.
Patients were followed-up at least yearly by routine clinical and radiological examination in accordance with a scheduled follow-up program. Clinical follow-up data were extracted in August 2017. All patients have given informed consent and the study was approved by the regional ethical review board in Umeå (Dnr 2011–156-31 M, 20110523).
The publicly available TCGA-KIRC dataset was used as a validation cohort and clinical information was downloaded from the Broad Institute’s Genome Data Analysis Center Firehose (
http://gdac.broadinstitute.org/). Only unique non-metastatic (M0) ccRCC samples (technical replicates excluded) analysed with Illumina HumanMethylation450K array were included in the analysis (
n = 230). All patients were treated with radical or partial nephrectomy and patients receiving neoadjuvant and/or adjuvant therapy were excluded.
Methylation array analysis
DNA was extracted from the tissue samples as described previously [
35] and was subjected to bisulfite conversion (500 ng of each sample) using the EZ DNA Methylation Kit (Zymo Research, Irvine, USA) according to the manufacturer’s protocol. Bisulfite DNA conversion was verified by MethyLight analysis of the
ALU gene with the ALU-C4 primer/probe set as described [
36]. Genome-wide assessment of DNA methylation was performed using HumanMethylation450K BeadChip arrays (Illumina, San Diego, CA, USA) according to manufacturer’s protocol. To each array, 200 ng of bisulfite-converted DNA was applied, and the arrays were scanned with a HiScan array reader (Illumina). The fluorescence intensities were extracted using the Methylation module (1.9.0) in the Genome Studio software (V2011.1). Pre-filtering and normalization steps are shown in Additional file
1: Table S1 and were performed as previously described [
37,
38], which excluded the X and Y chromosomes, CpGs with detection
p-value > 0.05 and CpG probes that aligned to multiple loci in the genome or were located less than 3 bp from a known single nucleotide polymorphism [
39]. The methylation level (i.e., the β value) of each CpG site ranges from 0, corresponding to completely unmethylated DNA, to 1, representing fully methylated DNA. The technical reproducibility of methylation array analysis was monitored by including a replicate sample on each array, and the R² values ranged from 0.995 to 0.997.
The analysis was restricted to the
n = 155,931 CpGs located in gene promoter regions, i.e. located within TSS1500, TSS200 and 5’UTR, remaining after the initial filtration steps (Additional file
1: Table S1). Twelve TF tissue samples were included as reference samples. The TF samples showed high similarity in promoter methylation, with R² values ranging from 0.96 to 0.99. A CpG site was defined as differently methylated (DM-CpG) if the absolute value of the difference in β value between the tumor sample and the mean of the TF samples (Δβ) was greater than or equal to 0.2. The DM-CpGs in the M0-PF (non-metastatic progression free), M0-P (non-metastatic with later progress) and M1 (metastasis at diagnosis) groups were analysed against known methylation quantitative trait loci (mQTLs) [
40] based on middle-aged individuals to evaluate genetic variants versus cancer specific alterations, and these sites were excluded from further analysis.
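The Δβ criterion above can be illustrated with a minimal sketch. It assumes β values are held in plain dictionaries keyed by CpG identifier; the function name, the CpG identifiers and the toy values are hypothetical and are not taken from the study's own pipeline.

```python
from statistics import mean

DELTA_BETA_CUTOFF = 0.2  # threshold stated in the text


def call_dm_cpgs(tumor_betas, tf_betas_per_sample, cutoff=DELTA_BETA_CUTOFF):
    """Classify each CpG of one tumor sample against the mean of the TF samples.

    tumor_betas: dict mapping CpG id -> beta value (0..1) for one tumor sample.
    tf_betas_per_sample: list of dicts with the same keys, one per TF sample.
    Returns a dict mapping CpG id -> "hyper", "hypo", or None (not DM).
    """
    calls = {}
    for cpg, beta in tumor_betas.items():
        tf_mean = mean(sample[cpg] for sample in tf_betas_per_sample)
        delta = beta - tf_mean  # delta beta relative to the TF reference
        if delta >= cutoff:
            calls[cpg] = "hyper"
        elif delta <= -cutoff:
            calls[cpg] = "hypo"
        else:
            calls[cpg] = None
    return calls


# Hypothetical toy data: three CpGs, two TF reference samples.
tf_samples = [
    {"cg_a": 0.10, "cg_b": 0.80, "cg_c": 0.50},
    {"cg_a": 0.14, "cg_b": 0.84, "cg_c": 0.54},
]
tumor = {"cg_a": 0.60, "cg_b": 0.40, "cg_c": 0.55}
dm_calls = call_dm_cpgs(tumor, tf_samples)
```

In this toy example, cg_a is called hypermethylated, cg_b hypomethylated, and cg_c is not a DM-CpG because its Δβ is below 0.2 in absolute value.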
The epigenetic mitotic clock described by Yang et al., 2016 was used to estimate mitotic age [
41]. A prognostic Risk score was calculated for 114 out of 115 tumor samples using the CpG-methylation-based assay previously presented by Wei et al., (2015) [
33]. Patients with a Risk score higher than −0.1 were defined as high risk and those with a Risk score lower than −0.1 as low risk, as previously stated [
33].
DM-CpGs within the M0-PF, M0-P and M1 groups were selected for further analysis if differently methylated in at least 70% of samples (in Fig.
7). Hypermethylated CpGs showed increased methylation compared to TF whereas hypomethylated CpGs were less methylated.
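The 70% recurrence filter can be sketched as follows. The text does not state whether a CpG must be altered in the same direction across samples, so counting hyper- and hypomethylation separately is an assumption made here; the function name and toy data are hypothetical.

```python
from collections import Counter


def select_recurrent_dm_cpgs(calls_per_sample, min_fraction=0.70):
    """Return CpGs that are DM in at least min_fraction of the samples.

    calls_per_sample: list of dicts (one per tumor sample) mapping
    CpG id -> "hyper", "hypo", or None.
    Assumption: recurrence is counted per direction, so a CpG must be
    hypermethylated (or hypomethylated) in >= min_fraction of samples.
    """
    n_samples = len(calls_per_sample)
    counts = Counter()
    for calls in calls_per_sample:
        for cpg, direction in calls.items():
            if direction is not None:
                counts[(cpg, direction)] += 1
    return {
        cpg: direction
        for (cpg, direction), n in counts.items()
        if n / n_samples >= min_fraction
    }


# Hypothetical group of four samples.
group_calls = [
    {"cg_a": "hyper", "cg_b": "hypo", "cg_c": "hypo"},
    {"cg_a": "hyper", "cg_b": None, "cg_c": "hypo"},
    {"cg_a": "hyper", "cg_b": "hypo", "cg_c": "hypo"},
    {"cg_a": None, "cg_b": None, "cg_c": "hypo"},
]
recurrent = select_recurrent_dm_cpgs(group_calls)
```

Here cg_a (3 of 4 samples) and cg_c (4 of 4) pass the filter, while cg_b (2 of 4) does not.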
The commonly DM-CpGs (n = 172) in the M0-P and M1 samples were defined as a Promoter Methylation Classification (PMC) panel. The hypomethylated CpGs (n = 51) were mirrored (1 – average beta) and thereafter an average beta of the 172 DM-CpGs was calculated for each sample. A ROC curve was constructed with PMC average beta as the test variable and tumor progression within five years as the state variable. The Youden index was used to determine the cut-off for PMC groups (PMC high/low), which was set to a PMC average beta of 0.688 (specificity 0.85 and sensitivity 0.55).
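As an illustration of the PMC score and cut-off selection, the sketch below mirrors hypomethylated CpGs, averages over the panel, and then scans candidate cut-offs for the maximum Youden index (J = sensitivity + specificity − 1). The function names and toy values are hypothetical, and treating a score at or above the cut-off as PMC high is an assumption; the study's own ROC analysis was done in statistical software.

```python
def pmc_average_beta(sample_betas, hyper_cpgs, hypo_cpgs):
    """Mean beta over the panel after mirroring hypomethylated CpGs (1 - beta)."""
    values = [sample_betas[c] for c in hyper_cpgs]
    values += [1.0 - sample_betas[c] for c in hypo_cpgs]
    return sum(values) / len(values)


def youden_cutoff(scores, progressed):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index.

    scores: PMC average beta per patient.
    progressed: True if the patient progressed within the time horizon.
    Assumption: a patient is called PMC high when score >= cutoff.
    Requires at least one progressed and one non-progressed patient.
    """
    n_pos = sum(progressed)
    n_neg = len(progressed) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, p in zip(scores, progressed) if p and s >= cutoff)
        tn = sum(1 for s, p in zip(scores, progressed) if not p and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical three-CpG panel: two hypermethylated, one hypomethylated.
score = pmc_average_beta(
    {"cg1": 0.8, "cg2": 0.9, "cg3": 0.1}, ["cg1", "cg2"], ["cg3"]
)

# Hypothetical cohort of eight patients.
scores = [0.50, 0.55, 0.60, 0.65, 0.70, 0.72, 0.75, 0.80]
progressed = [False, False, True, False, False, True, True, True]
cutoff, sensitivity, specificity = youden_cutoff(scores, progressed)
```

Mirroring puts hyper- and hypomethylated CpGs on the same scale, so a higher average always means a profile closer to that of the progressing tumors.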
The prognostic relevance of PMC classification was confirmed in a separate ccRCC cohort (
n = 230) within the KIRC project of TCGA [
42]. Clinical information along with methylation raw data (Illumina HumanMethylation450K arrays) were downloaded from the Broad Institute’s Genome Data Analysis Center Firehose (
http://gdac.broadinstitute.org/). Beta values were calculated using the Genome Studio definition and were normalized for different bead types using BMIQ. Missing values among the 172 CpG sites in the PMC panel were imputed using the k-nearest neighbours method. Deaths without prior progression were counted as non-events and were censored.
The distribution of hyper- and hypomethylated CpGs within twelve genomic regions with frequent gain/loss in ccRCC (as defined by Köhn et al. 2014 [
9] and listed in Additional file
2: Table S3) was analysed to identify potential overrepresentation of hyper- and/or hypomethylated CpGs within these regions.
Heterogeneity analysis
In six individuals, multiple tumor samples were taken from different locations within the same kidney to study intratumoral heterogeneity. To confirm that the collected samples originated from the same individual, the methylation levels of the 65 built-in SNP probes on the methylation array were analysed. One of the six patients (number 3) was excluded at this step due to signs of contaminated DNA (Additional file
3: Figure S1).
Genomic aberrations
The genetic aberrations were profiled using the total intensity signals of the raw data from the HumanMethylation450K arrays [
43,
44]. Briefly, copy number variation (CNV) analysis was performed in R (v3.4.1) using the Conumee package (v1.9.0) [
45] with data imported through minfi (v1.18) [
46]. Parameters and limits for calling deletion and gain were set for each sample individually through visual inspection.
The validity of this method to identify CNV was confirmed by comparing the methylation results with genetic aberrations identified by HumanCytoSNP-12 v2.1 arrays (Illumina) in a subset of samples (
n = 57 ccRCC) [
9]. The CNV analysis was performed in Genome Studio v1.8 using the Genotyping Module (Illumina). Cohen’s kappa test was used to compare the results from the two array types, which were significantly overlapping (
p ≤ 0.001 for all analysed aberrations), with quality of agreement moderate or good (Additional file
4: Table S2).
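For readers unfamiliar with the agreement measure, a minimal sketch of Cohen's kappa for one aberration is given below. It computes only the kappa statistic, not the associated p-value, and the per-sample calls are hypothetical (1 = aberration called, 0 = not called).

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two raters (here: two array types) on the same samples.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is
    the agreement expected by chance from the marginal call frequencies.
    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(calls_a)
    categories = set(calls_a) | set(calls_b)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    expected = sum(
        (calls_a.count(c) / n) * (calls_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1.0 - expected)


# Hypothetical calls for one chromosomal loss in ten samples.
methylation_array_calls = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
snp_array_calls = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(methylation_array_calls, snp_array_calls)
```

In this toy case the two platforms agree on 8 of 10 samples, chance agreement is 0.52, and kappa is about 0.58.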
The genomic aberrations for all ccRCCs were summarized by investigating the minimal overlapping regions of commonly occurring CNVs in ccRCC [
9]. The CNVs are presented as percentage of samples where alterations across the entire regions were found (Additional file
5: Figure S2 and Additional file
2: Table S3).
RNA preparation and gene expression array analysis
RNA was extracted from 28 tumors using the MagAttract RNA Universal Tissue M48 Kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol using a BioRobot M48. RNA concentrations were determined by spectrophotometry (NanoDrop, Thermo Scientific, Wilmington, DE, USA) and quality was analysed using the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).
Two hundred ng of total RNA from each sample was used for cRNA production by the Illumina TotalPrep RNA amplification kit (Ambion Inc., Austin, TX, USA) according to the provided protocol. The quality of purified cRNA was evaluated using the RNA 6000 p kit (Agilent Technologies) in the Agilent 2100 Bioanalyzer (Agilent Technologies). A total of 750 ng biotinylated cRNA was hybridized to the human HT12 Illumina Beadchip gene expression array (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol and scanned using the Illumina Bead Array Reader (Illumina). Expression array data were analysed using the Illumina BeadStudio V2011.1 software and samples were normalized using the cubic spline algorithm. Gene expression levels of the MX2 (ILMN_2231928), SMAD6 (ILMN_1767068) and SOCS3 (ILMN_1781001) genes were extracted from the arrays.
For statistical analysis, the Statistical Package for the Social Sciences (SPSS Inc., Chicago, IL) software version 24, was used. The chi-square/Fisher’s exact test was used to compare differences between subgroups among categorical variables and the Mann–Whitney U test was used for continuous variables. Kruskal Wallis test was used for continuous variables when comparing three groups of samples. Mann–Whitney U test with Bonferroni correction was used in Additional file
6: Figure S4.
Estimates of 5-year cancer specific survival (pCSS5yr) rates and Cumulative incidence of Progress (CIP5yr) in subgroups of ccRCC were obtained from Kaplan–Meier survival tables, and the equality of survival distributions for the groups was compared using the log rank test. The significance level used in all tests was 0.05. In CIP analysis local, regional or distant metastatic progression was the endpoint. In the CSS analysis, ccRCC specific death was the endpoint.
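The Kaplan–Meier estimates described above can be sketched in a few lines. This is an illustrative product-limit estimator, not the SPSS procedure used in the study; the follow-up times and event flags are hypothetical, and the cumulative incidence is taken as one minus the Kaplan–Meier estimate, which assumes no competing risks are modelled.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient (e.g. years).
    events: True if the endpoint occurred at that time, False if censored.
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)  # events + censored
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve, horizon):
    """Survival probability at the given horizon (step function lookup)."""
    s = 1.0
    for t, surv in curve:
        if t <= horizon:
            s = surv
    return s


# Hypothetical cohort of eight patients, times in years.
times = [1, 2, 2, 3, 4, 6, 7, 8]
events = [True, False, True, True, False, True, False, False]
curve = kaplan_meier(times, events)
p_5yr = survival_at(curve, 5)      # analogous to pCSS5yr
cip_5yr = 1.0 - p_5yr              # analogous to CIP5yr, under the stated assumption
```

In the toy data the estimated 5-year survival is 0.6, giving a 5-year cumulative incidence of 0.4.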
Hierarchical clustering was performed using the Ward’s method and a Euclidean distance metric for clustering samples.
Principal Component Analysis (PCA) was performed in SIMCA version 14 (Umetrics, Umeå, Sweden) after centering the average beta values of 155,931 promoter associated CpG sites.
WebGestalt [
47] was used to analyse functional enrichment of the genes associated with progress of disease (specified in Fig.
7a), using all genes represented by the 155,931 promoter associated CpGs as the background list. The gene functions of potential relevance for ccRCC pathology were selected among the 20 most significant GO Terms (Additional file
7: Table S8).
Discussion
The aim of this study was to evaluate whether genome-wide promoter associated methylation classification can be used as a prognostic tool to identify patients with non-metastatic ccRCC at risk of disease progression. These patients may benefit from alternative therapy approaches such as adjuvant therapy and/or more intense follow-up.
Cluster analysis of 155,931 promoter associated CpG sites divided the 115 ccRCCs into two groups (clusters A and B), where the less methylated ccRCC samples in cluster A clustered together with TF samples. Cluster B status associated with higher average promoter methylation, and this group contained a high fraction of metastasized tumors and patients with local disease who later progressed. This confirmed our previous finding of poor prognosis associated with higher promoter methylation status in ccRCC [
34]. Importantly, our current study showed that the prognostic relevance of DNA methylation was limited to M0 patients since the prognosis was very poor regardless of methylation status for patients with metastasized ccRCC at diagnosis.
In order to relate our data with previously described DNA methylation panels for risk stratification in ccRCC, we applied the five CpG-site risk panel defined by Wei et al., (2015) on our cohort [
33]. We found no significant association between cluster status and Risk group classification. The Wei Risk score could neither separate the survival nor the progress prognosis in non-metastatic tumors, but was predictive for CSS when including the metastatic tumors.
To predict incidence of progress in non-metastatic patients, genome-wide promoter associated cluster analysis seems to be more efficient than risk stratification according to the Wei Risk score, which is restricted to five CpG sites.
The previously described CIMP panel by Arai et al., (2012) could not be applied on our cohort since not all CpGs in the CIMP panel were present in the Infinium HumanMethylation450K array used in our study [
31]. However, the Arai CIMP classification has previously shown that high methylation status was associated with poor prognosis, which is in line with our data.
Genomic aberrations are commonly observed in ccRCC and we performed an integrated analysis of methylation status and genomic aberrations in order to identify potential correlations. We used raw data from the methylation arrays to determine genomic aberrations in twelve regions previously defined as harboring common changes (loss or gain) in a subset of our cohort of ccRCC patients [
9]. Using methylation arrays to identify genomic aberrations might introduce systematic bias due to segmentation at the ends of several chromosomes. However, the correlation between the identified genetic aberrations by methylation array and SNP-array analysis revealed comparable results in the subset of samples analysed by both techniques. The number of identified genetic alterations was likely underestimated since we focused the analysis on previously defined regions with aberrations [
9]. Also, the number of DM-CpGs might be underestimated since we used histologically normal tumor-adjacent tissue from kidneys with ccRCC as reference samples. Methylation differences have been reported between histologically normal tumor-adjacent tissue and tissue taken from healthy individuals [
31].
There was a significantly higher frequency of deletions of 9p, 9q and 14q in cluster B tumors, and these three genetic aberrations have been associated with poorer outcome in ccRCC [
7‐
11,
49,
50]. Importantly, we did not find a general increased number of DM-CpGs in regions frequently lost or gained. Loss of chromosome 9q was the only genetic aberration that was more common in M0-P (and M1) patients compared to M0-PF patients. Previous studies have shown significant correlation of loss of chromosome 9q to both histological grade and TNM stage [
9,
49,
50] as well as to poor outcome [
9]. Patients with 9q loss also showed an enriched number of DM-CpGs within this region. The possible contribution of epigenetic alterations within this region, to poor prognosis, has to be further evaluated.
However, we observed a significant positive correlation of total number of hypermethylated DM-CpGs and total number of genetic aberrations, supporting previous results as shown in a review by Arai and Kanai [
51]. In that study a correlation between number of clones with CNVs and total number of DM-CpGs was shown in a subset of ccRCC samples with high methylation levels. The correlation between number of hypermethylated CpGs, number of genetic aberrations, and predicted mitotic age indicates an accumulation of alterations associated with number of cell divisions. Correlation between genetic and epigenetic alterations was shown previously in both chronic lymphatic leukemia [
52] and in breast cancer cell lines [
53], but less is known about correlations with mitotic age.
The fact that genome-wide promoter methylation cluster status separated the survival time of patients with non-metastatic disease at diagnosis made us focus on identifying the specific methylation profiles associated with progress. Initially, PCA was used to investigate whether patient groups with different outcome could be separated based on general promoter methylation patterns. This analysis showed overlapping and heterogeneous methylation patterns within the M0-PF, M0-P and M1 groups, in contrast to homogeneous methylation patterns within the TF samples. Also, patients with similar outcome showed large inter-individual variations in methylation patterns. This cannot be explained by intra-individual tumor heterogeneity since multiple samples taken within the same tumor showed similar methylation profiles.
By recognizing the commonly DM-CpGs in the M0-P and M1 samples, we identified a Promoter Methylation Classifier (PMC) consisting of 172 CpGs associated with progress. Classification of non-metastatic patients in PMC high/low subgroups showed strong prognostic relevance for progress in our cohort as well as in the validation TCGA-KIRC cohort. Importantly, the PMC panel remained a significant prognostic marker for progression free survival in a Cox regression analysis including PMC status, TNM, morphological grade, age and gender.
The genes in the PMC panel were associated with cellular processes including SMAD protein complex assembly and immune response. Genes of special interest previously associated with various types of cancer were the SMAD6 and SOCS3 genes, coupled to aggressive kidney cancer [
54‐
56], and the MX2 gene with suggested role in melanoma pathogenesis [
57]. Interestingly, these genes were differently methylated in the M0-P group compared with TF and M0-PF samples, and a significant difference in mRNA levels was observed for the SOCS3 and MX2 genes. These findings indicate a functional relevance of the methylation alterations in ccRCC pathogenesis but need to be confirmed in larger sample cohorts.
In a systematic review by Joosten et al., 2017, a number of DNA methylation studies in ccRCC were summarized [
30] and the need for validation of identified prognostic markers was emphasized. We could not confirm the prognostic relevance of the previously defined Wei risk score (based on five CpG sites) in our cohort. Instead, we could confirm the prognostic relevance of genome-wide promoter methylation cluster analysis (Cluster A/B, > 150 K CpGs), suggesting that larger panels are probably more robust. However, in contrast to genome-wide clustering, a defined set of CpGs (or genes) is likely more clinically suitable. In this study, we defined a PMC panel consisting of 172 CpGs, which was a strong prognostic marker for non-metastatic patients in both our cohort and in the validation cohort. Bioinformatics tools that combine DNA methylation classification with clinical prognostic markers are an important next step towards implementing epigenetic analysis in clinical practice.