Introduction
The incidence of kidney cancer is rising worldwide, especially in Western countries [1‐3].
Clear cell renal cell carcinoma (ccRCC), the most common subtype of renal cancer, is characterized by an especially poor prognosis [1]. While the 5-year overall survival rate for patients with localized disease is 93% [4], those with metastatic ccRCC have a 5-year survival of only 12% [4]. Notably, however, approximately 30% of patients with localized disease also develop distant metastases during follow-up [5, 6].
Surgery still represents the only curative option for patients with ccRCC [7], but innovative medical therapies are rapidly emerging [1, 5]. Since treatment effectiveness is contingent upon early discovery of the disease or its recurrence, it is critically important to accurately predict the risk of progression, to determine the frequency and type of follow-up, and to increase the chances of timely and successful therapy [5].
A commonly used tool to predict ccRCC recurrence is the Leibovich score, which takes advantage of histological and clinical data to profile progression risk in individual cases [8, 9]. Although this score correctly predicts the 5-year metastasis-free survival rate in 97% of low-risk cases [8], contrary to prediction, a small number of apparently low-risk patients still develop progressive disease. Other assessment methods, also based on clinical and histological data, have similarly been shown to fail to reliably predict disease progression in sizeable groups of low-risk patients [10].
Although these patients, classified as low-risk but developing progressive disease [11], account for < 5% of low-risk ccRCC patients [8, 12], assuming a broadly similar risk distribution among all European patients, they still represent an approximate annual number of 1500 patients. Failure to identify them prevents their further stratification into subtypes and the timely administration of potentially effective treatments. Once low-risk progressors have been singled out, stratification could also allow for a reduced follow-up for low-risk nonprogressors, freeing valuable resources.
Although initial attempts to characterize low-risk progressors have been made, these results require further validation [12, 13].
In this context, the objective of our study was to identify prognostic biomarkers of potential clinical relevance for predicting ccRCC recurrence in apparently low-risk patients. To address this issue, we took advantage of next-generation sequencing of tumors from a cohort of patients with progressing tumors, despite a low-risk classification according to the Leibovich score, and from matched nonprogressors.
Patients and methods
Study design
This retrospective study was planned following the REMARK biomarker-research guidelines [14]. As required by these guidelines, we disclose that, in addition to the experiments described below, RNA sequencing from the serum of all participants was also attempted but failed due to RNA fragmentation.
The Regional Ethics Committee (REC) of Western Norway approved the study (REC no. 78-05), and permission for their inclusion was obtained from all participants.
Patients
Tumor tissues were collected from a cohort of 443 ccRCC patients from Haukeland University Hospital (Bergen, Norway). Each sample was initially examined and scored by an experienced renal pathologist according to Fuhrman grade. Prior to inclusion in this study, each sample was reassessed and rescored, also by an experienced renal pathologist. The second scoring was performed independently of the first.
The inclusion criteria were low-risk ccRCC assessed by the Leibovich score (between 0 and 2 according to the 2003 version of the score) [8, 15, 16] and available follow-up data of progression (later occurrence of metastases) or nonprogression (absence of tumor recurrence/metastases); see Additional file 1. Due to an updated Leibovich score being made public, the selected cases were rescored using the updated algorithm [9]. No sample lost its status as low-risk in the new score.
We selected progressors (n = 8) and, for each progressor, included two matched nonprogressors as controls (n = 16). Controls were matched on Leibovich score, Fuhrman grade, tumor stage and size, and creatinine level, and had likewise undergone surgical tumor removal. Patients who were not treatment naïve, had lymph node metastasis, suffered from heart failure (grade ≥ 3 according to the New York Heart Association Classification), used immunosuppressive drugs due to transplantation or suffered from severe rheumatic disease at the time of the biopsy were excluded from the study. All patients showed an estimated glomerular filtration rate (eGFR) > 45 ml/min/1.73 m² and a Charlson comorbidity index (CCI) > 1, except for one progressor with an eGFR of 36 ml/min and a CCI of 3. See Table 1 for patient details.
Table 1
Clinicopathological characteristics of all patients
Rcc9 | 51 | Female | Radical | 58 | T1a | 38 | 3 | 2 | 5 | 1544 | 1 |
Rcc8 | 63 | Male | Radical | 60 | T1a | 40 | 3 | 1 | 3 | 1994 | 2 |
Rcc3 | 66 | Male | Radical | 113 | T1a | 35 | 3 | 1 | 3 | 2632 | 3 |
Rcc11 | 66 | Male | Partial | 61 | T1a | 15 | 3 | 2 | 5 | 965 | 4 |
Rcc1 | 72 | Female | Radical | 106 | T1a | 20 | 2 | 0 | 2 | 2680 | 5 |
Rcc2 | 72 | Male | Radical | 109 | T1b | 50 | 2 | 2 | 5 | 2319 | 6 |
Rcc5 | 83 | Male | Radical | 81 | T1b | 50 | 2 | 2 | 5 | 109 | 7 |
Rcc6 | 67 | Male | Radical | 176 | T1b | 48 | 2 | 2 | 5 | 1385 | 8 |
Rcc13 | 34 | Male | Partial | 73 | T1a | 23 | 3 | 1 | 3 | | 1 |
Rcc24 | 47 | Female | Partial | 48 | T1a | 40 | 3 | 1 | 3 | | 1 |
Rcc17 | 54 | Male | Partial | 68 | T1a | 35 | 3 | 1 | 3 | | 2 |
Rcc20 | 66 | Male | Radical | 67 | T1a | 30 | 3 | 1 | 3 | | 2 |
Rcc21 | 57 | Male | Partial | 73 | T1a | 30 | 3 | 1 | 3 | | 3 |
Rcc16 | 62 | Male | Radical | 82 | T1a | 30 | 3 | 1 | 3 | | 3 |
Rcc23 | 74 | Male | Partial | 81 | T1a | 16 | 3 | 1 | 3 | | 4 |
Rcc22 | 66 | Male | Partial | 83 | T1a | 38 | 3 | 1 | 3 | | 4 |
Rcc10 | 78 | Female | Partial | 64 | T1a | 20 | 2 | 0 | 2 | | 5 |
Rcc18 | 68 | Female | Partial | 45 | T1a | 20 | 2 | 0 | 2 | | 5 |
Rcc14 | 72 | Male | Partial | 97 | T1b | 55 | 2 | 2 | 5 | | 6 |
Rcc4 | 63 | Male | Radical | 82 | T1b | 50 | 2 | 2 | 5 | | 6 |
Rcc7 | 76 | Male | Radical | 98 | T1b | 45 | 2 | 2 | 5 | | 7 |
Rcc15 | 75 | Male | Radical | 73 | T1b | 45 | 2 | 2 | 5 | | 7 |
Rcc12 | 63 | Male | Radical | 80 | T1b | 45 | 2 | 2 | 5 | | 8 |
Rcc19 | 68 | Male | Radical | 68 | T1b | 45 | 2 | 2 | 5 | | 8 |
Serum was available from progressors (n = 2) and nonprogressors (n = 6). We also obtained biopsies from the metastasis of 6/8 progressor patients; data not shown.
Tumor specimens and serum collection
Tissues from all 24 ccRCC patients were stored as formalin-fixed and paraffin-embedded (FFPE) samples at room temperature. Serum was harvested from patient blood samples within 1 h after sampling and subsequently stored at − 80 °C.
Four 10 µm sections were cut from the FFPE blocks and used as input, whereas for serum samples, 200 µl was used as input. Total RNA for sequencing and qPCR was extracted, as previously described [17, 18], using the miRNeasy FFPE kit (cat no. 217504; Qiagen, Venlo, The Netherlands). RNA was extracted from serum with the miRNeasy serum/plasma kit (cat no. 217184; Qiagen) according to the manufacturer’s instructions. Following RNA extraction, samples were stored at − 80 °C.
RNA yield and gene expression analysis
Total RNA concentration was measured using a Qubit RNA HS assay kit on a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). Integrity was assessed using an Agilent RNA 6000 Nano kit on a 2100 Bioanalyzer instrument (Agilent Technologies, Santa Clara, CA, USA), and DV200 values were calculated.
RNA extraction from tissue specimens yielded an average of 1362 ng/sample, whereas RNA extraction from serum samples yielded an average of 97 ng/sample.
RNA library preparation and sequencing
Sequencing libraries were generated using the TruSeq RNA exome library kit (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions.
Libraries were quantified by qPCR using the KAPA library quantification kit–Illumina/ABI Prism (Kapa Biosystems, Wilmington, MA, USA) and validated using the Agilent high-sensitivity DNA kit on a Bioanalyzer. Libraries were normalized to 2.6 pM and subjected to cluster generation and paired-end sequencing, performed for 2 × 75 cycles on two NextSeq500 HO flow cells (Illumina), according to the manufacturer’s instructions. Sequencing depth was 30 million reads/sample. Base calling was performed on the NextSeq500 instrument with RTA 2.4.6. FASTQ files were generated using bcl2fastq2 conversion software (v.2.17; Illumina).
TopHat (https://ccb.jhu.edu/software/tophat/index.shtml) and Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) were used for assembly of reads and alignment of the contigs to the human genome assembly (GRCh38), respectively. An empirical expression filter was applied, retaining genes with > 1 count per million in at least three samples. Trimmed mean of M values [19] normalization was applied to adjust for variation in library size. Differential expression was tested between the two patient groups, with group defined by diagnosis (progressor or nonprogressor). Age matching was accounted for as a blocking factor: each block comprised one progressor and its two age-matched nonprogressors.
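The expression filter described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names and toy counts are hypothetical, and trimmed mean of M values normalization is not reproduced here.

```python
def counts_per_million(counts):
    """Convert one sample's raw gene counts to counts per million (CPM)."""
    library_size = sum(counts.values())
    return {gene: 1e6 * c / library_size for gene, c in counts.items()}


def expression_filter(samples, min_cpm=1.0, min_samples=3):
    """Return genes with CPM > min_cpm in at least min_samples samples.

    `samples` is a list of dicts (gene -> raw count), one dict per sample,
    all sharing the same gene names (hypothetical input layout).
    """
    cpm_per_sample = [counts_per_million(s) for s in samples]
    kept = []
    for gene in samples[0]:
        n_expressed = sum(1 for cpm in cpm_per_sample if cpm[gene] > min_cpm)
        if n_expressed >= min_samples:
            kept.append(gene)
    return kept


# Toy data: every library sums to 1,000,000 reads, so CPM equals the raw count.
toy_samples = [
    {"GENE_A": 5, "GENE_B": 5, "FILLER": 999_990},
    {"GENE_A": 5, "GENE_B": 5, "FILLER": 999_990},
    {"GENE_A": 5, "GENE_B": 0, "FILLER": 999_995},
    {"GENE_A": 0, "GENE_B": 0, "FILLER": 1_000_000},
]
print(expression_filter(toy_samples))
```

In this toy example GENE_A passes the filter (expressed above 1 CPM in three samples), whereas GENE_B is removed (expressed in only two).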
Genes with a p ≤ 0.05 and an absolute fold change (abs. FC) ≥ 2 were considered differentially expressed. Pathway analysis was performed with Ingenuity Pathway Analysis (v.47547484; Qiagen, Redwood City, CA, USA), with the Ingenuity Knowledge Base used as the reference set. Canonical pathways were sorted by the smallest Benjamini–Hochberg adjusted p value. Biomarker analysis was performed with the KNN validation package in GenePattern (http://www.broadinstitute.org/cancer/software/genepattern). Euclidean distance was used as the distance measure, where three neighbors were considered, and leave-one-out internal cross-validation was applied. PCA, hierarchical clustering with Ward’s method, and other data visualization techniques were undertaken using JMP Genomics (v.9.0; SAS Institute, Cary, NC, USA) and GraphPad Prism software (v.9.0; GraphPad Software, La Jolla, CA, USA).
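The classification scheme described above (three nearest neighbors, Euclidean distance, leave-one-out cross-validation) can be sketched as follows. This is an illustrative re-implementation in plain Python, not the GenePattern module itself; the function names and the toy expression values are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out(x, y, k=3):
    """Classify each sample using all remaining samples as the training set."""
    predictions = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(knn_predict(train_x, train_y, x[i], k))
    return predictions


# Toy single-gene expression values (hypothetical, not study data).
toy_x = [[8.0], [8.5], [9.0], [9.5], [1.0], [1.5], [2.0], [2.5]]
toy_y = ["prog", "prog", "prog", "prog", "nonprog", "nonprog", "nonprog", "nonprog"]
print(leave_one_out(toy_x, toy_y))
```

With two classes and an odd number of neighbors, the majority vote cannot tie, which is one practical reason for choosing k = 3.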
Gene set enrichment analysis
Gene set enrichment analysis (GSEA) was performed with GSEAv4 (http://www.gsea-msigdb.org/gsea/index.jsp). Normalized gene expression values and their patient group information with the Human Ensembl Gene ID MSigDB 7.4 were tested for enrichment using the KEGG pathway database with 1000-fold permutation of phenotypes, weighted enrichment statistics and signal-to-noise metric for ranking of genes. Gene sets smaller than 15 and larger than 500 were excluded.
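To make the gene-ranking step concrete, the signal-to-noise metric is the difference of the group means divided by the sum of the group standard deviations. The sketch below is a simplified illustration: it uses the sample standard deviation (an assumption) and omits the lower bound that the GSEA software places on the standard deviation. The function names and toy values are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b); needs >= 2 values per group."""
    return (mean(group_a) - mean(group_b)) / (stdev(group_a) + stdev(group_b))


def rank_genes(expression, labels, group_a, group_b):
    """Rank genes from most enriched in group_a to most enriched in group_b."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy normalized expression values (hypothetical, not study data).
toy_labels = ["prog", "prog", "nonprog", "nonprog"]
toy_expression = {
    "G1": [10.0, 12.0, 1.0, 3.0],   # higher in progressors
    "G2": [5.0, 7.0, 5.0, 7.0],     # no difference
    "G3": [1.0, 3.0, 10.0, 12.0],   # higher in nonprogressors
}
for gene, score in rank_genes(toy_expression, toy_labels, "prog", "nonprog"):
    print(gene, round(score, 3))
```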
Immunohistochemistry
Immunohistochemistry (IHC) was performed on 4-μm-thick FFPE sections with the following primary antibodies: anti-AGAP2 (1:100; polyclonal, rabbit, no. HPA023474; Sigma–Aldrich, St. Louis, MO, USA) and anti-USP10 (1:1000; monoclonal, rabbit, no. ab109219; Abcam, Cambridge, UK). As a negative control, a duplicate section was stained with the primary antibody omitted. Incubations were performed overnight at 4 °C and pH 6.0 for both antibodies. Sections were counterstained with hematoxylin (no. CS70030-2; Dako, Kyoto, Japan). The slides were stained with both HE and Ki67 to assess the morphology (data not shown). As positive controls during the staining, we used lymphoid tissue, as both proteins have been described as highly expressed in this tissue (https://www.proteinatlas.org/ENSG00000103194-USP10/tissue, https://www.proteinatlas.org/ENSG00000135439-AGAP2/tissue).
Survival analysis
Survival analyses were performed using the Kaplan–Meier log-rank and Wilcoxon signed-rank tests to evaluate progression-free survival (PFS) and overall survival, with events defined as progression or lack of progression. Endpoints were progression, death of the patient due to ccRCC, or PFS to the end of the follow-up period for this study (1.2.2020). Analyses were performed using R (v.1.1.383; R Foundation for Statistical Computing, Vienna, Austria; packages: Tidyverse and Survival). Hazard ratios were determined using JMP Genomics (Fit proportional hazards; SAS Institute), and survival curves were generated using SPSS (v.25; IBM Corp.).
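For readers unfamiliar with the Kaplan–Meier approach, the estimator underlying such survival curves can be sketched in a few lines of Python. This is a minimal illustration of the product-limit estimate only, assuming hypothetical follow-up times; it does not reproduce the log-rank test, the hazard ratios, or the R, JMP and SPSS analyses used here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times[i]  : follow-up time of patient i
    events[i] : True if progression was observed at times[i], False if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set afterwards.
        n_at_risk -= n_leaving
    return curve


# Toy follow-up data in arbitrary time units (hypothetical, not study data).
toy_times = [2, 3, 3, 5, 8]
toy_events = [True, False, True, True, False]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Patients censored at a given time are counted as still at risk at that time, which is the usual convention for this estimator.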
qPCR
qPCR was performed using SuperScript IV VILO master mix with ezDNase (No. 11766050; Thermo Fisher Scientific), TaqMan Fast Advanced master mix (No. 4444556; Thermo Fisher Scientific), and the AGAP2-AS1 primer and probe (Hs01096080_s1, no. 4426961; Thermo Fisher Scientific). qPCR was performed on a StepOne Plus real-time PCR system (Applied Biosystems, Carlsbad, CA, USA), with the gene encoding 40S ribosomal protein S13 (RPS13; Hs01011487_g1, no. 4426961; Thermo Fisher Scientific) used to normalize samples. RNA input for cDNA was 20 ng for serum and 50 ng for solid tissue. A no-template control served as the negative control.
Statistical analysis
Plots of mRNA abundance, qPCR results, and correlations were generated using SPSS (v.25; IBM Corp., Armonk, NY, USA). Correlations were determined using Spearman’s rho, with age, creatinine level, AGAP2-AS1 expression, tumor size, and time to metastasis treated as continuous variables and sex, Leibovich score, and sample status treated as categorical variables.
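As an illustration of the correlation measure, Spearman’s rho is the Pearson correlation of the rank-transformed values, with tied values receiving their average rank. The sketch below is a plain-Python illustration with hypothetical function names and toy numbers; it does not reproduce the SPSS output or any significance testing.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): any strictly increasing relation gives rho = 1.
print(spearman_rho([10, 20, 30, 40], [1, 4, 9, 100]))
```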
In qPCR, three technical replicates per sample were averaged to give a mean Ct value, which was used in subsequent analyses. The abs. fold change (FC) between groups was determined by averaging the normalized Ct values (∆Ct) within each group and calculating the ∆∆Ct from these group averages. Significance and p values were evaluated using the Mann–Whitney U test on the ∆Ct values of the individual samples.
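The ∆∆Ct calculation can be written out as a short worked example. The sketch assumes the common 2^(−∆∆Ct) conversion to fold change, which presupposes roughly 100% amplification efficiency; this conversion, the function names and the Ct values are assumptions for illustration and are not taken from the study data.

```python
from statistics import mean


def delta_ct(target_replicates, reference_replicates):
    """Normalized Ct of one sample: mean target Ct minus mean reference Ct."""
    return mean(target_replicates) - mean(reference_replicates)


def fold_change(delta_ct_cases, delta_ct_controls):
    """Fold change of cases versus controls from group-averaged delta-Ct values.

    Assumes the 2^(-ddCt) relation (about 100% amplification efficiency).
    """
    ddct = mean(delta_ct_cases) - mean(delta_ct_controls)
    return 2 ** (-ddct)


# Toy Ct values (hypothetical): three technical replicates per sample,
# target = AGAP2-AS1, reference = RPS13.
progressor_dct = [delta_ct([24.0, 24.1, 23.9], [20.0, 20.0, 20.0])]
nonprogressor_dct = [delta_ct([26.0, 26.1, 25.9], [20.0, 20.0, 20.0])]
print(fold_change(progressor_dct, nonprogressor_dct))
```

In this toy case the progressor sample has a ∆Ct two cycles lower than the nonprogressor sample, corresponding to an approximately fourfold higher expression.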
Categorical variables, such as the type of nephrectomy, were analyzed with the chi-squared test.
Discussion
In this study, we investigated the gene expression profiles of ccRCC patients with an originally estimated low risk of progression who nevertheless developed metastasis during a follow-up period of up to eleven years.
Our main finding is that the level of AGAP2-AS1 expression in tumor tissues from the time of the initial surgery correctly predicts 100% of the nonprogressor group and close to 90% of the progressor group of patients with low-risk ccRCC. Higher expression of AGAP2-AS1 long noncoding RNA in progressors than in nonprogressors was further confirmed by qPCR.
Taken together, these data provide novel tools, contributing to a more effective prognostic profiling of patients with low-risk ccRCC.
Considering the low percentage of low-risk ccRCC patients developing disease progression [8, 12], it is reasonable to question whether they represent a transcriptomically distinct cohort. Indeed, PCA and hierarchical clustering results indicate that progressors and nonprogressors form distinct groups at the transcriptome level, and the identification of prospective biomarkers using qPCR is of potentially high clinical relevance.
Interestingly, qPCR analysis of serum samples also suggests a higher level of circulating AGAP2-AS1 RNA in progressors, although the investigated cohort was too small to detect significant differences. If confirmed, these data would support the use of liquid biopsies as an alternative to solid tissue samples in diagnosis [23]. As the liquid biopsies were taken at the time of surgery, we hypothesize that liquid biopsies taken later, but still prior to recurrence, might have contained higher levels of AGAP2-AS1 in the progressor patients.
Conventional prognostic models, e.g., the Leibovich score, are characterized by high sensitivity and specificity to predict recurrence in ccRCC [10]. Furthermore, they are well established and do not require the use of additional sequencing techniques [9, 10, 12, 24].
However, AGAP2-AS1 overexpression specifically detected metastasizing tumors classified as “low-risk” by the current methods. Combining conventional prognostic models and AGAP2-AS1, easily measured by PCR, would allow for a more accurate estimation of the risk profile of low-risk patients.
Once low-risk progressors have been identified, stratification could also allow for a reduced follow-up for low-risk nonprogressors, freeing valuable resources.
AGAP2-AS1 is a long noncoding antisense RNA previously investigated in a variety of cancers [25‐32], where its upregulation was shown to correlate with decreased survival rates [28, 33]. Accordingly, AGAP2-AS1 silencing suppresses the proliferation and invasion potential of glioblastoma cells while promoting their apoptosis [34]. Multiple studies also show that AGAP2-AS1 knockdown inhibits the proliferation of malignant cells from pancreatic [27] and hepatic cancers and gliomas [35, 36] in vitro and in vivo. Moreover, breast cancer cell lines overexpressing AGAP2-AS1 and showing resistance to trastuzumab were resensitized to its effects following gene knockdown [26]. Interestingly, in a study comparing metastatic to localized prostate cancer, the AGAP2-AS1 gene was also found to be upregulated in metastatic cancer tissues [37]. In ccRCC, using a cohort of n = 611 samples, AGAP2-AS1 was found to significantly correlate with higher tumor stages, prognosis and metastasis in the TCGA dataset, corroborating our independent findings [38].
In our data, AGAP2-AS1 was not differentially expressed between the original tumor and metastases. This could be taken to indicate stable gene expression over time, as the biopsy from the metastasis was taken an average of 4.5 years after the original biopsy.
A well-known transcriptomic characterization of ccRCC is the classification into ccA and ccB subtypes [39‐41]. The ccA subtype displays markedly improved disease-specific survival compared with ccB [40]. The ClearCode34 risk predictor was developed to assign tumors to the ccA or ccB subtype and to a prognostic group [42]. According to our data, of the 34 mRNAs used in ClearCode34, only receptor tyrosine kinase-like orphan receptor 2 (upregulated in progressors; abs. FC: 3.81; p < 0.05) was differentially expressed. This lack of overlap might be explained by the small size of our cohort. However, it is also of note that the ccA and ccB subtypes were developed within a more heterogeneous cohort and were not restricted to stage 1 tumors.
In the only other closely related report, Parasramka et al. [12] also investigated low-risk ccRCC progressors using RNA-seq. However, although their findings were supported by a validation cohort, they did not attempt validation with other laboratory techniques. Of the 10 differentially expressed genes identified in both their discovery and validation sets, only ASPM (abnormal spindle-like microcephaly associated protein) (abs. FC: 4.84; p < 0.05) was also differentially expressed in our study. Of the 20 most upregulated genes in progressors and the 20 most upregulated genes in nonprogressors in our data, none were among the differentially expressed genes listed by the authors.
AGAP2-AS1 was not included within their published results, but as data from this study are not publicly available, we could not determine whether AGAP2-AS1 was not significant, not detected and/or not identified. A possible explanation for why AGAP2-AS1 was not found might lie in their more stringent requirements for significance. They set their cutoff for significance at p < 0.001 and filtered out any genes with fewer than three patients expressing at 2 counts per million, whereas we used the more usual cutoff of p < 0.05 and filtered at 1 count per million in at least three samples. As a result, they found 92 differentially expressed genes in their cohort, compared with our 1167. Considering that multiple other studies, including analyses of TCGA data, show that AGAP2-AS1 correlates with survival, stage, progression and progression in specific stages [38], AGAP2-AS1 might well have been excluded by these requirements.
One of the few pathways enriched in progressors in the GSEA results was “SPHINGOLIPID METABOLISM”. In a recent paper, several genes from that pathway were linked to ccRCC progression [43]. In our data, 5 of those 6 genes fit with the authors’ findings, i.e., the direction of the gene deregulation indicated a worse prognosis in the progressive patients.
Our study has several limitations. Only 8 patients out of the initial 443 (1.8%) were progressors within low-risk ccRCC, mainly due to the low frequency of this subtype. Furthermore, the adjusted p values were not significant for all mRNAs due to the comparison of intrinsically similar samples, e.g., two forms of histologically identical cancer types from closely matched patients. While a larger cohort is unlikely to overcome the latter restriction [12], the former can and should be corrected in a larger validation cohort.
Another limitation is the follow-up time of the controls. Not all nonprogressors had a follow-up time longer than the longest time until metastasis in the progressor group. However, all pairs of progressors and matched nonprogressors included at least one nonprogressor with a longer follow-up time than the time to metastasis in the matched progressor. Still, we cannot categorically exclude that some of the nonprogressors will turn into progressors, albeit delayed compared to the original group. However, even if this were to occur, we have demonstrated that the groups differ transcriptomically and delayed progression would still be a prognostic model of significant value.
A further limitation is the lack of 100% sensitivity when using AGAP2-AS1 alone. While the results for AGAP2-AS1 on its own were good, we could have achieved 100% sensitivity by extending the classifier. We could have added more RNAs that identified the one progressor incorrectly classified by AGAP2-AS1, increasing sensitivity, potentially at the cost of accuracy. The decreased accuracy due to an increase in misclassified samples might have been worthwhile, as a high sensitivity allows for the ruling out of disease [44]. Interestingly, the sensitivity of the 10 classifier models with one to 10 gene members always stayed at 87.5% (one progressor sample was always falsely classified, though not always the same one), while the specificity decreased from 100% to 81.25% (in the model with two genes), then rose to 87% (with three to eight genes) and to 93.75% (with nine or ten genes). Therefore, the number of genes needed to achieve 100% sensitivity, the small sample size and the single misclassification led us to conclude that the results are best presented as a single-component classifier. However, a single gene might well be insufficient to act as a classifier in a heterogeneous disease such as ccRCC. A clinical classifier would most likely require the inclusion of additional components in order to be viable; in our results, however, the best classifier was AGAP2-AS1 on its own.
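The percentages quoted above follow directly from the cohort sizes (8 progressors, 16 nonprogressors). The short Python sketch below shows the arithmetic; the misclassification counts are inferred from the reported percentages and cohort sizes rather than taken from a confusion matrix in the text, and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of progressors correctly classified as progressors."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of nonprogressors correctly classified as nonprogressors."""
    return true_neg / (true_neg + false_pos)


N_PROGRESSORS = 8
N_NONPROGRESSORS = 16

# One misclassified progressor: 7 of 8 correct.
print(f"sensitivity: {sensitivity(7, 1):.2%}")
# Inferred counts of misclassified nonprogressors: 0, 3 and 1 of 16.
for false_pos in (0, 3, 1):
    true_neg = N_NONPROGRESSORS - false_pos
    print(f"{false_pos} false positives -> specificity {specificity(true_neg, false_pos):.2%}")
```

With only 16 nonprogressors, each additional misclassified control changes the specificity by 6.25 percentage points, which illustrates how sensitive these estimates are to single samples in a cohort of this size.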