Introduction
Large-scale genomic characterization has confirmed striking heterogeneity underlying the molecular landscape of GBM and has catalogued a spectrum of tumor suppressors and oncogenes affected by deletion, amplification, mutation, and/or rearrangement. Alterations of receptor tyrosine kinases (RTKs) are especially prevalent in GBM. RTKs are a class of mitogenic signaling proteins, including epidermal growth factor receptor (EGFR), platelet-derived growth factor receptor-α (PDGFRA) and MET, that are widely implicated in human oncogenesis. Indeed, high-level amplification of the EGFR locus represents the single most common genomic abnormality in GBM, occurring in ~45 % of all cases, and PDGFRA and MET are also frequently amplified, in 10–15 % and ~4 % of GBMs, respectively [5, 10, 31, 43]. Moreover, these amplification events have been associated with specific disease subclasses, defined by transcriptional and proteomic signatures [4, 37, 45], implying that molecular distinctions within GBM are, to some extent, mechanistically grounded in dysregulated RTK signaling.
RTK amplification in GBM is often associated with intragenic deletions and gene rearrangements, as well as extracellular domain point mutations [5, 23, 44]. As many as half of EGFR-amplified GBMs have been reported to express the variant III mutation (vIII), a 267-amino acid in-frame deletion of exons 2–7 in the EGFR extracellular domain (ECD) [42]. The resulting protein constitutively signals in a ligand-independent manner by forming homodimers or heterodimeric complexes with either wild-type EGFR or other ErbB family members [12]. EGFRvIII primarily stimulates the oncogenic PI3K/AKT pathway [17, 29], but is also known to interact with the adapter proteins Shc and Grb2, thereby activating RAS/MAPK signaling [39]. Additionally, EGFRvIII-expressing tumor cells may exert paracrine influence on their neighbors by secreting either microvesicles containing the protein itself [1] or mitogenic cytokines like IL-6 and LIF [19]. Other cancer-relevant functionalities ascribed to EGFRvIII include evasion of apoptosis [30], tumor cell invasion [22], angiogenesis [50] and stem cell self-renewal [16].
A number of additional EGFR intragenic deletions have been identified. Some, like EGFRvI (exon 1–7 deletion) and EGFRvIV (intracellular domain microdeletion), are rare [6, 9, 38, 48], while others like EGFRvII and EGFRvV are marginally more common, each accounting for more than 10 % of all GBM-associated EGFR mutations [20, 28, 32]. The vII deletion includes a small 83-amino acid stretch within the EGFR ECD [47], while EGFRvV involves a C-terminal truncation that ablates the majority of the protein’s intracellular domain, a region responsible for mediating internalization and degradation [6, 9, 48]. Functional analyses of both mutations have been complicated by their frequent co-occurrence with EGFRvIII [10]. However, recent work has demonstrated that EGFRvV is itself capable of transformation both in vitro and in mouse xenografts [7].
Intragenic rearrangements in PDGFRA have also been described in GBM. Similar to their counterparts in EGFR, these appear to largely occur in the context of high-level genomic amplification. An in-frame deletion in the Ig-like, extracellular domain of PDGFRA (PDGFRAΔ8,9) has been detected in up to 40 % of PDGFRA-amplified cases and results in constitutive kinase activation in vitro [21, 36]. Cases of C-terminal truncation (PDGFRAΔCt) have also been reported, although defined functional consequences remain to be established [40]. Moreover, it has yet to be determined how these mutations correlate with other oncogenic and subclass-defining molecular abnormalities in GBM.
The prevalence of RTK intragenic deletions, particularly EGFRvIII, in significant subsets of GBM has made them both attractive targets for immunotherapeutic approaches and promising predictive biomarkers for pharmacologic receptor inhibitors [26, 35]. In this context, there remains a need to effectively detect and quantify EGFRvIII and related abnormalities in RTKs to power more detailed functional analysis and therapeutic trial stratification. Currently, most clinical labs that assess EGFRvIII status do so using non-quantitative techniques such as immunohistochemistry (IHC) and/or reverse transcription-polymerase chain reaction (RT-PCR) for the mutant transcript. Other intragenic deletions in EGFR and those of PDGFRA are not routinely measured as a component of standard patient care.
To determine the frequency and molecular context of common RTK intragenic deletions in GBM, we profiled 192 tumors from TCGA for EGFRvIII using both quantitative reverse transcriptase PCR (QRT-PCR) and a novel approach based on Nanostring nCounter technology. The latter platform was also employed to assess EGFRvII, EGFRvV, and PDGFRAΔ8,9, in the same sample set. We demonstrate that intragenic deletion mutants, particularly EGFRvIII, comprise highly variable proportions of total RTK expression in a given tumor, ranging from the majority mRNA species to only a minor component. Paired with orthogonal profiling data from TCGA, these findings now represent the most comprehensive tumor-based assessment of RTK deletion mutation in GBM to date, and provide a resource for integrated molecular analysis. Moreover, we find that Nanostring-based analysis performs robustly from formalin-fixed paraffin-embedded tissue (FFPE), thus empowering investigation and characterization of a wide dynamic range of expression of EGFRvIII and other deletion mutations in the context of clinical trials.
Methods
Human tissue and RNA extraction
RNA from TCGA samples was allocated from the Biospecimen Core Resource as 3 μg aliquots and sent to the MSKCC TCGA Pilot Phase Cancer Genome Characterization Center (CGCC). TCGA sample collection and RNA extraction followed published protocols [5, 44]. An additional independent tumor sample set was used to confirm the fidelity of the assay applied to FFPE, including surgical specimens collected at Memorial Sloan-Kettering Cancer Center and frozen. All patients consented prior to surgery under a protocol approved by the institution’s Institutional Review Board. Patient-matched FFPE tissue for comparison was obtained following routine processing by the Department of Pathology and diagnostic confirmation by a neuropathologist (J.T.H.). RNA was extracted from either crushed frozen tissue or three to eight 10-μm slides using the RNeasy Mini kit (Qiagen).
Quantitative reverse transcriptase PCR
From the TCGA sample set, 275 cases with available RNA were interrogated for relative expression of wild-type EGFR and EGFRvIII by RT-PCR. A total of 400 ng of RNA was reverse-transcribed using the Thermoscript RT-PCR system (Invitrogen) at 52 °C for 1 h, and 20 ng of the resultant cDNA was used in a Q-PCR reaction on a 7500 Real-Time PCR System (Applied Biosystems) with custom-designed TaqMan Gene Expression Assays (EGFRvIII forward primer: 5′CGGGCTCTGGAGGAAAAG3′; EGFRvIII reverse primer: 5′AGGCCCTTCGCACTTCTTAC3′; EGFRvIII internal primer: 5′GTGACAGATCACGGCTCGTG3′; total EGFR: pre-designed TaqMan ABI Gene Expression Assay Hs01076076_m1). Primers were chosen based on their ability to span the most 3′ exon–exon junction. Amplification was carried out for 40 cycles (95 °C for 15 s, 60 °C for 1 min). To calculate the efficiency of the PCR reaction, and to assess the sensitivity of each assay, we also performed a six-point standard curve (5, 1.7, 0.56, 0.19, 0.062, and 0.021 ng). All reactions were performed in triplicate. Triplicate CT values were averaged, and amounts of target were interpolated from the standard curves and normalized to TBP (TATA box binding protein; pre-designed TaqMan ABI Gene Expression Assay Hs00427620_m1). Efficiency of each reaction was determined from the standard curve of a serially diluted sample using the equation $\mathrm{Efficiency} = 10^{-1/\mathrm{slope}} - 1$, where the slope is fitted to CT vs. $\log_{10}$(concentration). Relative quantities of TBP, EGFR and EGFRvIII were calculated from each CT value based on the reaction efficiencies and the maximum CTs from the standard dilution curves ($\mathrm{CT_{max}}$) according to the formula $\mathrm{Quantity} = (1 + \mathrm{Efficiency})^{(\mathrm{CT_{max}} - \mathrm{CT})}$. Samples were rejected if multiple TBP replicates failed to cross threshold in <36 cycles or if the median absolute deviation of quantified TBP across replicates was greater than 25 % (5 of 275 samples). The relative quantities of EGFR and EGFRvIII were normalized with respect to TBP.
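The efficiency and relative-quantity formulas above can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R). The function names, the synthetic CT values, and the use of a single standard curve for both targets are assumptions made for illustration; the paper determines efficiency separately for each reaction.

```python
import math

def fit_slope(log10_conc, ct_values):
    """Ordinary least-squares slope of CT against log10(concentration)."""
    n = len(ct_values)
    mean_x = sum(log10_conc) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_conc, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_conc)
    return sxy / sxx

def efficiency_from_slope(slope):
    """Efficiency = 10^(-1/slope) - 1; equals 1.0 for perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0

def relative_quantity(ct, ct_max, efficiency):
    """Quantity = (1 + Efficiency)^(CTmax - CT)."""
    return (1.0 + efficiency) ** (ct_max - ct)

# Dilution series from the text (ng). The CT values are synthetic (hypothetical),
# generated for an ideal assay in which each cycle doubles the product.
standard_ng = [5, 1.7, 0.56, 0.19, 0.062, 0.021]
standard_ct = [25.0 - math.log2(ng) for ng in standard_ng]

slope = fit_slope([math.log10(ng) for ng in standard_ng], standard_ct)
efficiency = efficiency_from_slope(slope)
ct_max = max(standard_ct)  # CT of the most dilute standard

# Hypothetical averaged triplicate CTs for one sample.
egfrviii_qty = relative_quantity(27.0, ct_max, efficiency)
tbp_qty = relative_quantity(26.0, ct_max, efficiency)
egfrviii_vs_tbp = egfrviii_qty / tbp_qty  # normalization to TBP
```

With an ideal slope of about −3.32 the efficiency is 1.0, so a target that crosses threshold one cycle later than TBP is estimated at half the TBP quantity.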
Nanostring
The nCounter Analysis System (Nanostring Technologies, Seattle, WA) allows for multiplexed digital mRNA profiling without amplification or generation of cDNA [13]. Briefly, mRNA is hybridized with pairs of ~50 bp probes complementary to each target. The reporter probe is tagged by a target-specific code of four fluorescent reporters at seven positions along a phage DNA backbone. The capture probe is used for immobilization on a slide; once the complexes are oriented in an electric field, bound reporters are counted and annotated. A custom probe set was designed as detailed in Supplemental Table S1. Total RNA (150–300 ng) was hybridized with the codeset probes and loaded into the nCounter prep station. The samples were quantified using the nCounter Digital Analyzer.
The Nanostring platform includes negative control probes (not complementary to any endogenous mRNA) to assess background noise associated with the fluorescent barcode optical recognition system. To ensure that all samples were within the optimal range of probe density for image analysis, we confirmed that there was no systematic increase in negative control counts as a function of the total number of counts recorded per sample. Raw probe counts were normalized to a panel of 8 control genes (B2M, B4GALT1, CLTC, E2F4, GAPDH, POLR2A, SDHA, and TBP) by taking the ratio of each gene’s counts per sample to its average across all samples and scaling by the median of these ratios in each sample. This normalization factor was also applied to the negative control probe counts. A detection threshold was defined for each sample as five times the mean of the normalized negative control probe counts. Of 192 samples run, three cases (TCGA-02-0021, TCGA-12-0827 and TCGA-19-1386) were excluded from analysis as outliers with low expression of the 8 control genes (possibly representing under-loading or poor hybridization).
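The control-gene normalization and detection threshold described above can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' R code. The function names and toy counts are hypothetical, and "scaling" is assumed to mean dividing each sample's counts by its median control-gene ratio.

```python
from statistics import mean, median

def sample_factors(control_counts):
    """Per-sample scaling factor from control genes.

    control_counts maps gene -> list of raw counts (one per sample).
    Each count is divided by that gene's mean across samples; the factor
    for a sample is the median of these ratios over all control genes.
    """
    n_samples = len(next(iter(control_counts.values())))
    ratios = {
        gene: [c / mean(counts) for c in counts]
        for gene, counts in control_counts.items()
    }
    return [median(ratios[g][s] for g in ratios) for s in range(n_samples)]

def normalize(counts, factors):
    """Divide raw counts by the per-sample factor (assumed scaling direction)."""
    return [c / f for c, f in zip(counts, factors)]

def detection_thresholds(negative_counts, factors, multiplier=5.0):
    """Per-sample threshold: multiplier x mean of normalized negative-control counts."""
    normalized = [normalize(counts, factors) for counts in negative_counts.values()]
    return [multiplier * mean(per_sample) for per_sample in zip(*normalized)]

# Toy data for two samples (hypothetical): the second was loaded at twice the input.
controls = {"GAPDH": [100, 200], "TBP": [50, 100]}
negatives = {"NEG_A": [2, 4], "NEG_B": [4, 8]}

factors = sample_factors(controls)
normalized_target = normalize([100, 200], factors)
thresholds = detection_thresholds(negatives, factors)
```

In this toy example the twofold loading difference is removed, so the target probe and the detection threshold come out the same in both samples.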
C-terminal deletion mutation was inferred by the occurrence of relative underexpression (undercounting) of the exon 28 probe versus the exon 19 probe. The normal (wild-type) linear relationship of counts between these two probes was determined by a linear model fit to the central 90 % of the data. This model was then applied to the entire dataset to identify cases with outlier C-terminal underexpression. These cases fell into two groups: intermediate expression of the truncation mutant (<60 % of expected C-terminal counts), or high expression (<10 %).
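A minimal Python sketch of this outlier approach is given below. Several details are assumptions rather than the authors' stated method: "central 90 %" is taken to mean the middle 90 % of samples ranked by their exon 28 / exon 19 count ratio, the linear model includes an intercept, and the function names and toy counts are hypothetical. The sketch also assumes all exon 19 counts and fitted expected values are positive.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def classify_c_terminal(exon19, exon28, trim=0.05):
    """Call C-terminal truncation from exon 28 counts relative to exon 19 counts."""
    # Rank samples by ratio and drop `trim` of them from each tail before fitting.
    order = sorted(range(len(exon19)), key=lambda i: exon28[i] / exon19[i])
    k = int(trim * len(order))
    central = order[k:len(order) - k]
    intercept, slope = fit_line(
        [exon19[i] for i in central], [exon28[i] for i in central]
    )
    calls = []
    for x, y in zip(exon19, exon28):
        fraction = y / (intercept + slope * x)  # observed / expected exon 28 counts
        if fraction < 0.10:
            calls.append("high")          # <10 % of expected C-terminal counts
        elif fraction < 0.60:
            calls.append("intermediate")  # <60 % of expected C-terminal counts
        else:
            calls.append("wild-type")
    return calls

# Toy data (hypothetical): 38 wild-type samples plus two truncation-expressing samples.
exon19 = [100 + 10 * i for i in range(38)] + [500, 600]
exon28 = [0.8 * x for x in exon19[:38]] + [20.0, 192.0]
calls = classify_c_terminal(exon19, exon28)
```

Fitting only the central samples keeps strongly truncated cases from pulling the wild-type line downward, which is the point of trimming before the model is applied to the full dataset.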
RNA and DNA sequence analysis
RNA and DNA sequencing data (BAM files mapped to hg19) were obtained from TCGA through CGHub. RNA sequencing was analyzed to tabulate EGFR and PDGFRA exon junctions as described [5]. Briefly, counts were made of all EGFR and PDGFRA reads spanning exon–exon junctions and all paired exonic reads with gaps spanning one or more introns. Only reads with perfect alignment scores (CIGAR score) were considered. To account for 3′ bias in RNA sequence representation, mutant junction counts were compared with counts of normal junctions at the 3′ exon. For example, EGFRvIII expression was defined by counting reads with E1–E8 junctions and comparing to the count of reads with “wild type” E7–E8 junctions. EGFRvII was defined by E13–E16 vs. wild-type E15–E16. PDGFRAΔ8,9 was defined by E7–E10 vs. wild-type E9–E10. A junction was counted only if seen in more than one read. Exome DNA sequence data for 291 tumors were analyzed to determine read coverage within the EGFR gene in two regions: exons 2–7 (the EGFRvIII deleted region) and exons 8–22 (spanning the transmembrane and kinase domain regions). The normal ratio of counts between regions was determined by linear regression fit of the middle 90 % of ratios. This model was applied to normalize the ratios and allow accurate estimation of relative copy number of exons 2–7 vs. exons 8–22.
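The junction comparison can be sketched in Python as below, under stated assumptions. The text specifies which mutant and wild-type junctions are compared and that a junction needs more than one supporting read. Summarizing the comparison as mutant / (mutant + wild-type) is an assumption, and the data layout, function name, and toy counts are hypothetical.

```python
# Mutant junction and the wild-type junction sharing the same 3' exon, per the text.
JUNCTION_PAIRS = {
    "EGFRvIII": ("EGFR", ("E1", "E8"), ("E7", "E8")),
    "EGFRvII": ("EGFR", ("E13", "E16"), ("E15", "E16")),
    "PDGFRAΔ8,9": ("PDGFRA", ("E7", "E10"), ("E9", "E10")),
}

def mutant_fraction(junction_counts, variant, min_reads=2):
    """Fraction of reads at the shared 3' exon that carry the mutant junction.

    junction_counts maps (gene, 5' exon, 3' exon) -> read count.
    Junctions seen in fewer than `min_reads` reads are treated as absent.
    Returns None when neither junction is supported.
    """
    gene, mutant_junction, wildtype_junction = JUNCTION_PAIRS[variant]

    def supported(junction):
        count = junction_counts.get((gene, *junction), 0)
        return count if count >= min_reads else 0

    mutant = supported(mutant_junction)
    wildtype = supported(wildtype_junction)
    if mutant + wildtype == 0:
        return None
    return mutant / (mutant + wildtype)

# Toy counts for one tumor (hypothetical).
counts = {
    ("EGFR", "E1", "E8"): 30,
    ("EGFR", "E7", "E8"): 70,
    ("EGFR", "E13", "E16"): 1,   # single read: below the support requirement
    ("EGFR", "E15", "E16"): 90,
}
viii_fraction = mutant_fraction(counts, "EGFRvIII")
vii_fraction = mutant_fraction(counts, "EGFRvII")
pdgfra_fraction = mutant_fraction(counts, "PDGFRAΔ8,9")
```

Comparing each mutant junction with the wild-type junction at the same 3′ exon keeps both counts at a similar distance from the transcript's 3′ end, which is how the text handles 3′ coverage bias.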
DNA copy number analysis
TCGA Level 3 copy number data (normalized and segmented) were downloaded from the TCGA Data Portal for Affymetrix SNP6.0 data (Broad Institute). Copy number was inferred for exon 6 (within the 2–7 deletion) and compared with that of exon 19 (kinase domain region) to identify relative deletion. Level 2 data (normalized) for Agilent 244k aCGH data (MSKCC) were downloaded and parsed into two subsets of probe values: probes residing between the midpoint of intron 1 and the endpoint of exon 7 were taken as representing the deleted region in vIII, and their log2 ratios were compared to those of probes residing from the start of exon 8 through exon 21 using Student’s t test. A p value < 0.05 was taken as significant (uncorrected for multiple testing). CNA focality, a measure of how many genes are included in simple and complex aberrations, was scored for EGFR in each sample using a previously described Genome Topography Scan (GTS) method [5, 43, 49].
Statistical analysis
Data analyses were performed in R (http://cran.r-project.org/). A prospective panel of hypotheses regarding differences between EGFR-amplified/EGFRvIII+ and EGFR-amplified/EGFRvIII− tumors was evaluated by Fisher’s exact test for discrete events and by a two-sided Student’s t test for continuous variables, and p values were adjusted by FDR. In all cases, the EGFRvIII-high and low positives (EGFRvIII in >1 % of EGFR transcripts), along with EGFRvIII-high positives alone (EGFRvIII in >10 % of EGFR transcripts), were independently compared with wild-type EGFR-amplified tumors. Exploratory searches for differentially expressed genes and miRNAs were performed using empirical Bayes analysis within the Linear Models for Microarray Analysis package implemented in R [41].
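For discrete events, the testing scheme above amounts to a two-sided Fisher's exact test on each 2×2 table followed by FDR adjustment across hypotheses. The Python sketch below illustrates this. It is not the authors' R code, and it assumes that "adjusted by FDR" refers to the Benjamini-Hochberg procedure. The function names and example values are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p for p in (prob(x) for x in range(low, high + 1))
        if p <= p_observed * (1 + 1e-7)  # small tolerance for floating-point ties
    )
    return min(1.0, p_value)

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical table: rows = EGFRvIII+ / EGFRvIII- tumors, columns = event present / absent.
p_single = fisher_exact_two_sided(3, 1, 1, 3)
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.05])
```

For the toy table, four of the five possible tables with the same margins are at least as extreme as the observed one, giving p = 34/70.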
Discussion
The Cancer Genome Atlas GBM initiative has recently completed analysis of a molecularly and clinically annotated dataset of unprecedented detail for over 500 tumors [5]. This project was initiated in 2006, before the advent of high-throughput DNA and RNA sequencing technologies. As a result, the initial TCGA marker paper in 2008 had no direct measure of intragenic deletion mutations despite these being the most common forms of RTK activation in GBM [44]. Our study aims to provide this annotation for 189 TCGA tumors, quantified by Nanostring (NS) and verified for EGFRvIII quantitatively by RT-PCR. As technology has advanced, TCGA has subsequently performed RNA sequencing for 164 of the most recent cases, 47 of which overlap our NS dataset. Together, the NS and RNA-seq data provide a quantitative annotation of common RTK deletion variants for 306 tumors. The ability to cross-reference expression levels of EGFRvIII and other RTK deletions against the clinical and detailed molecular data in TCGA provides a valuable resource to better understand the molecular context in which these mutations are found.
We found no prognostic significance of EGFRvIII expression in the primary GBMs comprising the TCGA dataset. This is consistent with some prior studies performed on independent datasets [2, 15, 24]. Our global analysis of molecular correlates of EGFRvIII and other deletion mutations revealed that, for the most part, tumors with these mutations were also not distinguished by specific molecular features compared to their wild-type RTK-amplified counterparts. This analysis does not imply that EGFRvIII expression has no molecular effects, but rather that detecting these effects in the TCGA data will require prospective testing of select hypotheses. The TCGA dataset also does not reflect differences in subcellular localization, post-translational modification, or degradation of EGFR protein, any or all of which might be impacted distinctly by vIII mutation [7, 14, 25]. Nonetheless, the global similarity of EGFR-amplified tumors, whether EGFRvIII positive or negative, suggests that common features are shared by GBMs with EGFR activation by any means, and that neomorphic functions specific to EGFRvIII may not be strongly influential on the tumor phenotypes measured here. In contrast, EGFRvII-expressing GBMs do appear to have an expression signature distinct from most other EGFR-amplified tumors. It is likely that this finding reflects the association of vII mutation with mesenchymal rather than classical transcriptional subclass, as 26/27 EGFRvII signature genes (96 %) were also associated with non-vII-expressing mesenchymal GBMs in the same analysis.
Because RTK mutations are typically associated with gene amplification in GBM, there can be a wide range of expression of mutant and wild-type alleles [10], and these levels may vary tumor-to-tumor and even cell-to-cell [19, 33]. Earlier work has shown that multiple mutations can affect a single EGFR allele [10]. Recent analysis of TCGA RNA-seq data revealed that multiple EGFR deletion and point mutations were often expressed in the same tumor at different allelic frequencies [5]. We observed a high rate of co-occurrence between different EGFR deletion mutants in our sample set: 100 % of EGFRvII-positive and 44 % of EGFRvV-positive tumors also harbored EGFRvIII. The biological significance of multiple coincident EGFR deletion mutations in the same tumor remains unclear. Interestingly, some evidence supports the possibility of functional heterodimerization involving mutant and wild-type receptors, which may play a driving role in the maintenance of EGFRvIII as a minority species in a transformed cell [11, 25].
In addition to providing a molecular annotation resource, this report describes a transcript-based quantitative assay for EGFRvIII and other deletion mutants that operates on a relatively small amount of biomaterial. Our Nanostring-based assay exhibited notable linearity even at low levels of transcript expression and performed well in the context of FFPE starting material. This latter finding, consistent with a number of prior studies, likely reflects the absence of PCR in the Nanostring workflow. Indeed, PCR-based signal amplification can accentuate systematic error in quantitative measurements, particularly in the context of compromised starting material. Methods for the routine detection of RTK deletion mutants like EGFRvIII from surgical biopsy material remain poorly standardized and non-quantitative. Immunohistochemistry and/or RT-PCR are the predominant assays used in the clinical setting, with results typically interpreted in a binary fashion as either “positive” or “negative”. While such readouts are practical for certain applications and are currently less expensive, they do not accurately capture the molecular and cellular heterogeneity known to characterize GBM, nor are they readily quantifiable. Moreover, recent analysis has shown that multiple EGFR point and deletion mutations can be expressed in the same tumor at different allelic frequencies [5].
Overall, our findings agree with prior literature, both in the proportion of cases where Nanostring was suggestive of EGFRvIII (24 % of total and 54 % of EGFR-amplified) and in the proportion of high-level expressers (10.6 % overall) [3, 51]. These figures include cases in which EGFRvIII was detected in <1 % of EGFR transcripts, where the biological significance and contribution of technical noise are unknown. Available RNA-seq data from overlapping TCGA samples provided strong cross-validation, as detectable reads for EGFRvIII were present in all but one of the samples designated >1 % by Nanostring. Similar correlations were observed for EGFRvII, EGFRvV, and PDGFRAΔ8,9, albeit on fewer samples. The higher sensitivity of the Nanostring assay to detect mutant transcripts at low expression levels may be related to better coverage depth. In all cases, Nanostring provided markedly higher read counts than RNA-seq (typically 50- to 100-fold greater). Next-generation sequencing costs can only be expected to fall in the coming years, enabling higher read counts routinely. Nevertheless, the limited tissue specimens available in the clinical setting may be insufficient to supply the microgram quantities of RNA typically required for transcriptome sequencing, and a significant proportion of clinical material is FFPE. Thus, assay platforms that are both cost- and resource-effective will continue to play central roles in clinical management. Additionally, the ability of the Nanostring nCounter to assess up to 800 mRNAs simultaneously, while not comprehensive, should allow the multiplexing of RTK deletion mutants with a number of other transcripts and gene expression signatures of interest without increasing the required biomaterial.
RTK deletion mutants, along with their wild-type receptors, remain therapeutic targets of considerable potential for GBM. The lack of encouraging clinical results with RTK inhibition thus far may reflect, in part, inadequate drug penetration, lack of molecular stratification in clinical trials and signaling feedback mechanisms [8, 18, 27]. Cellular and molecular heterogeneity involving wild-type and mutant RTK composition, as we observed in this study, likely complicates strategies to effectively inhibit oncogenic signaling. Indeed, investigations carried out in vitro and in human patients indicate that the inhibitor sensitivity profiles of wild-type EGFR and EGFRvIII are distinct [46]. In this respect, our findings and those of others support the notion that a successful therapeutic strategy will require the effective inhibition of both mutant and wild-type receptors at concentrations achievable in the target tissue. Indeed, incomplete targeting of EGFR isoforms could simply drive tumor evolution toward a cellular population expressing an untargeted (resistant) variant. For loci that are commonly amplified in GBM, “quantitative genotyping” of the amplicons and their contained mutations may be a requirement to unambiguously establish their value as predictive and prognostic markers, particularly if established pathogenic mutations exist as minority species. Consequently, methodologies such as those described in this report may prove vitally important to standard clinical practice.