Background
Sinonasal Undifferentiated Carcinoma (SNUC) is a highly aggressive disease involving the anterior skull base, nasal cavity and paranasal sinuses. It is a rare tumor, with only a few hundred cases in the literature [
1]. Patients usually present at an advanced stage, and have poor outcomes [
2,
3], with two-year overall survival rates as low as 25% in some cohorts [
1,
4‐
9]. Validated prognostic factors are limited to traditional clinical variables (overall stage, high grade, and poor differentiation), and no additional informative biomarkers are currently in clinical use [
10]. Current treatment modalities, including surgery, radiation, and systemic chemotherapy alone or in combination with radiation (CRT), have poor outcomes and carry significant toxicity [
11‐
13]. A recent study reported improved survival with chemoselection paradigms, with five-year disease-free survival rates of 59% in the total cohort and as high as 81% in responders [
14]. However, despite these promising results, patients who did not show an initial response to induction chemotherapy had five-year DSS of 0% and 39% when treated with CRT and surgery +/− CRT, respectively. These results indicate an urgent need for novel therapeutics, particularly for this subset of patients with aggressive disease. Importantly, no novel or targeted agents have been introduced for SNUC treatment since its initial identification, partly because of limited investigation into the underlying genetics defining SNUC pathogenesis.
To date, only a few case reports describing mutations associated with disease pathogenesis have been published. The most commonly reported mutations include
IDH2 and
SMARCB1 which have been identified in small case series via traditional sequencing approaches or targeted sequencing panels [
15‐
17]. There have been additional case reports of potentially actionable mutations in isolated SNUCs including
ERBB2 and
FGFR1 [
18,
19], but previous efforts have been limited in their scope of sequencing [
4], and to date no comprehensive whole exome or whole genome sequencing studies have been performed on SNUCs.
As such, limited treatment options are currently available for this rare, devastating disease, and characterizing the genomic profiles of SNUCs may substantially benefit the future development of rational therapeutic strategies. By understanding the genomic architecture behind this disease process, we may also begin to identify prognostic biomarkers that flag the patients who fail current treatment paradigms. Here, we provide survival data from 46 patients treated at our tertiary referral center and report the first whole exome sequences profiling the mutational landscape of SNUCs.
Materials and methods
Patient population
A single-institution retrospective case series informed by a prospectively maintained database of patients with SNUC was performed. The study was approved by the University of Michigan Institutional Review Board (HUM00080561). Patients with a history of sinonasal undifferentiated carcinoma treated at the University of Michigan were included in the clinical dataset (
n = 46). Pathology descriptions for the cohort are listed in Supplemental Table
1a. Inclusion criteria for genomic, copy number and transcriptome analysis were as follows: 1) Patients with sinonasal undifferentiated carcinoma as confirmed by our board-certified pathologist (J.B.M.); 2) Blocks maintained in the University of Michigan pathology archive; 3) Sufficient DNA or RNA yield for next generation sequencing. Additionally, a prospective patient was consented to our University of Michigan IRB-approved MiOTOseq precision medicine program (HUM00085888) as described [
20]. In total, there were 5 patients who met inclusion criteria for analysis, and demographics are shown in Supplemental Table
1b.
Survival analysis statistics
Survival was calculated using Kaplan-Meier analysis and outcomes were compared using the log-rank test. Multivariate Cox regression analysis was performed using the backward Wald method, with inclusion of variables with p-values < 0.1. Statistical analysis was performed using SPSS v26 (IBM, Armonk, NY). Kaplan-Meier curves were created using Prism v8 (GraphPad, San Diego, CA).
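To make the Kaplan-Meier estimate concrete, the following is a minimal Python sketch of the product-limit calculation. It is an illustration only and not the SPSS or Prism procedure used in the study; the function name `kaplan_meier`, its inputs, and the follow-up times are hypothetical and are not drawn from the cohort.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (e.g. months)
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # each patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Patients censored at time t count as at risk at t, then drop out.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data in months (not from the study cohort).
example_curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 1, 0])
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why the estimate only changes at observed event times.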
Cell line
The patient derived SNUC cell line, MDA8788–6, was generously provided by MD Anderson. Generation of this cell line was previously described by Takahashi
et al. [
21]. Cells were cultured in a humidified incubator at 37 °C with 5% (vol/vol) CO2 in DMEM with 10% FBS, 1X Pen/Strep, 1X NEAA. Cells were genotyped to confirm the STR profile of the cell line (Supplemental Table
2) as previously described [
22].
DNA isolation
DNA was isolated from formalin fixed, paraffin-embedded (FFPE) samples following the manufacturer’s protocol for AllPrep DNA/RNA FFPE kit (Qiagen, Hilden, Germany) as previously described [
23,
24]. Tumor and adjacent normal regions were identified on H&E stained slides and aligned to tissue paraffin blocks. An 18-gauge sterile needle was used to core 2–4 samples from each region. Deparaffinization was performed using the xylene/ethanol method with the only modification being that samples were digested using proteinase K at 56 °C for 20–24 min. DNA isolation was then completed using the AllPrep isolation kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. Each sample was analyzed using a Nanodrop spectrophotometer for purity (260:230 and 260:280 ratios), and concentration was determined using 1 µL of sample with the Qubit 2.0 Fluorometer and measured with a bioanalyzer as described [
25]. DNA extraction for MDA8788–6 cell line was performed using Wizard® Genomic DNA Purification Kit (Promega, Madison, WI).
DNA sequencing
Genomic DNA from each tumor and adjacent normal specimen was submitted for sequencing to the University of Michigan’s DNA sequencing core for exome sequencing using both the DNA TruSeq Exome Library Preparation kit (Illumina, Catalogue number FC-150-100x; SNUC2, SNUC5, SNUC8, SNUC10) and the Roche NimbleGen V3 capture kit (SNUC1). DNA from the MDA8788–6 cell line was sequenced as described [
26]. Libraries were prepared according to the manufacturer’s instructions. Libraries were then paired-end sequenced to 125 nucleotides as part of a pool with an average of 4 samples per lane on an Illumina HiSEQ4000, yielding an average depth of greater than 90x per sample.
Exome variant calling
Quality of the sequencing reads was assessed using FastQC v.0.11.5. Because the reads had adapter contamination as well as a high k-mer content at the start of the reads, Trim Galore v0.4.4 was used to remove adapters and trim reads. Reads were aligned to the hg19 reference genome using BWA v0.7.1. Mapping was followed by marking duplicates using PicardTools v1.79. Base quality score recalibration was done using GATK v3.6, which was the last step in preparing the reads for variant calling. Samtools v1.2 was used to create mpileup files for each tumor-normal pair. VarScan v2.4.1 was used to call variants from these mpileup files using the somatic mode of the variant caller. Golden Helix VarSeq v1.4.6 was used to annotate variants. All variants in introns and intergenic regions were filtered out. Variants with more than 5 reads supporting the alternate allele in the tumor samples were considered true positives.
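The final two filtering rules (dropping intronic and intergenic variants, then requiring more than 5 alternate-allele reads in the tumor) can be sketched in Python as below. This is a hedged illustration rather than the actual VarSeq workflow: the record layout, the keys `region` and `tumor_alt_reads`, and the placeholder gene names are all assumptions made for the example.

```python
# Region labels and dictionary keys are hypothetical; real annotation
# output will use its own field names and vocabulary.
EXCLUDED_REGIONS = {"intron", "intergenic"}
MIN_ALT_READS = 5  # variants need MORE than this many supporting tumor reads


def keep_variant(variant):
    """Return True if a variant record passes both filters described above."""
    if variant["region"] in EXCLUDED_REGIONS:
        return False
    return variant["tumor_alt_reads"] > MIN_ALT_READS


# Placeholder records (not real calls from the study).
calls = [
    {"gene": "GENE_A", "region": "exon", "tumor_alt_reads": 12},
    {"gene": "GENE_B", "region": "intron", "tumor_alt_reads": 40},
    {"gene": "GENE_C", "region": "exon", "tumor_alt_reads": 5},
]
kept_genes = [v["gene"] for v in calls if keep_variant(v)]
```

Note that a variant with exactly 5 supporting reads is rejected, because the stated rule is "more than 5".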
Copy number analysis
Aberration Detection in Tumor Exome (ADTEx) v.2.0 was used to make copy number estimation calls from the pre-processed tumor-normal BAM files which were also used for variant calling. A state from 0 to 4 was assigned by the software based on its estimated copy number. State 0 corresponds to a homozygous deletion, 1 corresponds to a heterozygous deletion. A normal copy number is denoted by state 2. States 3 and 4 represent a gain and amplification respectively.
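The state encoding described above can be written as a small lookup, sketched here in Python. The function and variable names are hypothetical, the per-gene states are placeholders, and the snippet does not reproduce ADTEx itself; it only decodes the 0 to 4 states into the labels given in the text.

```python
from collections import Counter

# Labels follow the state definitions given in the text.
CN_STATE_LABELS = {
    0: "homozygous deletion",
    1: "heterozygous deletion",
    2: "normal",
    3: "gain",
    4: "amplification",
}


def describe_cn_state(state):
    """Translate a copy number state (0-4) into its descriptive label."""
    try:
        return CN_STATE_LABELS[state]
    except KeyError:
        raise ValueError(f"unexpected copy number state: {state!r}") from None


def summarize_states(gene_states):
    """Count how many genes fall into each labelled state."""
    return Counter(describe_cn_state(s) for s in gene_states.values())


# Placeholder per-gene states (not results from the study).
summary = summarize_states({"GENE_A": 2, "GENE_B": 3, "GENE_C": 0, "GENE_D": 3})
```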
Microsatellite instability (MSI) detection
MSIsensor was used to detect somatic MSI loci from the tumor-normal sample pairs as described previously [
27]. The software calculates an instability score for each sample pair and assigns a status according to whether more or less than 3.5% of called microsatellites have alterations. We present this score as well as the overall percentage of microsatellite alterations for each tumor-normal pair.
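The thresholding step can be illustrated with a short Python sketch. It assumes the score is simply the percentage of called microsatellite sites that are altered; the function names, the "unstable"/"stable" labels, and the treatment of a score of exactly 3.5% (classed here as stable) are assumptions for the example, not documented MSIsensor behavior.

```python
MSI_THRESHOLD_PERCENT = 3.5  # cutoff stated in the text


def msi_score(altered_sites, total_sites):
    """Percentage of called microsatellite sites showing somatic alterations."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    return 100.0 * altered_sites / total_sites


def msi_status(score_percent):
    """Assign a status from the score; the labels here are illustrative."""
    return "unstable" if score_percent > MSI_THRESHOLD_PERCENT else "stable"


# Hypothetical counts (not taken from the study samples).
score = msi_score(altered_sites=7, total_sites=100)
status = msi_status(score)
```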
Sanger sequencing
Excess DNA from above was used to validate mutation calls in novel genes. Primers were designed using MITprimer3 to amplify a small region surrounding the nominated single nucleotide variants (SNVs) as described in Supplemental Table
3. Polymerase chain reactions (PCRs) were optimized for each primer pair on cell line genomic DNA and then used to amplify the regions from tumor and adjacent normal DNA using Platinum Taq DNA High Fidelity polymerase (ThermoFisher, Waltham, MA). PCR products were then visualized on an eGel (Invitrogen, Waltham, MA) and purified using a PCR purification kit (Qiagen, Valencia, CA) as described [
28] and submitted for Sanger sequencing at the University of Michigan’s DNA sequencing core. Results were visualized using the LaserGene software suite.
Cell line RNA sequencing and analysis
Total RNA from the MDA8788–6 sample underwent standard QC and was submitted for RNA sequencing to the University’s DNA sequencing core as previously described [
26,
29]. Briefly, the Illumina Stranded RNAseq kit was used and libraries were sequenced on an Illumina HiSEQ4000 using a 75 nt paired-end approach. Quality of the RNA sequencing reads was determined using FastQC v0.11.5, and we did not identify any quality issues. We then mapped the reads using a two-step alignment protocol with STAR v2.5.3a. In the first step, genome index files were generated using the reference human genome and annotated transcriptome files. In the second step, the index files were used to guide read mapping. Samtools v1.9 and Picard v2.4.1 were used to retain only uniquely mapped reads, and FPKM was computed using Cufflinks v2.2.1 with default parameters, with the exception of modifying “--max-bundle-frags” to 100,000,000. This modification was made to avoid raising the HIDATA flag at loci that have more fragments than the pre-set per-locus threshold.
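For readers unfamiliar with the FPKM unit, the basic definition (fragments per kilobase of transcript per million mapped fragments) is sketched below in Python. This shows only the textbook normalization; Cufflinks estimates abundances with a statistical model and effective transcript lengths, so its reported values will generally differ. The function name and the numbers are hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2 kb transcript in a library
# of 25 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 25_000_000)
```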
Fusion gene analysis
FusionCatcher (v1.00) is a software package designed to look for gene fusions, translocations and rearrangement events using paired end RNA-Seq data and was used to identify novel gene fusions in the MDA8788–6 cell line.
Linked read sequencing
High molecular weight DNA was isolated from the SNUC cell line by lysing 1.5 million cells overnight at 37 °C with lysis buffer (10 mM Tris-HCl, 400 mM NaCl, 2 mM EDTA), 10% SDS, and a proteinase K solution (1 mg/mL Proteinase K, 1% SDS, 2 mM EDTA). Following overnight lysis, DNA was salted out of the solution with 5 M NaCl for 1 h at 4 °C and precipitated with ice cold ethanol for 5 h at −20 °C. High molecular weight DNA was eluted in TE buffer; the quality and integrity of the DNA were assessed using the Tapestation Genomic DNA ScreenTape kit (Agilent). The DNA was submitted to the University’s DNA sequencing core for 10x based linked read library generation and sequencing on an Illumina NovaSeq6000 with a 300 nt paired end run. Samples were de-multiplexed and FastQ files with matched index files were generated using Long Ranger Version 2.2.2. Data were visualized using the Loupe software package, Version 2.1.1 (2.4).
Fusion gene knockdown
All siRNAs, including ON-TARGETplus Non-targeting Control siRNA, ON-TARGETplus GAPDH Control siRNA, and a custom siRNA targeting the PGAP3-SRPK1 fusion site, were purchased through Dharmacon (Lafayette, CO). Each siRNA was reconstituted at a concentration of 1 nmol/50 µL in 1X siRNA buffer (Dharmacon, Lafayette, CO). MDA8788–6 cells were plated at a concentration of 250,000 cells per well in 3 mL growth media. The following day all media was removed and cells were starved in 1 mL of serum DMEM for 3 h. Each siRNA was prepared by mixing 400 µL of OPTI-MEM (Gibco, Waltham, MA) with 24 µL of siRNA and left to equilibrate for 5 min. Separately, 24 µL of oligofectamine (Invitrogen, Carlsbad, CA) was added to 96 µL of OPTI-MEM. After 5 min the two mixtures were added together and allowed to equilibrate at room temperature for 20 min. Cells were then treated with 250 µL of siRNA mixture containing buffer only, Non-targeting siRNA, PGAP3-SRPK1 fusion siRNA, or GAPDH siRNA. After 3 h, 2.5 mL of growth medium was added to each well. The following day cells were harvested in 700 µL of QIAzol Lysis Reagent (Qiagen, Valencia, CA) and either processed directly for RNA extraction or stored at −80 °C for future extraction. RNA extraction was performed using the RNeasy Mini Kit (Qiagen, Valencia, CA) per the manufacturer’s instructions. RNA sequencing of the fusion knockdown was also performed as above. Briefly, extracted total RNA from MDA8788–6 cells treated with NT siRNA and PGAP3-SRPK1 fusion siRNA was submitted to the University’s DNA sequencing core and processed as above (the Illumina Stranded RNAseq kit was used and libraries were sequenced on an Illumina NovaSEQ6000 using a 300 cycle paired end approach).
Quantitative polymerase chain reaction (qPCR)
Confirmation of successful siRNA knockdown and validation of RNAseq findings were performed with qPCR. Following RNA extraction, cDNA synthesis was performed using the SuperScript™ III First-Strand Synthesis System (Invitrogen, Carlsbad, CA) and qPCR was performed using the QuantiTect SYBR Green PCR Kit (Qiagen, Valencia, CA) and run on a QuantStudio5 (Applied Biosystems, Foster City, CA). Targets included
SRPK1,
PGAP3-SRPK1 fusion,
GAPDH,
HSDL2,
CCND1,
FOXO4,
Beta-Actin,
HPRT, and
RPL-19; primer sequences are listed in Supplemental Table
4. Analysis was performed using the
2^(−ΔΔCt) method [
30].
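As a worked illustration of the 2^(−ΔΔCt) calculation, the Python sketch below computes the fold change of a target gene in a knockdown sample relative to a control sample, each normalized to a reference gene. The function name, argument names, and Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency (a doubling of product per cycle).

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene (treated vs. control) by 2^(-ddCt)."""
    # Delta Ct: normalize the target to the reference gene within each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Delta-delta Ct: compare the treated sample against the control sample.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles later after
# knockdown while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_treated=26.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
```

With these illustrative values, ΔΔCt is 2, so the target is expressed at one quarter of the control level; a value below 1 indicates reduced expression after knockdown.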
Discussion
Here we report survival outcomes for a cohort of 46 SNUC patients treated with CRT or surgery +/− CRT. Survival analysis from our cohort is congruent with previous reports of low survival rates [
1,
4‐
9] and shows little difference in survival when stratifying by treatment modality. Multivariate analysis revealed that only tobacco use was an independent predictor of poor outcomes in our cohort. The lack of robust clinical predictors highlights the need for more in-depth understanding of molecular markers that may predict treatment outcomes.
In this study, we confirm the presence of previously noted alterations in
IDH2,
SMARC family members, and
ERBB2 from initial targeted sequencing studies. Previous studies have noted high rates of
IDH2 mutations ranging from 55 to 84% [
15,
17,
39] and have identified R172X as a hotspot location. While only 1/5 of our samples contained an
IDH mutation, this did occur at the R172 codon. Similarly, prior studies have cited loss of
SMARCB1 in 33–43% of SNUC samples and have demonstrated worse outcomes in these patients [
40,
41]. Notably, while we did not find a high frequency of
SMARCB1 mutations, we did identify copy number alterations in
SMARCB1 in addition to other SMARC family members. These data suggest that deregulation of the
SWI/SNF nucleosome remodeling complex (consisting of known tumor suppressors
SMARCB1,
SMARCA4,
PBRM1,
ARID1B, and
ARID2), through one of its many components, is a critical step in disease progression in SNUCs [
42].
SMARCB1 has been implicated in numerous other solid cancers as a tumor suppressor gene, including sarcomas, carcinomas and rhabdoid tumors of varying sites [
43]. It appears to have tumor suppressor functions in inhibiting cell cycle and proliferation via the p16-Rb-E2F and Wnt/Beta-catenin pathways, among others [
43].
SMARCA2 function and expression may also play a critical role in response to specific targeted therapies (particularly with EZH2 inhibition) in tumors with
SWI/SNF dysregulation [
44], suggesting a potential role for EZH2 inhibitors in SNUCs. The remainder of our samples however lacked the traditional
SMARCB1 or
IDH mutations implying diversity in SNUC tumorigenesis and suggesting the importance of identifying novel alterations within these SNUC tumors.
Previous isolated reports of SNUCs have identified overexpression or amplification of growth factor receptors [
18,
19,
45] and in this study, we have identified genetic alterations in
ERBB2 and
FGFR family growth factor receptor genes as well as
ALK, suggesting potentially targetable options in SNUCs. In a previous study of a SNUC cell line, high
ERBB2 expression was identified with a notable response to ERBB2 inhibition [
18].
FGFR3 alterations have been implicated in head and neck cancer and in vitro and in vivo studies suggest a promising role for FGF inhibition in head and neck tumors [
46‐
48]. Further, a recent publication by Takahashi et al. identified a 34 gene signature differentiating responders from non-responders after induction chemotherapy [
49]. Critical pathways highlighted in this work included
PI3K and
JAK/STAT. Our work similarly identified alterations within
PIK3CG as well as recurrent alterations within the
JAK/STAT pathway. Given the diverse, but potentially actionable set of alterations that our data defined, these results suggest a role for in depth molecular analysis of this rare disease in order to gain insight into molecular alterations that may drive discovery of future therapeutics, and potentially guide individual patient treatment options.
Finally, this study identifies a novel fusion of
PGAP3-SRPK1. SRPK1 has previously been shown to drive cell proliferation, migration, and invasion in colorectal and gastric cancers [
50‐
53] suggesting that the fusion protein may have oncogenic consequences in the SNUC cell line. CNV analysis additionally revealed copy gain in one tumor in the
SRPK1 gene. Knockdown of the PGAP3-SRPK1 fusion gene resulted in changes in expression of
CCND1,
FOXO4 and most significantly a decrease in
HSDL2 and
NAGK suggesting a functional role for this novel fusion gene. Unfortunately, insufficient RNA prevented evaluation of SNUC tissues for presence of the fusion. This is the first study to date to suggest a role of
SRPK1 in sinonasal undifferentiated carcinoma.
Limitations in interpretation of the novel
PGAP3-SRPK1 gene fusion do exist. For example, these results may represent a trans-splicing event such as that described by Li et al., who demonstrated the presence of chimeric JAZF1-JJAZ RNA in normal endometrial tissue lacking the
JAZF1-JJAZ fusion [
54]. Given that we have not yet performed protein validation of the PGAP3-SRPK1 fusion, it is possible that this represents chimeric RNA that arose from a trans-splicing event. Further, it is also possible that a DNA rearrangement was missed by our sequencing. We had on average > 20 reads covering the
PGAP3 and
SRPK1 genes (read coverage of each gene is depicted in Fig.
4d and e), with a slight gap in read coverage between exons 1 and 2 of
PGAP3 that corresponds to the potential breakpoint in that gene, so it is possible that the linked read analysis missed the DNA breakpoint because of poor sequenceability of that region or other library-specific issues.