Background
Germ cell tumors (GCTs) include germinomas, comprised of testicular seminomas and ovarian dysgerminomas, and nonseminomas, comprised of yolk sac tumors (YSTs), teratomas and embryonal carcinoma [
1]. While GCTs are heterogeneous, they are grouped together due to a presumed common stem cell of origin, the primordial germ cell (PGC). Oosterhuis and Looijenga have proposed classification of GCTs into five distinct entities based on cell of origin, histology, genomic imprinting status, age at and location of clinical presentation, and chromosomal constitution [
2]. Type I GCTs are those found predominantly in infants and young children, often manifesting in the first four years of life and always before puberty. Type II GCTs are most commonly found in the testis of adolescent males and young men following puberty, but are also found in the ovaries of adolescent and young adult women and the midline/brain of children and adolescents. Pathologic evidence confirms that GCTs are the neoplastic counterpart of the PGC [
2], and several lines of evidence indicate that GCTs, including adult testicular GCT (TGCT), begin
in utero [
3]. Thus, alterations in normal embryonic development are likely to be etiologically relevant to GCTs. Of particular interest are the processes the PGCs undergo during normal development, including segregation from the somatic cells, migration to the gonads, complete epigenetic reprogramming, reacquisition of pluripotency and sex determination [
4].
Aberrant DNA methylation has been implicated in cancer etiology, and may be especially relevant in GCTs due to the extensive epigenetic reprogramming that occurs in the germ line and early embryo during normal development [
5]. Adult TGCTs have been studied most thoroughly in the context of DNA methylation, and thus a majority of our knowledge regarding methylation is limited to these tumors. Interestingly, methylation patterns in GCT differ by histologic subtype in both adults and children [
6‐
16]. In general, methylation increases with tumor differentiation: the lowest levels of methylation occur in the embryonal carcinomas and the highest in the teratomas [
6,
7,
10,
11,
13,
15‐
18]. Understanding methylation patterns in GCTs, overall and by histologic type, may identify the developmental stage at which the tumor arose. This knowledge in turn may identify the at-risk period when external exposures are most harmful.
MicroRNAs (miRNAs) are small endogenous noncoding RNAs that regulate gene function in a manner specific to cell type and developmental stage [
19‐
23]. Differential miRNA expression is associated with human cancers [
24‐
28], including GCTs in children and adults [
29‐
35]. These studies have reported higher expression of miRNAs in the miR-371–73 and the miR-302 clusters and lower expression of let-7 in Type I and Type II GCTs compared to normal samples [
29‐
37]. Alterations in the serum levels of the miR371–3 and miR-302/367 MiRNAs also show promise as a diagnostic and follow-up tool for TGCT patients [
38], highlighting the potential translational impact of molecular evaluation.
Knowledge of stem cell biology is directly relevant to mechanisms of GCT tumor initiation, maintenance and metastasis, since reacquisition of pluripotency is a key step in early germ cell development [
39]. Typically, expression of stem cell markers (e.g., OCT3/4, STELLAR, NANOG, LIN28) is induced following demethylation of early stage germ cells [
6,
17] and is turned off following entry to meiosis [
40‐
42]. Expression of pluripotency markers past the appropriate developmental stage is a hypothesized explanation for tumorigenesis in germ cells [
41]. Notably, studies of adult TGCT have shown aberrant expression of stem cell markers in intratubular germ cell neoplasia (IGCNU), the precursor of TGCT, and in undifferentiated histologic subtypes of GCTs (seminomas and embryonal carcinomas) [
43,
44]. Stem cell markers are also expressed in early germ cells in females [
45‐
47] and have been detected in ovarian dysgerminomas [
48]. Marker expression past the appropriate developmental stage is correlated with genetic variation, including mutation in
c-KIT [
48] and its ligand (
KITLG) [
49], and DNA methylation [
41]. Given that pediatric GCTs likely originate from a germ cell at an earlier stage of development than adult TGCTs [
2], stem cell marker expression may be particularly relevant in pediatric tumors.
As described above, previous studies have described variation in methylation, miRNA, and mRNA expression in Type I and Type II GCTs, including studies that have evaluated the interaction between miRNA and mRNA expression [
29,
35]. To further explore relevant molecular interactions, we used an integrated approach to understand differences in promoter methylation, miRNA expression, stem cell gene expression, and genotype data by tumor characteristics in a series of GCTs. We evaluated correlations between data based on the assumption that these processes are linked and co-regulated (for example, epigenetic changes within promoter regions and expression of cognate miRNA species determine the level of mRNA). We also find differences in miRNA expression and stem cell gene expression by tumor histology.
Methods
Study samples
Type I and Type II GCT samples from males and females were obtained from the Cooperative Human Tissue Network (Columbus, OH). Tumors were resected at initial diagnosis and snap frozen at −70 °C. Pathology reports were also provided. Data were available for tumor histology (YST, teratoma, germinoma, or mixed/other), tumor location (gonadal or extragonadal), sex, and age at diagnosis (< 10 years, ≥ 10 years). The age categories were chosen based on tumor histology. The majority of the tumors diagnosed between the ages of 4 and 10 were of similar histology to the Type I tumors while tumors diagnosed after age 10 included histologic subtypes typically included in the Type II category. Normal adjacent tissue was also available for seven of the tumors in our case series.
This analysis used existing data with no personal identifiers; therefore, the study was deemed exempt from review (category #4) by the Institutional Review Board of the University of Minnesota.
DNA and RNA extraction
Genomic DNA was isolated from GCT tissue and paired normal adjacent tissue (when available) using either the TRIzol® extraction method (Invitrogen Life Technologies, California) or a QIAamp DNA Mini Kit (Qiagen Sciences, Maryland) according to the manufacturer’s recommended protocol. DNA yield was quantified using 1 μl DNA on a NanoDrop™ spectrophotometer (Thermo Scientific, Maryland). Extracted DNA was stored at −80 °C until further analysis.
Total RNA was extracted from fresh frozen tissue using the TRIzol® extraction method (Invitrogen Life Technologies, California) according to the manufacturer's protocol. Following extraction, RNA was cleaned using the RNeasy Mini Kit (Qiagen, Maryland) according to the manufacturer's recommended protocol. RNA yield was then quantified using 2 μl on a NanoDrop™ spectrophotometer (Thermo Scientific, Maryland). Extracted RNA was stored at -80O C until further analysis.
Methylation analysis
DNA methylation was measured as previously described [
16]. Briefly, prior to methylation analysis, 1 μg genomic DNA was treated with sodium bisulfite to convert unmethylated cytosines to uracil using the EZ DNA Methylation Kit (Zymo Research, Orange, CA) according to manufacturer’s protocol. DNA methylation at 1505 CpG loci in 807 cancer-related genes was evaluated using the GoldenGate Cancer Methylation Panel I (Illumina, Inc.) in the University of Minnesota Genomics Center following the manufacturer’s protocol as described [
50]. Replicates were included, including four duplicates that were included on both arrays and five duplicates that were included within one array.
Methylation was calculated as the variable β, which is the ratio of the fluorescent signal from the methylated allele to the sum of the fluorescent signals of both methylated and unmethylated alleles [
50]. GenomeStudio software (Illumina, Inc) was used to calculate the average methylation values (β) from the ~30 replicate methylation measurements for each CpG locus. We used raw average β values without normalization. GenomeStudio software was also used to assess data quality for each CpG loci. We omitted all CpG loci where ≥ 25 % of the samples had a detection
p-value > 0.05 (
N = 16, 1 %). X-linked CpG loci (
N = 84) were also removed, resulting in 1,405 loci for analysis.
miRNA expression
Expression of 800 miRNAs was measured using the NanoString nCounter miRNA Expression Assay kit (NanoString Technologies, Seattle, WA). The nCounter detects total counts of miRNA through hybridization with fluorescently labeled bar coded probes to the miRNAs of interest followed by scanning and counting to quantify expression [
51]. Total RNA samples were analyzed following the manufacturer’s instructions for the Human v2 miRNA Expression Assay Kit (NanoString Technologies, Seattle, WA).
Stem cell factor expression
Real-time quantitative PCR gene expression profiling was performed for 84 pathway-specific genes using the human Embryonic Stem Cells RT2 Profiler PCR Array according to the manufacturer’s protocol (SABiosciences, Frederick, MD). Briefly, the RT2 First Strand Kit (SABioscience, Frederick, MD) was used to synthesize cDNA from 1 μg purified RNA. cDNA was obtained with a High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA), and quantitative PCR (qPCR) was performed with a StepOnePlus Real-Time PCR system (Applied Biosystems, Foster City, CA). Expression of all genes was normalized to average expression of five endogenous housekeeping genes (E2M, HPRT1, RPL13A, GAPDH, and ACTB).
Genotyping
Genotype data were generated for four SNPs identified in GWAS of adult TGCT as previously described [
52]. Briefly, PCR amplification and sequencing were performed for SNPs in three genes:
SPRY4 (rs4324715),
BAK1 (rs210138) and
DMRT1 (rs755383). The
KITLG (rs4474514) SNP was detected using a made-to-order TaqMan® SNP Genotyping Assay from Applied Biosystems Inc (catalog# 4351379, assay# C_26154778_10). Primers and conditions for all assays are available upon request.
Statistical analysis
miRNA analysis
We used NanoStriDE for normalization and differential expression analysis of the miRNA data [
53]. Positive control normalization was conducted by creating a normalization factor for each sample using the 6 positive assay controls on the array and negative control normalization was conducted as an upper quantile approach as recommended by the manufacturer (NanoString Technologies, Seattle, WA). Discrete count data were compared across demographic and tumor characteristics using a negative binomial distribution as described by Anders and Huber [
54]. We included a Benjamini-Hochberg correction for multiple comparisons [
55].
Stem cell factor expression analysis
Raw gene expression values were normalized to endogenous housekeeping genes prior to statistical analyses. Unsupervised hierarchical clustering was conducted using the matrix visualization and analysis platform Gene-E with the city block metric and average linkage [
56]. Fold change of gene expression was determined using the 2
(−ΔΔCt) method, and compared YST (
n = 9) to germinomas (
n = 8).
Patient age and sex, tumor location (ovary, testis, and extragonadal), and histology (normal adjacent, teratoma, dysgerminoma, YST and mixed) data were combined with molecular data across a common set of 40 samples into a single two-dimensional matrix. Categorical phenotype and genotype values were arranged in a logical manner (e.g., low severity to high severity for tumor histology) and assigned a numerical code that could be used for correlation analysis. These data were combined with the β values from the methylation analysis, the miRNA counts, and the normalized ΔCt values from the SABiosciences Embryonic Stem Cell array. Additionally, genotypes for the four molecular markers listed above were encoded based on the Kimura matrix [
57] idea that transitions are more likely than transversions and hence grouped together where possible.: −4.0 = t/t; −3.0 = c/t; −2.0 = c/c; −0.5 = g/t; −0.2 = c/g; 0.2 = a/t; 0.5 = a/c; 2.0 = a/a; 3.0 = a/g; 4.0 = g/g.
Two types of correlations were explored using the vector of values for each matrix row (i.e., molecular probe or phenotypic label) against the corresponding vector of values for each of the other rows: standard linear correlations using the Pearson correlation coefficient and more complex nonlinear correlations using the Maximal Information Coefficient (MIC) [
58]. MIC analysis was performed using the R implementation of the MINE software package (
http://www.exploredata.net/). This resulted in two values for correlation (linear, nonlinear) for every pairwise combination of molecular probes and/or phenotypic variables. This very large result matrix was filtered to retain only those pairs that represented a comparison across platforms (or between a phenotypic variable and a platform) and whose Pearson correlation
or MIC values exceeded a threshold of 0.75. This threshold was chosen as it provided a compromise between very dense graph connectivity at lower thresholds and sparse connectivity at high cutoffs.
The resulting pairwise correlations were visualized as a network using Gephi software (
https://gephi.org/). In order to visualize the correlation networks, the molecular probes and phenotypic variables served as nodes in the network graph, and an edge was drawn between any nodes that had either a linear or nonlinear correlation that exceeded 0.75. The edge was labeled with the larger of either the Pearson
R-value or the MIC. These data were loaded into Gephi, and the Force Atlas 2 layout was executed, running until the nodes were far separated in apparent equilibrium, and then the Fruchterman Reingold layout was selected. All pairwise correlations were grouped by single linkage clustering to create networks. Network hubs with four or more neighbors were identified.
The hub nodes and their nearest neighbors were concatenated into a list, and the standard gene symbols extracted. These gene symbols were uploaded to Ingenuity Pathway Analysis software (IPA,
http://www.ingenuity.com/products/ipa), and investigated for upstream activators. To guard against the possibility that functional categories were identified by chance due to the
a priori bias in the initial set of profiling molecules, we randomly selected three additional sets of the same size from the same initial profiling molecules, and submitted these random gene lists to IPA.
The entire analysis process was repeated using subsets of the data. In the entire data set of 40 samples with complete genomic data, it was noted via manual analysis of scatterplots of primary hub genes that differences in the 8 YSTs relative to the other samples dominated the signal. Following the assumption that the YSTs were overshadowing any signal from other histology types, we tested a subset consisting of the 32 non-YST samples.
Discussion
In this analysis, we used a novel method to integrate molecular data across platforms to gain biological insight into the function of GCTs. These analyses highlighted several network hubs, including miRNA clusters and stem cell genes that distinguish YST from normal germ cell samples and other GCTs. Importantly, our approach confirmed several previously reported alterations in embryonic stem cell specific miRNAs. Finally, our analysis of stem cell factor expression highlights the altered expression of multiple stem cell genes in GCTs, with distinct patterns in YST and dysgerminoma. Collectively, these findings suggest that ectopic, aberrant expression of stem cell genes may underlie the unusual and defining capacity of self-renewal in the face of wide differentiation into cells with characteristics of tissues derived from any of the three germ layers observed in GCT, and it is possible that higher levels of stem cell gene expression correlate with tumor progression and prognosis of GCT and other tumors.
Our findings are consistent with previous studies of Type I and Type II GCTs that have identified an overexpression of the miR-371–73 and miR-302 clusters in GCTs compared with normal samples [
29‐
35]. The miRNA-302 and miRNA-371–373 clusters are highly plausible candidate miRNAs in GCTs given their roles as regulators of embryonic stem cell pluripotency markers [
59,
60] which has been discussed extensively in previous miRNA expression studies of GCTs [
33,
35]. Pinpointing the relevant targets of miRNA can be daunting given the large number of target proteins for each miRNA. According to the online database for miRNA target prediction and functional annotations, miRDB [
61,
62] the number of predicted targets for miRNAs in the miR-371–373 and miR-302 families range from 469 to 529; interestingly, LATS2 is one of the highest ranking targets on the list for all miR-302 family members as well as miR-372 and miR-373 (Target Score > 98) [
62]. Functional studies have demonstrated that miRNA-372 and miRNA-373 act as oncogenes in TGCT through interactions with the p53 pathway, in particular through regulation of LATS2 [
32]. Bioinformatic algorithms indicated that miRNA expression in these clusters is associated with downregulation of mRNA expression in other pathways with biological relevance to GCT [
33,
35]. Data also suggest that the miRNA 302 family can be used to reprogram cancer cells into pluripotent cells with an ES cell-like phenotype [
63]. Collectively, these data suggest that these miRNA clusters are in large part responsible for regulating the stem cell phenotype of GCTs.
Four of the six hubs identified in the cross-platform analysis were stem cell related, and highlight the importance of the transforming growth factor (TGF)-β superfamily members in GCT development. Specifically, the miR-302 cluster regulates the Nodal inhibitors LEFTY1 and LEFTY2 [
64]. This interaction plays an important role in germ layer specification by promoting the formation of the mesendodermal lineage while suppressing neuroectorderm formation [
65]. Knockdown of either protein in mice results in altered differentiation, with
Lefty1 knockdown leading to increased differentiation potential and
Lefty2 knockdown leading to increased immature neuroepithelium [
66]. This pathway also plays other important roles in germ cell development, including regulating meiosis in the male germ cells [
67,
68]. Finally, the nodal signaling pathway regulates the bone morphogenic protein pluripotency pathway [
69], which was previously shown to play an important role in the development of GCT [
70]. The importance of miR-122 and miR-126 in GCT is less clear and will require further study. Interestingly, miR-122 was also overexpressed in YST compared with germinoma and normal gonad tissue in a previous study of miRNA in GCT [
33]. Notably, none of the methylation differences that were so striking in our comparison of YST to other GCTs [
16] were identified as being hubs in the cross-platform analysis, suggesting that these changes may be consequence of the altered expression of stem cell genes and miRNAs rather than drivers of the oncogenic process. Given that the associations between the four SNPs evaluated did not differ by tumor histology [
52,
71,
72], it was not as surprising that none of these were identified as hubs in the comparison of tumors by histology.
We chose to use platforms that were highly enriched for cancer and stem cell genes due to the higher information yield when compared to an unbiased search; however, this limited our ability to conduct an unbiased search for over-represented gene categories in the list of features with high cross-platform correlations. We were able to evaluate upstream regulators of these highly correlated features, and we identified three highly enriched activators (TP73, Tretinoin, and Decitabine) in the set of hub-connected genes in the entire dataset. Given the known importance of retinoic acid in germ cell development [
73], it is not surprising that Tretinoin, a topical retinoid, would be identified as a potential regulator in GCTs. The DNA hypomethylating agent Decitabine (5-aza-2’-deoxycytidine or 5-aza) is also intriguing as a potential therapy given relevant data in the literature. Decitabine is effective for treatment of hematologic malignancies [
74,
75] and has also been evaluated in studies of solid tumors with varying success rates [
76]. Preclinical data suggesting that hypomethylating agents may re-sensitize cells to platinum based chemotherapy [
77‐
79] are of particular relevance for GCT; however, early phase clinical trials in epithelial ovarian cancer have provided mixed results of the combination of decitabine and carboplatinum [
80‐
82]. In a previous study of embryonal carcinoma, high expression of the pluripotency-associated DNA methyltransferase 3B (DNMT3B) was associated with sensitivity to 5-Aza [
77,
83]. The elevated expression of DNMT3B in YST observed here and in a previous DNA methylation study [
15] suggests that this may also be a relevant alternative therapy for YST, which often have poorer outcomes than other histologic subtypes [
84].
Most cancers are heterogeneous and multifactorial [
85]. This complexity stems from molecular events on the level of gene integrity, epigenetic modification and transcription, stability of mRNA transcripts (e.g., by miRNAs), translation, protein activation (e.g., by phosphorylation), and cellular interactions, including within the tumor microenvironment. Because of this complexity, it is important to evaluate the joint effects of these alterations. There is not one clear method for evaluating these complex interactions on large datasets. In this analysis, we chose to utilize the MIC as a tool to evaluate the correlations in the dataset in addition to simple linear Pearson correlation. MIC is able to detect linear relationships and correlations among variables that are not strictly linear (such as quadratic or oscillatory associations). Care should be taken when using the MIC with small datasets (e.g.,
N < 30), as it may identify spurious correlations in this extreme [
86]. Also, despite a recent debate [
87] as to which information-theory based measure provides the highest statistical power (maximal information coefficient or a related measure, mutual information), both measures clearly identify nonlinear trends that are missed by Pearson correlation. Additionally, our analysis included a small number of samples which could have limited our power to detect relevant associations. Finally, our analysis focused on a selected list of genes with a priori significance in cancer and stem cells. It is possible that a genome-wide, agnostic approach would identify additional relevant characteristics of GCTs in additional pathways.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
JNP conceived of the study and participated in its design and coordination, analyzed data and wrote the manuscript. JRBMB analyzed miRNA data and contributed to manuscript writing. KATS conducted cross platform data analysis. AJH performed methylation, miRNA, and genotyping analysis. CL performed stem cell factor analysis. JAR conceived of the study and participated in its design and coordination. JT conceived of the study and participated in its design and coordination. All authors read and approved the final manuscript.