Background
Head and neck squamous cell carcinoma (HNSCC) is a relatively common malignancy, associated with severe disease- and treatment-related morbidity. One of the strongest predictors of poor clinical outcome is the presence of regional lymph node metastasis, and nodal status of the neck plays a decisive role in the choice of treatment [
1,
2]. The complex process of metastasis in HNSCC is still incompletely understood at a molecular level; however, multiple marker studies have been performed in order to identify markers that predict the presence of metastasis. Recently, high-throughput gene expression studies have been able to identify a metastatic gene expression signature in primary HNSCC tumors, and Roepman et al. were able to predict the presence of lymph node metastasis based on gene expression of the primary tumor [
3]. Analyses in this study were performed in a 'data-driven' way, by means of computational statistics without prior implementation of existing knowledge about functionally related genes and pathways. This technique is very useful in the search for new biomarkers, new subgroups, and differences in their gene expression profiles. Although the authors were able to identify a number of genes that are known to be involved in metastatic disease within this gene set, the interpretation of statistical differences in a meaningful molecular biological context is not self-evident. It has been demonstrated that classification gene sets are profoundly influenced by the microarray methodology, such as the microarray technique, microarray platform, and preprocessing methods [
4‐
8]. Furthermore, it has been shown that classifying gene sets are highly dependent on the chosen analysis strategy [
9,
10]. This is illustrated by the fact that the authors were able to generate several different classifying gene sets that were all able to predict nodal metastasis with reasonable accuracy [
10]. The dependence of classifying gene sets on statistical methods, as well as on technical factors such as the choice of microarray platform, hampers comparability of results from different microarray studies and raises questions about the biological relevance of the classifying genes. It is therefore necessary that differences in gene expression are validated in an independent dataset. Here, we present an independent gene expression validation study of metastasized versus non-metastasized HNSCC. Differences in gene expression between metastasized and non-metastasized HNSCC were determined in the publicly available dataset generated by Roepman et al., and subsequently validated in an independent gene expression dataset of 11 metastasized and 11 non-metastasized HNSCC tumors from three anatomical sites (the oral cavity, the oropharynx and the larynx). In addition to the validation of individual differentially expressed genes, we performed a supervised, pathway-based analysis. Gene expression was evaluated within predefined subgroups of genes with a known biological context, i.e. genes within a metastasis-related pathway. First, pathways and functional gene clusters that are involved in the process of metastasis in carcinoma were defined using pathways described in the literature and the publicly available Kyoto Encyclopedia of Genes and Genomes (KEGG) and Biocarta pathway databases, with a focus on pathways involved in survival, proliferation, apoptosis, cell adhesion, extracellular matrix signaling and remodeling, hypoxia and angiogenesis [
11‐
13]. Using this supervised analysis strategy, we found considerable concordance between the datasets for pathways involved in survival, proliferation, apoptosis, cell adhesion, extracellular matrix signaling and remodeling, hypoxia and angiogenesis. The gene sets validated by the independent dataset were matrix metalloproteinases (MMPs) and pathways involved in MMP regulation, the uPA system and pathways involved in uPA regulation, and HIF1α-regulated invasion and angiogenesis. This approach to microarray analysis generates an outcome with readily interpretable biological meaning. Furthermore, by concentrating on groups of genes with a known biological relation rather than individual genes, comparability of microarray studies performed on different microarray platforms is improved [
14].
Discussion
It has been demonstrated that the outcome of microarray studies is profoundly influenced by the chosen analysis strategy and highly dependent on technical aspects such as sample preparation methods and choice of microarray platform [
4‐
10]. This raises questions about the biological validity of the outcome of individual studies, and the validation of microarray studies is therefore essential. Here we present the results of an independent validation analysis of differences in gene expression between metastasized and non-metastasized HNSCC. The reference study and validation study were performed in different centers by different investigators, using different microarray platforms with different probe content. In this study, we concentrated on the validation and biological interpretation of the differences in gene expression between N0 and N+ HNSCC subgroups, and did not attempt to validate classifying gene sets that predict N-status in the reference study or other HNSCC microarray studies, because the data-driven way in which these classifying gene sets are created makes them too dependent on the microarray platform, the microarray technique, the preprocessing methods and the laboratory used to create them [
4‐
8]. Furthermore, gaining insight into the process of metastasis on the basis of these classifying gene sets is troublesome: although some probes within the classifier encode a gene with a known role in tumorigenesis or metastasis, many others have unrelated or unknown functions [
3]. The fact that multiple classifying gene sets can be constructed on the basis of the reference study data casts further doubt on their biological validity [
10]. In this study, a gene-based analysis revealed 7 genes that appeared to be significantly differentially expressed in the validation dataset (raw p value < 0.05) (Table
2). All of these 7 genes are known to be involved in processes of tumorigenesis or metastasis. LLGL2 belongs to a group of genes that act as tumor suppressor genes. Loss of function is associated with disruption of cell polarity and tissue architecture, uncontrolled proliferation and growth of neoplastic lesions [
25]. FAP is a cell-surface protease expressed in reactive stromal fibroblasts of epithelial cancers, and is associated with invasion and metastasis in gastric, colorectal and cervical carcinoma [
26‐
28]. PLAU encodes a serine protease involved in degradation of the extracellular matrix. It plays a well-known role in invasion and metastasis of carcinoma, and is a prognostic factor for metastasis and outcome [
29,
30]. LAMB1 encodes the β1 subunit of members of the laminin family, extracellular matrix glycoproteins that are the major non-collagenous constituent of basement membranes. Laminins containing the β1 subunit (i.e. laminin 8 and 10) have been implicated in the metastasis-related processes of angiogenesis, invasion, and migration [
31,
32]. The protein encoded by MSC is a transcriptional repressor that attenuates E2A-mediated gene activation. MSC overexpression is associated with loss of differentiation in multiple tissues and is associated with B-cell lymphoma, but no association with epithelial cancer has been described to date [
33,
34]. COL5A1 and COL5A3 encode alpha chains of collagen type 5. Upregulation is associated with metastatic potential in carcinoma [
3,
35]. However, when corrected for multiple testing, none of these genes could be statistically validated.
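The effect of a multiple testing correction on a handful of raw p values can be sketched as follows. This is a minimal illustration only: the text does not specify which correction was applied, so the Benjamini-Hochberg procedure is used here as an assumption, and the p values and the number of tested genes are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical input: 7 genes with raw p < 0.05 among 80 tested genes.
raw = [0.004, 0.011, 0.019, 0.023, 0.031, 0.038, 0.046]
raw += [0.2 + 0.01 * i for i in range(73)]

adj = benjamini_hochberg(raw)
n_raw = sum(p < 0.05 for p in raw)   # genes passing the raw threshold
n_adj = sum(q < 0.05 for q in adj)   # genes passing after correction
print(n_raw, n_adj)
```

With these made-up numbers, genes that pass the raw threshold no longer pass once the number of tests is taken into account, which is the pattern described above.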
Roepman et al. have reported an adverse effect of long-term storage of tissue samples on the predictive accuracy of their gene expression signature. No explanation was found for this phenomenon, but it did not seem to be attributable to differences in total RNA and cRNA yield or quality [
3]. We have evaluated the effect of storage time on the gene-based validation analysis in this study. Our LIMMA analysis of the most recent tumor samples within the reference dataset identified more differentially expressed genes, an observation that seems to correlate well with the findings of Roepman et al. [
3]. However, we do not find an effect of sample storage time on the outcome of our validation analysis as none of these genes are statistically validated by the most recent samples in the validation dataset.
The gene-based validation between the reference and validation studies was hampered by the use of different microarray platforms with different probes and probe content. In order to overcome this problem, a pathway-based supervised analysis was performed, evaluating differences in gene expression between metastasized and non-metastasized HNSCC for predefined tumorigenesis- and metastasis-related pathways and gene sets. By analyzing groups of functionally related genes, we were able to study the same biological processes in both reference and validation datasets, even though not all genes involved in these processes were present, and the number and nature of the represented genes varied in the respective datasets. In this way, 7 metastasis-related pathways and functionally related gene sets that differentiate between metastasized and non-metastasized HNSCC were statistically validated (Table
4 and Figure
1). These validated pathways are metalloproteinases and regulatory pathways of metalloproteinases, HIF1α-induced invasion- and angiogenesis-related target genes, and the urokinase plasminogen activator system. These are key pathways involved in invasion, extracellular matrix remodelling, detachment and angiogenesis, all essential steps in the progression to metastatic disease (Tables
4 and
5). Metalloproteinases play a complex role in tumor progression and metastasis. Not only do they facilitate invasion by degrading components of the extracellular matrix, there is also evidence that they are involved in angiogenesis. MMPs that induce metastasis are not only produced by the tumor cells but also by stromal cells and leucocytes, especially along the invasive front of the tumor [
36]. The second messenger signalling pathways that lead to expression of MMPs are not fully understood, but there is evidence that MAPK pathways are involved [
37]. Three different regulatory MAPK pathways of MMPs have been identified, and in this study two of them show significant differential expression between metastasized and non-metastasized HNSCC: the MAPK1/3 (ERK1/2) and JNK/MAPK pathways. The MAPK1/3 pathway, which is activated by a variety of mitogenic and growth factors, induces FOS and JUN phosphorylation and expression. The JNK/MAPK pathway, which is induced by various inflammatory cytokines, increases transcriptional activity and protein stability of JUN. FOS and JUN are leucine zipper proteins that can dimerize to form AP-1 transcription factor complexes. JUN, FOS and AP-1 complexes seem to regulate expression of multiple MMPs [
37]. The urokinase plasminogen activator (uPA) system mediates invasion and metastasis by catalysing extracellular matrix dissolution, and there is evidence that the uPA system plays a role in cell proliferation, migration and modulation of cell adhesion as well. The potential of components of this system as prognosticators in cancer has been evaluated most extensively in breast cancer, but also in HNSCC [
38]. PLAU in particular seems to correlate well with unfavourable outcome, and in this study PLAU correlated well with metastatic HNSCC in both the reference and validation datasets. Hypoxia is a common feature in solid tumors and their metastases, and can lead to tumor progression in a variety of ways. It induces HIF1α, a transcription factor that regulates angiogenesis as well as cell survival, invasion and metastasis by activating transcription of a host of target genes. The gene sets comprising HIF1α target genes that are known to be involved in angiogenesis and invasion are significantly upregulated in metastasized HNSCC in this study [
39‐
41]. As oligonucleotide microarrays measure mRNA levels, results reflect the gene-expression levels in N0 and N+ HNSCC. Post-transcriptional events such as splicing, translation or activation of the proteins encoded by these genes are not measured. Upregulation of genes in a specific pathway as determined by oligonucleotide microarrays therefore may not necessarily mean heightened activity of the pathway. However, it is very plausible that the observed differences in gene-expression levels of genes involved in metastasis-related pathways are responsible for the differences in metastatic potential of N0 and N+ HNSCC.
The aim of this study is not only to identify and validate a gene-expression profile that characterizes metastatic disease in head and neck squamous cell carcinoma, but also to provide an analysis strategy that incorporates the available insights into the pathways that lead to metastasis. This supervised pathway-based analysis will not reveal new, previously unknown metastasis-related biomarkers. It does, however, increase our understanding of the biological context of the results. By focusing on pathways and functional gene sets rather than individual genes, it enhances insight into the biological steps that lead from carcinogenesis to metastatic disease in HNSCC. Furthermore, by leaving probes that are not relevant to the biological processes of interest out of the analysis, statistical noise and multiple testing problems associated with microarray analysis are reduced. The most important advantage of this strategy, however, is the increased comparability of data from different microarray studies. Microarray analyses based on individual genes are highly dependent on the exact gene content of the microarray used in the study, and thus on the chosen microarray platform. In a pathway-based analysis, however, gene expression does not have to be measured for every single gene involved in a specific pathway, as long as a representative subset of genes is assessed. These representative subsets of genes involved in a specific pathway may vary between studies. A pathway-based analysis can thus reveal biologically relevant similarity between results of different microarray studies even though the gene contents of the microarray platforms used do not match exactly.
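The point that a pathway can be compared across platforms with different gene content can be illustrated with a small sketch. It is not the statistical method used in this study: the scoring rule (mean log2 fold change of whichever pathway members a platform measures), the function name `gene_set_score`, the gene set memberships and all fold-change values are assumptions chosen for illustration.

```python
from statistics import mean


def gene_set_score(log_fold_changes, gene_set):
    """Mean log2 fold change (N+ vs N0) over the set members a platform measures.

    Returns (score, n_measured); score is None if no member was measured.
    """
    measured = [log_fold_changes[g] for g in gene_set if g in log_fold_changes]
    if not measured:
        return None, 0
    return mean(measured), len(measured)


# Illustrative gene sets (membership simplified for the example).
gene_sets = {
    "uPA system": {"PLAU", "PLAUR", "SERPINE1"},
    "MMPs": {"MMP1", "MMP3", "MMP9", "MMP13"},
}

# Hypothetical per-gene log2 fold changes on two platforms whose
# probe content only partly overlaps.
platform_a = {"PLAU": 0.9, "PLAUR": 0.4, "MMP1": 1.2, "MMP9": 0.6}
platform_b = {"PLAU": 0.7, "SERPINE1": 0.5, "MMP3": 0.8, "MMP13": 0.3, "MMP9": 0.4}

results = {}
for name, members in gene_sets.items():
    score_a, n_a = gene_set_score(platform_a, members)
    score_b, n_b = gene_set_score(platform_b, members)
    # A set is called concordant when both platforms agree on direction.
    concordant = (
        score_a is not None
        and score_b is not None
        and (score_a > 0) == (score_b > 0)
    )
    results[name] = concordant
    print(name, n_a, n_b, concordant)
```

In this toy setup each platform scores a different subset of genes for the same pathway, yet the direction of change can still be compared at the pathway level, whereas a gene-by-gene comparison would be limited to the genes present on both platforms.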
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
EFH participated in the design of the study, in the validation microarray experiments and the statistical analysis, and drafted the manuscript. MJDH carried out the validation microarray experiments and helped with the statistical analysis. JJG participated in the design of the study, carried out part of the statistical analysis and helped to draft the manuscript. JO carried out part of the statistical analysis and helped to draft the manuscript. VTHBMS helped with designing the study and drafting the manuscript. CJC participated in the design of the study, its coordination and drafting of the manuscript. RJBdJ conceived of the study, participated in its coordination and helped to draft the manuscript. All authors read and approved the final manuscript.