Background
It is well established that most cancers are triggered by somatic or, less commonly, germline mutations in caretaker and gatekeeper genes [1–6]. The caretakers are broadly defined DNA repair genes that are responsible for maintenance of genome stability. Mutations in the caretaker genes, which are considered to be typical tumor suppressors, compromise genome stability and, more specifically, increase the probability of mutation in the gatekeepers, which include both tumor suppressor genes and oncogenes [3, 7]. Tumor suppressors are genes that control cell proliferation, in particular, by causing cell death in response to DNA damage; accordingly, mutational inactivation of tumor suppressors may cause transformation. In contrast, oncogenes are genes that, when mutated, acquire new functions promoting cell proliferation and, eventually, transformation [4].
Since the pioneering work of Theodore Boveri at the beginning of the 20th century [8], tumorigenesis has often been viewed as a somatic version of Darwinian evolution [9–12]. This perspective implies positive selection of mutations that are beneficial from the standpoint of an individual cell, i.e., mutations that promote cell proliferation such as those activating the tumorigenic potential of oncogenes and those inactivating tumor suppressors. In the context of the modern evolutionary synthesis, it is equally obvious that tumor evolution should involve substantial purifying selection against mutations impairing proliferation. Although the Darwinian view of tumorigenesis seems to be gaining an increasingly firm foothold, the interplay of selective forces acting on mutations in specific genes is not understood in detail.
Altogether, mutations in more than 200 human genes have been implicated in cancer [13]. Currently, inactivation of tumor suppressors is considered to be the main driving force of tumorigenesis. The most prominent and best studied tumor suppressor is p53, a multifunctional transactivator of transcription and regulator of cell proliferation, programmed cell death, and repair [14–16]. The p53 gene is mutated in nearly 60% of human tumors. Many independent studies have shown that, in addition to its tumor suppressor properties, p53 may also behave as an oncogene [17]. Specifically, gain of new biochemical (e.g., transactivation of transcription of genes that are not affected by wild-type p53) and biological (e.g., stimulation of cell proliferation) functions resulting from p53 mutations has been demonstrated [18–22]. Compelling evidence of p53 gain-of-function during tumorigenesis has been provided by recent reports on mouse models of Li-Fraumeni syndrome (LFS), a familial cancer predisposition syndrome caused by germline p53 mutations. These studies revealed substantial changes in the tumor spectra of mice carrying common p53 mutations, indicating that gain-of-function by p53 is important for tumorigenesis [23, 24].
The conclusion that gain-of-function in p53 mutants is important for tumorigenesis is strongly supported by the results of bioinformatic analysis of the mutation spectra of the p53 gene [25, 26]. These studies yielded three lines of evidence compatible with biologically relevant gain-of-function in p53 mutants in tumors: i) somatic mutations of p53 detected in various cancers showed a highly significant excess of non-synonymous over synonymous substitutions, which is the signature of positive selection [27], ii) amino acid replacements caused by cancer-associated mutations clustered within evolutionarily conserved, functionally important regions of p53, and iii) mutational hotspots, the sites of frequent mutation which are subject to particularly strong positive selection, differed depending on the type of tumor, which suggests acquisition of distinct new functions by p53 in different tumors.
These observations prompted us to ask whether positive selection could also be detected in somatic mutants of other cancer-related genes in tumors. Genes evolving under positive selection during cancer progression could be viewed as candidate new oncogenes. To delineate the repertoire of such genes, we performed a genome-wide search for positive selection during cancer evolution by comparing the sequences of Expressed Sequence Tags (ESTs [28]) from tumors to the corresponding genomic sequences. The rationale of this analysis is to detect somatic mutations in ESTs and identify genes that show a significant excess of non-synonymous over synonymous substitutions in tumors. In principle, EST libraries provide ample material for analyzing somatic mutations in tumors and normal tissues. The problem with this approach is that differences between EST sequences and the sequences of the respective reference genes from the human genome may arise for a variety of reasons other than somatic mutation, including sequencing errors, incorrect assignment of an EST to a reference gene, and single-nucleotide polymorphisms (SNPs).
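The distinction on which this rationale rests can be illustrated with a minimal Python sketch that classifies each codon-level difference between a reference coding sequence and an aligned EST as synonymous or non-synonymous under the standard genetic code. The function names are hypothetical, and the sketch assumes gap-free, in-frame sequences containing only A, C, G and T; it is an illustration of the concept, not the procedure described in Methods.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, est_codon):
    """Return 'identical', 'synonymous' or 'non-synonymous' for a codon pair.

    Assumption: both codons contain only A, C, G, T (no N or gap characters).
    """
    ref_codon, est_codon = ref_codon.upper(), est_codon.upper()
    if ref_codon == est_codon:
        return "identical"
    if CODON_TABLE[ref_codon] == CODON_TABLE[est_codon]:
        return "synonymous"
    return "non-synonymous"


def count_substitutions(ref_cds, est_cds):
    """Count codon-level differences between two in-frame, aligned sequences."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    usable = min(len(ref_cds), len(est_cds))
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, usable - 2, 3):
        kind = classify_substitution(ref_cds[i:i + 3], est_cds[i:i + 3])
        if kind != "identical":
            counts[kind] += 1
    return counts


# Toy example: CTT->CTC keeps leucine, CGT->CAT replaces arginine by histidine.
print(count_substitutions("ATGCTTCGT", "ATGCTCCAT"))
```

Counting per codon rather than per nucleotide is a simplification; a codon that differs at two positions is counted once here.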
Several recent, large-scale studies employed EST collections for detecting cancer-associated SNPs and cancer-specific alternative splice forms. In particular, Xu and Lee identified 316 human splice variant forms with a statistically significant cancer association; the structures of the most abundant of these were supported by sequences of the corresponding mRNAs isolated from tumors [29]. Another, larger-scale study by Gupta et al. reported 1120 tumor-specific splice isoforms with a high rate of validation by mRNA sequencing. However, when mRNA analysis was performed, the tissue specificity of many of these transcripts, particularly those of low abundance, could not be confirmed [30]. A study by Brentani et al. took a different approach by using ESTs to identify SNPs in a predefined set of cancer-related genes; this resulted in the identification of 237 previously known and 505 new SNPs in these genes [31]. A comprehensive analysis by Qiu and coworkers involved cross-mapping of the EST database (dbEST) and the SNP database (dbSNP), yielding a statistically significant association with tumors for 4865 SNPs [32].
These studies emphasize the potential of EST analysis for detecting genomic and expression features associated with cancer. However, they are not particularly informative in terms of uncovering potential causative roles of individual genes in tumorigenesis. We were interested in mining dbEST for somatic mutations that could be positively selected in cancers, which would make the respective genes candidate oncogenes. The inherent problem of such analysis is distinguishing somatic mutations from sequencing errors and SNPs. However, the latter two sources of sequence variation are not expected to produce a signature of positive selection. Indeed, whatever biases are prevalent among sequencing errors, they would not affect the ratio of non-synonymous to synonymous substitutions. The issue with SNPs is more complex. However, most if not all human SNPs appear to be either selectively neutral or slightly deleterious and do not show signs of frequent positive selection [33, 34]. Accordingly, the signature of positive selection, namely, an elevated non-synonymous/synonymous substitution ratio [27, 35], is expected to be detectable among somatic mutations even in the presence of some contamination by sequencing errors and SNPs.
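One simple way to ask whether a gene shows an excess of non-synonymous substitutions is a one-sided binomial test against the fraction of non-synonymous changes expected in the absence of selection. The Python sketch below illustrates this idea only; it is not necessarily the test used in this study (see Methods). The function names are hypothetical, and the default expected fraction of 0.75 is a placeholder assumption: in practice it would be derived from the codon composition of each gene.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def excess_nonsynonymous_pvalue(n_nonsyn, n_syn, expected_nonsyn_fraction=0.75):
    """One-sided p-value for an excess of non-synonymous substitutions.

    expected_nonsyn_fraction is an ASSUMED neutral expectation, used here
    purely for illustration.
    """
    total = n_nonsyn + n_syn
    if total == 0:
        return 1.0  # no substitutions observed, so no evidence either way
    return binomial_upper_tail(n_nonsyn, total, expected_nonsyn_fraction)


# Hypothetical counts: 4 non-synonymous and 0 synonymous substitutions.
print(round(excess_nonsynonymous_pvalue(4, 0), 3))
```

Under this assumed expectation, four non-synonymous substitutions with no synonymous ones give p = 0.75^4 ≈ 0.32, which shows why small counts cannot support a claim of positive selection no matter how skewed they appear.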
With this premise, we partitioned the EST sequence libraries available through the dbEST database (NCBI, NIH, Bethesda) into those originating from tumors (hereinafter cancer ESTs) and those from normal tissues (normal ESTs), and identified genes with a significant excess of non-synonymous substitutions in each of the two sets. The results suggest that positive selection is more pronounced in somatic evolution of tumors than it is in normal tissues. Many genes with a signature of positive selection in tumors have established or strongly predicted links to cancer.
Discussion
The interpretation of the findings on CASPS genes described here requires extreme caution. Although filters were applied to separate somatic mutations from sequencing errors and SNPs (see Methods for details), it is impossible to guarantee that the final list is free of these irrelevant sources of variation. Furthermore, taking into consideration the number of analyzed ESTs, identification of 112 genes with apparent signs of positive selection is, in itself, not particularly surprising. The strongest indication we obtained that some of the CASPS genes are likely to be associated with tumorigenesis is the significant excess of genes with the positive selection signature among cancer ESTs compared to the ESTs from normal tissues (112 against 37). Based on this ratio and assuming that the apparent signature of positive selection in normal ESTs represents the background noise, it should be expected that ~70% of the CASPS genes are, indeed, subject to positive selection during the somatic evolution of tumors. Additionally, the evidence seems convincing for those genes that, individually, showed a significant difference in the non-synonymous to synonymous substitution ratio between cancer and normal ESTs (Table 2; see Additional file 1). From a different perspective, however, it is not certain that somatic mutations in normal tissues are not selected for. Furthermore, it cannot be ruled out that some of the genes that seem to evolve under positive selection in normal tissues are associated with the development of precancerous conditions.
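The arithmetic behind the ~70% estimate can be reproduced with a short Python sketch. It rests on the assumption stated above, namely that the number of genes with an apparent signature in normal ESTs (37) estimates the number of false positives among the 112 genes detected in cancer ESTs; it further assumes that the two EST sets are comparable in size and noise level. The function name is hypothetical.

```python
def estimated_selected_fraction(n_cancer_genes, n_normal_genes):
    """Fraction of cancer-set genes expected to be genuine, assuming the
    normal-set count represents background noise of the same magnitude."""
    if n_cancer_genes <= 0:
        raise ValueError("n_cancer_genes must be positive")
    return (n_cancer_genes - n_normal_genes) / n_cancer_genes


fraction = estimated_selected_fraction(112, 37)
print(f"{fraction:.2f}")  # (112 - 37) / 112 = 75 / 112, i.e. roughly 70%
```

The estimate is only as good as the noise assumption: if some normal-tissue genes are themselves under positive selection, the true fraction would be higher.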
Assuming that there is, indeed, a signal of tumor-specific positive selection in our list of CASPS genes, these are likely to be the tip of the proverbial iceberg of genes that evolve under this regimen in various cancers. Although the current EST database is large and represents most human genes, it is far from satisfactory for the purpose of analyzing somatic evolution. In the present study, we had no choice but to lump together ESTs from all types of cancers because the amount of variation in individual tumor types was insufficient for statistical analysis. Furthermore, as already indicated, this analysis is capable of detecting selection only for relatively highly expressed genes. Many genes on our CASPS list, and even more genes that did not make it, contained only a few non-synonymous substitutions and no synonymous substitutions. Obviously, the statistical power of the present analysis was insufficient to identify positive selection in these genes.
It is expected that, once the EST or complete cDNA data become sufficient for separate analysis of tumors of different origins or, ideally, different cell types and tumor progression stages from individual patients, approaches similar to those employed in this work will provide a wealth of information on somatic evolution of the cancer genome. Establishing ancestor-descendant relationships within individuals will allow one to arrive at definitive conclusions regarding the selection forces in action during tumorigenesis.
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
The study was conceived and designed by EVK and IBR; VNB, MKB, FAK, and IBR performed the computational analysis of the EST mutation data; EVK analyzed the biological aspects of the candidate positively selected genes; VNB wrote the initial draft of the Methods and Results; EVK wrote the final manuscript which was read and approved by all authors.