Electronic supplementary material
The online version of this article (https://doi.org/10.1186/s12885-018-4388-4) contains supplementary material, which is available to authorized users.
Breast cancer is a heterogeneous disease, and personalized medicine offers hope for improving clinical outcomes. Multi-gene signatures for breast cancer stratification have been extensively studied in the past decades, and more than 30 different signatures have been reported. A major concern is the minimal overlap of genes among the reported signatures. We investigated the breast cancer signature genes to test our hypothesis that the genes of different signatures may share common functions, and to use these previously reported signature genes to build better prognostic models.
A total of 33 signatures and the corresponding gene lists were investigated. We first examined the gene frequency and the gene overlap in these signatures. The gene functions of each signature gene list were then analysed and compared using KEGG pathways and gene ontology (GO) terms. A classifier built using the common genes was tested using the METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) data. The common genes were also evaluated for building a Yin Yang gene mean expression ratio (YMR) signature using public datasets (GSE1456 and GSE2034).
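The gene frequency and overlap step described above can be illustrated with a minimal sketch. It assumes each signature is available as a list of gene symbols keyed by a signature name. The function names, the input layout, and the placeholder gene labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def gene_frequency(signatures):
    """Count in how many signatures each gene appears.

    `signatures` maps a signature name to a list of gene symbols
    (an assumed input layout, not the study's actual data format).
    """
    counts = Counter()
    for genes in signatures.values():
        # set() ensures a gene listed twice in one signature counts once
        counts.update(set(genes))
    return counts


def common_genes(signatures, min_signatures=2):
    """Return the genes present in at least `min_signatures` signatures."""
    counts = gene_frequency(signatures)
    return {gene for gene, n in counts.items() if n >= min_signatures}


# Toy input with placeholder names; not the content of any real signature.
toy_signatures = {
    "sig_1": ["GENE_A", "GENE_B", "GENE_C"],
    "sig_2": ["GENE_A", "GENE_B", "GENE_D"],
    "sig_3": ["GENE_A", "GENE_E", "GENE_E"],
}

in_two_or_more = common_genes(toy_signatures, min_signatures=2)
in_three_or_more = common_genes(toy_signatures, min_signatures=3)
```

The same counting could be applied to enriched function terms instead of genes by passing lists of KEGG or GO term identifiers per signature.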
Among a total of 2239 genes collected from the 33 breast cancer signatures, only 238 genes overlapped in at least two signatures, whereas among a total of 1979 function terms enriched in the 33 signature gene lists, 429 terms were common to at least two signatures. Most of the common function terms were involved in cell cycle processes. Although there are almost no overlapping genes between signatures developed for ER-positive tumours (e.g. the 21-gene signature) and those developed for ER-negative tumours (e.g. basal signatures), they share function terms such as cell death and regulation of cell proliferation. We used the 62 genes that were common to at least three signatures as a classifier and subtyped 1141 METABRIC cases, including 144 normal samples, into nine subgroups. These subgroups showed different clinical outcomes. Among the 238 common genes, we selected those that are more highly expressed in normal breast tissue than in tumours as Yang genes and those more highly expressed in tumours than in normal tissue as Yin genes, and built a YMR signature. This YMR showed significance in risk stratification in two datasets (GSE1456 and GSE2034).
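The Yin and Yang gene selection and the resulting ratio can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: it assumes the YMR of a sample is the mean expression of its Yin genes divided by the mean expression of its Yang genes, that expression values are positive and on a linear scale, and that group-level mean expression per gene is already available. All function names and toy values are hypothetical.

```python
from statistics import mean


def split_yin_yang(tumour_mean, normal_mean):
    """Split genes by direction of expression difference.

    Yin: higher in tumours than in normal tissue.
    Yang: higher in normal tissue than in tumours.
    Both inputs map gene symbol -> mean expression (assumed layout).
    """
    shared = set(tumour_mean) & set(normal_mean)
    yin = {g for g in shared if tumour_mean[g] > normal_mean[g]}
    yang = {g for g in shared if tumour_mean[g] < normal_mean[g]}
    return yin, yang


def ymr(sample_expression, yin_genes, yang_genes):
    """Assumed YMR: mean Yin expression divided by mean Yang expression."""
    yin_values = [sample_expression[g] for g in yin_genes if g in sample_expression]
    yang_values = [sample_expression[g] for g in yang_genes if g in sample_expression]
    if not yin_values or not yang_values:
        raise ValueError("sample has no measured Yin or Yang genes")
    yang_mean = mean(yang_values)
    if yang_mean == 0:
        raise ValueError("mean Yang expression is zero; ratio undefined")
    return mean(yin_values) / yang_mean


# Toy values with placeholder gene names; not real expression data.
toy_tumour_mean = {"G1": 9.0, "G2": 3.0, "G3": 5.0}
toy_normal_mean = {"G1": 4.0, "G2": 7.0, "G3": 5.0}
yin_genes, yang_genes = split_yin_yang(toy_tumour_mean, toy_normal_mean)

toy_sample = {"G1": 8.0, "G2": 4.0, "G3": 6.0}
toy_ratio = ymr(toy_sample, yin_genes, yang_genes)
```

Under this assumed definition, a higher ratio means the tumour-associated (Yin) genes dominate in that sample. How the ratio is thresholded for risk stratification is not specified here and is left out of the sketch.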
The lack of significant numbers of overlapping genes among most breast cancer signatures can be partially explained by our discovery that these signature genes represent groups with similar functions. The genes collected from these previously reported signatures are valuable resources for new model development. The subtype classifier and YMR signature built from the common genes showed promising results.