Background
Breast cancer is one of the most commonly diagnosed cancers and one of the leading causes of death among female cancer patients. It has been estimated that, globally, approximately 12% of newly diagnosed breast cancers occur in China [1]. Despite great efforts to improve the diagnosis and treatment of breast cancer, its prognosis varies greatly among patients. An effective molecular tool is urgently needed for predicting and classifying the prognoses of breast cancer patients [2].
Cancer is caused by the accumulation of mutations in cancer susceptibility genes and the resulting abnormal cell growth. In addition to genetic variations, aberrant DNA methylation and variations in gene expression patterns have also been recognised to play an important role in tumourigenesis [3, 4]. Extensive studies have shown that global DNA hypomethylation and regional hypermethylation of cytosine-phosphate-guanine (CpG)-rich islands are prevalent in cancers [4, 5]. Promoter methylation suppresses gene transcription, and aberrant methylation is one of the major causes of genome instability, activation of oncogenes and suppression of tumour suppressor genes. Accordingly, aberrant methylation may contribute greatly to breast cancer onset and progression.
Based on variations in gene expression, breast cancer is currently classified into five major subtypes: luminal A, luminal B, ErbB2+, basal-like and normal-like. However, based on copy number, gene expression and long-term clinical outcomes, breast cancer can be further divided into at least 10 intrinsic subtypes, which demonstrates the complexity of the breast cancer landscape [6]. Each subtype has a unique expression pattern and unique clinical features [3, 7] and a distinct response profile to the same therapy [8]. Thus, attempts to define prognosis-related gene expression signatures remain necessary.
Specific methylation profiles may also exist for different subtypes. Holm et al. [9] have reported that certain patterns of hypermethylation, which modulate gene expression and promote tumour progression, may be viable targets in some luminal breast cancers. Reportedly, CpGs in the luminal B subtype are the most frequently methylated and those in the basal-like subtype are the least frequently methylated [10]. Significantly higher methylation levels of the tumour suppressor genes Ras Association Domain family 1 (RASSF1) and Glutathione S-transferase Pi 1 (GSTP1) have been observed in the luminal B subtype than in the basal-like subtype [10]. Furthermore, the expression levels of both genes have been shown to be downregulated by hypermethylation in breast cancer [11–15]. The hypermethylation and reduced expression of RASSF1 and GSTP1 have been correlated with cancer onset and progression [13, 14].
Despite extensive investigations into aberrant methylation and gene expression, robust and precise molecular prognostic predictors for specific breast cancer subtypes, such as luminal A and B types, remain to be developed. In the present study, we used the data from The Cancer Genome Atlas (TCGA) as a training set and identified methylation sites that are significantly correlated with luminal breast cancer prognosis. The mRNA expression of genes corresponding to these sites correlated significantly with their methylation levels and prognoses. We further compared mRNA expression profiles between breast cancer and normal tissues and identified eight signature genes used for constructing a risk scoring system. Based on this system, luminal breast cancer patients were classified into low-risk and high-risk groups, which exhibited significant prognostic and molecular differences.
Discussion
Variations in methylation profiles are of considerable importance in breast cancer onset and progression [4]. Methylation profiles differ among breast cancer subtypes and may influence gene expression [10]. In the present study, we focused on luminal breast cancer. We downloaded data from TCGA, a public database that catalogues the genetic profiles of over 30 human tumour types, including breast cancer. This platform contains many types of data, such as gene expression, exon expression, miRNA expression, copy number variation (CNV), single nucleotide polymorphism (SNP), mutations, DNA methylation, and protein expression. However, the follow-up data in TCGA are limited. A majority of the samples are censored shortly after diagnosis, which limited the number of available samples in our study. Because of this limited follow-up, the TCGA patient material is not representative of any real breast cancer population. Using data from TCGA, we identified a set of prognosis-related methylation sites and further evaluated their relationship with corresponding mRNA expression. We identified 14 genes (Table 2) whose mRNA expression levels and methylation levels were significantly correlated with each other and with breast cancer prognosis.
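As an illustration of the kind of methylation-expression check described above, the following minimal sketch computes a correlation coefficient for a single gene across samples. The choice of Pearson correlation, the function name `pearson_r` and the toy values are assumptions made for illustration; they are not taken from the study's methods or data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy values only (assumption): methylation beta values and expression
# levels for one hypothetical gene across four samples.
toy_methylation = [0.10, 0.30, 0.60, 0.80]
toy_expression = [9.0, 7.5, 4.0, 2.5]
toy_r = pearson_r(toy_methylation, toy_expression)
```

With these toy values the coefficient is negative, which is the direction one would expect if promoter methylation suppresses transcription. A real analysis would also need a significance test and correction for multiple comparisons.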
Among these genes, SOSTDC1 is of special interest, considering its complex role and potential importance in metastatic breast cancer. SOSTDC1 is a member of the sclerostin gene family and is actively involved in the bone morphogenetic protein and Wnt signalling pathways. SOSTDC1 mRNA levels are downregulated in breast cancer and are associated with survival [22, 23]. The elevated SOSTDC1 methylation level in tumour tissues (Additional file 7: Table S2) may explain SOSTDC1 downregulation in breast cancer, because promoter methylation has an inhibitory effect on gene expression. Because SOSTDC1 is closely associated with luminal breast cancer, we divided the samples into hypo- and hypermethylation groups based on SOSTDC1 methylation levels. Another DNA methylation signature, SAM40, was reported to discriminate between luminal A breast cancer patients with good and poor prognoses [24]. This highlights the feasibility of sub-classifying patient groups based on DNA methylation signatures. Future studies might focus on combining SAM40 and SOSTDC1 for the prognostic prediction of luminal breast cancer.
To identify signature genes in luminal breast cancer, we also compared mRNA expression profiles between breast cancer and control tissues. A total of 67 differentially expressed genes were found to be significantly correlated with prognosis. Further analysis identified eight signature genes (ESCO2, PACSIN1, CDCA2, PIGR, PTN, RGMA, KLK4 and CENPA). These signature genes were used to construct a prognosis-related risk scoring system, based on which samples were classified into low- and high-risk groups. The luminal breast cancer samples from TCGA and the METABRIC cohort were used to validate this system. Interestingly, we found prognostic differences within the Luminal A breast cancer patients in both databases, although the two lines in Fig. 4b were almost overlapping. No significant prognostic differences were found within Luminal B samples, indicating that this risk score system might have prognostic value for patients with Luminal A breast cancer.
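A risk scoring system of this kind is often built as a weighted sum of signature-gene expression values, with samples split at a cutoff. The sketch below assumes that common form (coefficient-weighted sum, median cutoff). The function names, gene labels, coefficients and cutoff are hypothetical placeholders, not the formula or values used in this study.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Weighted sum of expression values over the genes in `coefficients`."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def classify_by_median(scores):
    """Label a sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if score > cutoff else "low")
            for sample, score in scores.items()}


# Hypothetical placeholder genes and weights (assumption, not study values).
toy_coefficients = {"GENE_A": 0.5, "GENE_B": -1.0}
toy_expression = {
    "s1": {"GENE_A": 2.0, "GENE_B": 1.0},
    "s2": {"GENE_A": 4.0, "GENE_B": 1.0},
    "s3": {"GENE_A": 6.0, "GENE_B": 1.0},
    "s4": {"GENE_A": 8.0, "GENE_B": 1.0},
}
toy_scores = {sample: risk_score(expr, toy_coefficients)
              for sample, expr in toy_expression.items()}
toy_groups = classify_by_median(toy_scores)
```

In practice the weights would typically come from a survival model fitted on the training cohort, and the same weights and cutoff would then be applied unchanged to the validation cohort.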
Many research groups have focused on predicting prognosis and chemotherapeutic benefit by constructing risk systems based on gene expression profiles, such as the 70-gene predictor [25] and the 50-gene signature [26]. The 50-gene signature test (PAM50) is one of the most widely accepted systems for the prediction of clinical outcomes in women with distinct intrinsic subtypes [26]. In the patient cohorts of this analysis, our signature genes were more suitable than PAM50 for splitting Luminal A and Luminal B subtypes. However, a limitation of our study is that the cohort of luminal breast cancer samples in TCGA was small. Future studies will use larger patient cohorts with richer clinical data to validate our risk system.
Previous studies have shown that most of the signature genes are involved in cancer progression, even though they may not be directly involved in breast cancer. It has been reported that ESCO2, CDCA2 and CENPA are cell cycle-related genes involved in cancer progression. ESCO2 is an acetyltransferase that is required for cohesin acetylation and the establishment of sister chromatid cohesion in S phase [27, 28], and has been found to be upregulated in melanoma [29]. CDCA2 is required for the formation of mitotic chromatin and is involved in the progression of human squamous cell carcinoma [30]. CENPA is essential for centromere integrity and chromosome segregation, and CENPA dysregulation may promote tumourigenesis through the resulting genome instability [31–33]. Other signature genes, including PTN, KLK4, RGMA and PIGR, have also been reported to be involved in cancer progression. Increased PTN [34, 35] and KLK4 [36–38] expression is strongly associated with the progression of different malignant cancers. Decreased PIGR expression has been found in colon tumours [39], while RGMA has been reported to have an inhibitory effect on cancer progression [40, 41]. The remaining signature gene, PACSIN1, is important in endocytosis and synaptic vesicle recycling [42, 43]. Although its direct involvement in cancer has not been reported, it may play an indirect role in cancer progression.
Our results also demonstrated significant differences in the expression of these signature genes between low- and high-risk groups and between the control and cancerous tissues (Fig. 3). Our GO and pathway analyses revealed that the genes that were expressed differentially between the low- and high-risk groups were mainly involved in biological processes, such as cell cycle and cancer progression (Fig. 5b and c).
Our study has limitations. The gene signature was derived by segregating patients according to the methylation level of only one gene (SOSTDC1), which could bias the analysis. The eight-gene signature was identified through bioinformatics analysis alone, so this study may only provide clues for future studies of patients with luminal breast cancer. Our future work will focus on collecting more samples and improving the risk score system experimentally.
Taken together, our results support a role for these genes, consistent with their biological functions, in the development and progression of luminal breast cancer.