Background
Inappropriate gene silencing resulting from aberrant DNA methylation significantly contributes to neoplastic transformation, tumorigenesis, and tumor progression [
1,
2], and thereby to some of the hallmarks of cancer [
3]. While abnormal DNA methylation affecting a variety of genes occurs in nearly every type of cancer that has been evaluated, some tumors exhibit aberrant concurrent hypermethylation of numerous genes, a phenomenon known as the CpG island methylator phenotype (CIMP). CIMP was first described in a distinct subset of human colorectal carcinomas that displayed high rates of concordant methylation of specific genes [
4]. Subsequently, CIMP has been described in other human neoplasms, including tumors of the ovary [
5], bladder [
6], prostate [
6], stomach [
7], liver [
8], pancreas [
9], esophagus [
10], and kidney [
11], as well as neuroblastomas [
12], and leukemias and lymphomas [
13,
14]. While tissue type is important in determining which genes are targeted for methylation in a given neoplasm, CIMP-positive tumors in each of these tissue types exhibit gene silencing that is due to cancer-specific (rather than age-specific) hypermethylation of epigenetically-regulated genes. Definitive evidence for a hypermethylation defect (similar to CIMP) among human breast cancers has not emerged, and some investigators have suggested that such a hypermethylator phenotype does not occur in breast tumors [
15]. Nevertheless, numerous epigenetically-regulated genes are known to be directly silenced by DNA methylation in breast cancer, including cell cycle control genes (
APC, RASSF1, RB, TFAP2A), steroid receptor genes (
ESR1, PGR, RARα), tumor suppressor genes (
BRCA1, CDKN2A, CST6), and metastasis-associated genes (
CDH1, CEACAM6, PCDHGB6), among others [
16‐
19].
In the current study, we analyzed 12 breast cancer cell lines for differential expression of 64 methylation-sensitive genes, to determine if subsets of breast cancer cell lines methylate genes at disparate frequencies, and subsequently confirmed that lack of gene expression was attributable to methylation-dependent silencing. Unsupervised cluster analysis of gene expression patterns reveals two distinct groups of breast cancer cell lines that possess different methylation signatures: (i) hypermethylator cell lines, and (ii) low-frequency methylator cell lines. The hypermethylator cell lines are characterized by high rates of concurrent methylation of six genes (CDH1, CEACAM6, CST6, ESR1, LCN2, and SCNN1A), whereas the low-frequency methylator cell lines typically lack methylation of these genes. Analysis of the enzymes responsible for human DNA methylation reveals aberrant DNMT3b protein expression and elevated total DNA methyltransferase activity in hypermethylator cell lines. These observations combine to suggest the existence of a distinct subset of human breast cancer cell lines that possess novel biological properties related to dysregulation of the methylation machinery resulting in the acquisition of a hypermethylator phenotype.
Discussion
The CpG island methylator phenotype (CIMP) was first used to describe a distinct subset of colorectal tumors that display high rates of concordant methylation of specific genes [
4]. Subsequently, similar epimutational phenomena have been described in a wide range of neoplasms [
5‐
12,
14,
20]. The results of the present study suggest that a subset of human breast cancer cell lines express a hypermethylator phenotype that is characterized by concurrent methylation-dependent silencing of a number of genes, including a specific set of genes with excellent predictive power (
CDH1, CEACAM6, CST6, ESR1, LCN2, and
SCNN1A) that are involved in a wide range of neoplastic processes.
CEACAM6 is a tumor-related gene that is involved in adhesion, migration, invasion, metastasis, apoptosis, and chemoresistance [
21,
22], although the implications of its loss in breast cancers are not well understood. Cystatin M (
CST6) is a recognized breast cancer tumor suppressor gene [
23] that was recently reported to be silenced due to promoter hypermethylation in numerous breast cancer cell lines, as well as primary breast tumors [
24,
25]. E-cadherin (
CDH1) is a well-known suppressor of invasion/metastasis that functions in the maintenance of cell-cell adhesion [
26].
CDH1 and
ESR1 are frequently concurrently methylated in breast tumors [
19], a relationship also discernible in the present study. The nuclear hormone receptor
ESR1, which is silenced by methylation in the majority of estrogen receptor-negative breast tumors [
19], may be the most important methylation-sensitive gene in breast carcinogenesis, with major implications for sensitivity to hormone therapy and clinical outcome. Much less well understood is the role of the ion transport gene
SCNN1A in breast carcinogenesis, although its epigenetic regulation in MCF7 cells has previously been noted [
24].
LCN2 is involved in invasion and metastasis [
27], and its expression has been linked to poor prognosis in ER/PR-negative breast tumors [
28,
29]. Thus, methylation-sensitive genes function in various aspects of the normal biology of the breast epithelium. Therefore, concurrent methylation-dependent silencing of multiple genes in neoplastic breast epithelium (as observed in hypermethylator cell lines) is likely to significantly contribute to tumor biology and behavior.
A previous study that examined methylation patterns of primary breast tumors in search of a hypermethylator phenotype found frequent but essentially equally distributed methylation events at 12 genes among different histologic subsets of neoplasms [
15]. These authors concluded that a CpG island methylator phenotype does not occur in breast cancer [
15]. The difference in conclusions about the existence of a hypermethylator phenotype in breast cancer between the current study and the earlier report [
15] is likely attributable to the number and choice of genes examined in the two studies, as well as the analysis of primary breast tumors versus established cancer cell lines. The previous study did not examine many of the genes that we found to be highly predictive of a hypermethylator phenotype (
CEACAM6, CST6, LCN2, and
SCNN1A), but did include several genes (including
GSTP1, RARβ, RB, and others) that were less useful for predicting the hypermethylator phenotype. Thus, our results are consistent with the previous findings: when only the genes analyzed by Bae et al. [
15] are considered, no distinct hypermethylator phenotype is detectable. It is only through a survey of numerous methylation-sensitive genes that evidence for a hypermethylator phenotype emerges. Additionally, we examined not only genes with conventionally defined CpG islands, but also those with atypical CpG features (such as
CEACAM6), which have only recently been reported as epigenetically-regulated [
24]. Thus, we use the term "hypermethylator phenotype" rather than "CpG island methylator phenotype" to describe the hypermethylation defect in breast cancer cell lines, since the targets of aberrant methylation are not restricted to genes with large CpG islands.
The results of the current study suggest that the mechanism that accounts for the hypermethylator phenotype in human breast cancer cell lines is elevated DNMT activity secondary to overexpression of DNMT3b. DNMT3b protein is significantly elevated in hypermethylator cell lines, and these cells exhibit aberrantly increased DNMT activity and correspondingly high rates of methylation-dependent gene silencing compared to both low-frequency methylator cells and non-neoplastic counterparts. These results are in agreement with those of other recent studies, in which aberrant DNMT3b overexpression was implicated in the methylation abnormalities of breast cancers [
30] and other cancers [
31]. Tumor cells exhibiting DNMT3b overexpression are likely to exhibit methylation-based aberrant gene expression; one study showed that breast tumors that overexpress DNMT3b are more likely to be
ESR1-negative, display increased proliferation, and be associated with poor patient prognosis [
30]. Thus, it seems reasonable to expect that aberrant expression of DNMT3b protein may produce significant differences in tumor biology for breast tumors of the hypermethylator phenotype. In addition to the six hypermethylator cell lines that had elevated DNMT3b protein and total DNMT activity, one low-frequency methylator cell line (ZR-75-1) exhibited a similar defect in the DNMT machinery. However, ZR-75-1 cells retain expression of a number of epigenetically-regulated genes, making them functionally similar to other low-frequency methylator cell lines. A number of explanations may account for this apparent discrepancy: ZR-75-1 cells may methylate other epigenetically-regulated genes that were not surveyed in the present study; alternatively, ZR-75-1 cells may possess the same functional defect in the DNMT machinery as cells of the hypermethylator phenotype but express additional repressor proteins that block the methylation capacity of the overabundant DNMT3b protein. Additional studies will be required to resolve these possibilities.
The detection of a hypermethylator phenotype in breast cancer cell lines constitutes a first step towards determining if a hypermethylation defect can be identified in primary breast neoplasms in vivo. If a subset of primary breast cancers express a hypermethylator phenotype, we would predict these tumors to differentially express other important characteristics related to tumor biology/behavior and patient outcome. This is the case in colorectal cancer, where CIMP status is associated with various clinical features [
32‐
34]. Likewise, CIMP-positive neuroblastomas, esophageal tumors, and leukemias tend to have poorer prognosis and are associated with significantly higher relapse and mortality rates [
12,
35,
36].
Our findings suggest that breast cancer cell lines that express the hypermethylation defect correspond to estrogen receptor-negative tumors, indicating that the hypermethylator phenotype cosegregates with a subset of breast cancers (ER-negative) that tend to have poor prognosis [
37]. A number of molecular subtypes of breast cancer have been described (including luminal A, luminal B, HER2+ and basal-like), and these different subtypes correlate with important differences in tumor biology, clinical behavior, and patient survival. Luminal A and luminal B tumors are ER-positive and respond better to treatment, resulting in better long-term patient outcome compared to the ER-negative basal-like and HER2+ subtypes [
38]. Our microarray data mining analysis of primary breast cancer gene expression suggests that the hypermethylation defect observed in breast cancer cell lines can also be identified in primary tumors. Preliminary investigation of a limited dataset (n = 88 tumors) identified a strong cluster of tumors that express the hypermethylator signature (Figure
6), with low levels of expression of the six genes of interest (
CDH1, CEACAM6, CST6, ESR1, LCN2, and
SCNN1A). All of the tumors in this cluster were classified as basal-like, and 75% of the basal-like tumors in the dataset expressed the hypermethylation signature. This observation suggests that the hypermethylator defect represents a biological property of basal-like breast cancers. Basal-like breast tumors make up ~25% of all breast cancers but contribute disproportionately to breast cancer deaths as they tend to display more aggressive tumor characteristics such as increased size, rapid tumor growth, increased rate of metastasis, higher incidence of relapse, and lower overall patient survival [
39,
40]. It has also been observed that this subtype of breast cancer is overrepresented in young, African-American women [
41]. These tumors lack expression of the hormone growth factor receptor genes (ER and PR) that are targeted by some drug regimens, eliminating options for targeted therapy. While further studies are needed to understand fully the relationship between basal-like breast cancers and the hypermethylator phenotype, recognition of this fundamental biological property of the basal-like breast cancers may present new molecular targets for development of novel treatment strategies.
Methods
Cell Culture, RNA, and DNA Preparation
Human breast cancer cell lines BT20 (ATCC# HTB19), BT549 (HTB122), Hs578T (HTB126), MCF7 (HTB22), MDA-MB-231 (HTB26), MDA-MB-415 (HTB128), MDA-MB-435S (HTB129), MDA-MB-436 (HTB130), MDA-MB-453 (HTB131), MDA-MB-468 (HTB132), SKBR3 (HTB30), and ZR-75-1 (CRL-1500) were obtained from the Tissue Culture Core Facility of the University of North Carolina Lineberger Comprehensive Cancer Center (Chapel Hill, NC), and the normal breast epithelial cell line MCF12A [
48] (CRL-10782) was obtained from the American Type Culture Collection [
49]. Cell lines were propagated in growth medium specified by ATCC. Growth medium was refreshed three times weekly, and cell cultures were harvested for RNA preparation at confluency using the method of Chomczynski and Sacchi [
50], modified to utilize TRIzol Reagent (Invitrogen Life Technologies, Carlsbad, CA), according to the manufacturer's protocol. Cell lines selected for treatment with the demethylating agent 5-aza-2'-deoxycytidine (Sigma Chemical Company, St. Louis, MO) were propagated in the appropriate ATCC-recommended growth medium containing 250 nM 5-aza (with refreshing three times weekly) for a total of three weeks, before RNA isolation. As described previously [
24], the concentration of 5-aza used in this study is 4–6-fold lower than that used in traditional methods, which allows for long-term 5-aza exposure without the typically encountered cytotoxic effects [
51,
52]. Isolated RNA was stored at -20°C as an ethanol precipitate prior to use for RT-PCR. Genomic DNA from 2 × 10^6 cultured cells was isolated using the Puregene DNA Purification Kit (Gentra Systems, Minneapolis, MN). Bisulfite modification of genomic DNA was performed using a procedure adapted from Grunau et al. [
53], as described previously [
24].
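The logic of bisulfite modification can be illustrated with a small in-silico sketch. It is not the laboratory procedure cited above; it only models the expected sequence outcome on one strand, under the simplifying assumptions that conversion is complete and that all CpG sites in the sequence share the same methylation state. The function name and test sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Return the expected top-strand sequence after bisulfite treatment and PCR.

    Simplifying assumptions (not from the source protocol):
      - conversion is 100% efficient
      - every CpG cytosine is methylated (methylated_cpg=True) or none is
      - non-CpG cytosines are always unmethylated
    Unmethylated C is read as T; methylated C is protected and stays C.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            # Only a methylated CpG cytosine survives conversion.
            converted.append("C" if (in_cpg and methylated_cpg) else "T")
        else:
            converted.append(base)
    return "".join(converted)


if __name__ == "__main__":
    genomic = "ACGTCCGA"  # hypothetical fragment with two CpG sites
    print(bisulfite_convert(genomic, methylated_cpg=True))
    print(bisulfite_convert(genomic, methylated_cpg=False))
```

This difference at CpG positions is what allows methylation-specific primers to distinguish methylated from unmethylated alleles after conversion.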
Semi-quantitative RT-PCR
Sixty-four genes were selected for analysis in this study based on their status as marker genes for CIMP in other tumor systems or genes that are known to be methylated in breast cancer specifically (Table
1). Total RNA (2 μg) collected from each cell line was reverse-transcribed into cDNA using Superscript II Reverse Transcriptase (Invitrogen Life Technologies, Carlsbad, CA) and oligo(dT) as the primer, according to standard methodology. Gene-specific oligonucleotide primers were designed using Primer3 software [
54] and were synthesized by the UNC Oligodeoxynucleotide Synthesis Core Facility (Chapel Hill, NC) based upon the known cDNA sequences [
55] for selected mRNAs of interest. The RT-PCR primer sequences and thermocycling conditions for
CEACAM6, CST6, LCN2, and
SCNN1A have been described previously [
24], while those for
CDH1 and
ESR1 are as follows:
CDH1, forward 5'-TCT-TGC-TGT-TTC-TTC-GGA-GG and reverse 5'-TGA-CTC-TGA-GGA-GTT-CAG-GG (60°C, 30 cycles, 380 bp product);
ESR1, forward 5'-TTG-TCC-CAT-GAG-CAG-GTG-CC and reverse 5'-GTA-TGC-ATC-GGC-AAA-AGG-GC (58°C, 30 cycles, 201 bp product). Verification of equal cDNA template concentrations between samples was accomplished using
β-actin primers (forward 5'-AGA-GAT-GGC-CAC-GGC-TGC-TT and reverse 5'-ATT-TGC-GGT-GGA-CGA-TGG-AG). PCR reactions were performed in a 50 μl total volume of buffer containing 50 mM KCl, 10 mM Tris-HCl (pH 8.3), 1.5 mM MgCl2, 0.001% gelatin, 200 μM of each dNTP (EasyStart Micro 50 PCR-mix-in-a-tube, Molecular BioProducts, San Diego, CA), 0.4 μM of each primer, and 2.5 units AmpliTaq enzyme (Perkin Elmer/Cetus, Foster City, CA). Reactions were carried out in an Eppendorf Mastercycler Thermocycler as follows: 30–35 cycles at 94°C for denaturing (1 minute), 58–65°C for annealing (1.5 minutes), and 72°C for extension (2 minutes). PCR products were fractionated on 2% agarose gels containing 40 mM Tris-acetate/1.0 mM EDTA and visualized by ethidium bromide staining.
Quantitative Real-time PCR
Total RNA samples (2 μg) from cell lines of interest were DNase-treated (Promega, Madison, WI), purified using the Qiagen RNeasy mini-kit (Qiagen, Valencia, CA), and reverse transcribed using the High Capacity cDNA Archive Kit (Applied Biosystems, Foster City, CA) according to the manufacturer's protocol. Real-time primers and probes for CDH1 (Assay ID: Hs00170423_m1), CEACAM6 (Hs00366002_m1), CST6 (Hs00154599_m1), ESR1 (Hs00174860_m1), LCN2 (Hs00194353_m1), SCNN1A (Hs00168906_m1), and β-actin (Hs99999903_m1) were purchased from Applied Biosystems (Foster City, CA). Reactions were carried out using TaqMan Universal PCR Master Mix (Applied Biosystems, Foster City, CA) and the following amplification conditions: 95°C for 10 min, followed by 40 cycles of 95°C for 15 sec and 60°C for 1 min. Gene expression levels were normalized using β-actin for each cell line and differences in gene expression were determined using the comparative Ct method described in the ABI Prism 7700 User Bulletin #2 (Applied Biosystems, Foster City, CA).
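The comparative Ct calculation can be sketched as follows. This is a minimal illustration of the standard 2^(-ΔΔCt) arithmetic, assuming approximately 100% amplification efficiency for both target and reference assays. The choice of calibrator sample (for example, MCF12A cells) and all Ct values shown are assumptions made for illustration, not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    ct_ref_* are Ct values for the normalizer gene (beta-actin in the text).
    The calibrator is the sample to which all others are compared; which
    sample serves as calibrator is an assumption of this sketch.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Each cycle corresponds to a two-fold change at 100% efficiency.
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene is 3 cycles later (relative to
    # beta-actin) in the sample than in the calibrator.
    fold = relative_expression(28.0, 18.0, 25.0, 18.0)
    print(f"relative expression: {fold:.3f}")
```

A value below 1 indicates lower normalized expression in the sample than in the calibrator; a gene with no detectable amplification has no Ct and must be handled separately.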
Cluster Analysis of Breast Cancer Cell Lines Based Upon Gene Expression Patterns
Expression levels for genes of interest were analyzed by RT-PCR using cDNA templates derived from 12 breast cancer cell lines and normal MCF12A breast epithelial cells. RT-PCR results for breast cancer cell lines were expressed on a discrete scale (none, low, medium, high) relative to the expression levels of MCF12A cells. Genes from the original panel of 64 that were not expressed in MCF12A cells (n = 16) were omitted from the cluster analysis, to ensure that cancer-specific methylation events were captured. The expression data were mapped to a quantitative scale (0, 1, 2, 3) for clustering purposes. For some analyses, a combined expression score was generated for each cell line by adding the quantitative RT-PCR expression levels of genes of interest. Clustering of cell lines was carried out with SAS/STAT PROC CLUSTER (SAS Institute, Cary, NC) using complete linkage with 5% trimming and no squaring of distance. Kernel density estimation for trimming used the 5 nearest neighbors.
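The encoding and clustering steps described above can be sketched in a few lines. This is an illustrative pure-Python version, not the SAS procedure used in the study: it maps the discrete calls to 0–3, computes a combined expression score, and performs complete-linkage agglomerative clustering on unsquared Euclidean distance. The 5% trimming and kernel density estimation are omitted, and the cell line names and calls are hypothetical.

```python
SCALE = {"none": 0, "low": 1, "medium": 2, "high": 3}


def encode(calls):
    """Map discrete RT-PCR calls to the quantitative scale 0-3."""
    return [SCALE[c] for c in calls]


def expression_score(calls):
    """Combined expression score: sum of encoded levels across genes."""
    return sum(encode(calls))


def euclidean(a, b):
    """Unsquared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering; cluster distance = farthest pair of members."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Hypothetical calls for four cell lines across four genes.
    calls = {
        "line_A": ["none", "none", "low", "none"],
        "line_B": ["none", "low", "none", "none"],
        "line_C": ["high", "high", "medium", "high"],
        "line_D": ["high", "medium", "high", "high"],
    }
    profiles = {name: encode(c) for name, c in calls.items()}
    print(complete_linkage(profiles, n_clusters=2))
```

Complete linkage tends to produce compact clusters because two groups merge only when their most dissimilar members are close, which suits a search for sharply distinct expression signatures.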
Methylation-specific PCR, Cloning, and Sequencing
MSP reactions were carried out in EasyStart Micro 50 PCR-mix-in-a-tube (Molecular BioProducts, San Diego, CA) using bisulfite converted DNA template (described above). The primers and thermocycling conditions for
CDH1, CST6, and
ESR1 genes have been described previously [
25,
56,
57]. MSP primers directed against methylated and unmethylated alleles of
CEACAM6, LCN2, and
SCNN1A are as follows: methylated
CEACAM6, forward primer 5'-AGG-GCG-GGT-CGT-TTT-GTT-AT, reverse primer 5'-TCA-CGT-AAA-TCA-TAA-ATA-CGA-TCT-CT (58°C, 35 cycles, 174 bp product); unmethylated
CEACAM6, forward primer 5'-AGG-GTG-GGT-TGT-TTT-GTT-AT, reverse primer 5'-TCA-CAT-AAA-TCA-TAA-ATA-CAA-TCT-CT (55°C, 35 cycles, 174 bp product); methylated
LCN2, forward primer 5'-CGA-GAG-TTA-TTG-CGT-TTA-GTC-GA, reverse primer 5'-CGA-ATA-AAT-CAC-GAA-ATC-AAA-AAT-TCG-A (60°C, 35 cycles, 273 bp product); unmethylated
LCN2, forward primer 5'-AGA-GTT-ATT-GTG-TTT-AGT-TGA-GGA, reverse primer 5'-CAA-ATA-AAT-CAC-AAA-ATC-AAA-AAT-TCA-A (55°C, 35 cycles, 273 bp product); methylated
SCNN1A, forward primer 5'-TCG-GGA-GTT-TTT-TTT-TTT-TCG-GA, reverse primer 5'-CCG-CCC-GCT-AAC-CGA (56°C, 40 cycles, 135 bp product); unmethylated
SCNN1A, forward primer 5'-TTG-GGA-GTT-TTT-TTT-TTT-TTG-GA, reverse primer 5'-AAC-CCA-CCC-ACT-AAC-CAA (56°C, 40 cycles, 135 bp product). PCR products were fractionated on 2% agarose gels and visualized by ethidium bromide staining. For some analyses, MSP results were converted from a discrete scale (unmethylated product only, both methylated and unmethylated products, or methylated product only) to a quantitative scale (0, 1, 2) in order to generate a methylation score for each cell line that reflects the combined methylation status of select genes of interest.
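The methylation score described above amounts to a simple lookup and sum. The sketch below assumes the three MSP outcomes are recorded with the hypothetical labels "U", "U+M", and "M"; the example calls are invented for illustration and are not results from this study.

```python
# Discrete MSP outcome -> quantitative value, as described in the text:
# unmethylated product only = 0, both products = 1, methylated product only = 2.
MSP_SCALE = {"U": 0, "U+M": 1, "M": 2}


def methylation_score(msp_calls):
    """Sum the quantitative MSP values over the genes of interest.

    msp_calls maps gene symbol -> one of "U", "U+M", "M" (labels are an
    assumption of this sketch).
    """
    return sum(MSP_SCALE[call] for call in msp_calls.values())


if __name__ == "__main__":
    # Hypothetical MSP calls for one cell line across the six genes.
    example = {
        "CDH1": "M",
        "CEACAM6": "M",
        "CST6": "U+M",
        "ESR1": "M",
        "LCN2": "U",
        "SCNN1A": "U+M",
    }
    print(methylation_score(example))
```

With six genes the score ranges from 0 (all unmethylated) to 12 (all fully methylated).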
Bisulfite-converted DNA was amplified using MSP primers directed to specific segments within the promoter regions and/or exon 1 of selected genes. A portion of each PCR product (1 to 5 μl) was cloned into pGEM-T Easy Vector (Promega, Madison, WI). Colonies (n = 5–10) were selected per gene segment and expanded in liquid culture. Plasmid DNA was purified using the Wizard Plus Miniprep DNA Purification Kit (Promega, Madison, WI), prior to digestion with NcoI and NdeI (New England Biolabs, Beverly, MA) to confirm the presence of the cloned insert. Validated clones were sequenced using the universal M13R3 primer with an Applied Biosystems automated sequencer at the UNC Genome Analysis Facility (Chapel Hill, NC). In some cases, the sequencing results are expressed as total methylation index (TMI), which is calculated by dividing the number of methylated CpGs observed by the total CpGs analyzed for a given gene segment of interest [
58].
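As a worked illustration of the TMI calculation, the sketch below assumes each sequenced clone is recorded as a string with one character per CpG site ("1" for methylated, "0" for unmethylated). That encoding and the example clones are hypothetical. The result is returned as a fraction; multiply by 100 if a percentage is preferred.

```python
def total_methylation_index(clones):
    """TMI = methylated CpGs observed / total CpGs analyzed.

    clones: list of strings, one per sequenced clone, with "1" for a
    methylated CpG and "0" for an unmethylated CpG (encoding is an
    assumption of this sketch).
    """
    methylated = sum(clone.count("1") for clone in clones)
    total = sum(len(clone) for clone in clones)
    if total == 0:
        raise ValueError("no CpG sites were analyzed")
    return methylated / total


if __name__ == "__main__":
    # Three hypothetical clones covering four CpG sites each.
    clones = ["1101", "1111", "0100"]
    print(f"TMI = {total_methylation_index(clones):.2f}")
```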
DNA Methyltransferase Analysis of Human Breast Cancer Cell Lines
Total DNA methyltransferase activity was measured using EpiQuik DNA Methyltransferase Activity/Inhibition Assay Kit (Epigentek, Brooklyn, NY) as previously described [
59], using nuclear extracts from 12 human breast cancer cell lines and MCF12A cells. Nuclear extracts were isolated using the EpiQuik Nuclear Extraction Kit (Epigentek, Brooklyn, NY) and 3 μl of nuclear extract was added to each reaction well, according to manufacturer's protocol. The final volume of nuclear extract yield was used to normalize the assay results for differences in cell number. Nuclear extracts were incubated with methylation substrate for 1 hour at 37°C, and then exposed to the capture antibody for 60 minutes and the detection antibody for 30 minutes, at room temperature. Absorbance was determined using a microplate spectrophotometer at 450 nm, and DNMT activity (O.D./h/ml) was calculated according to the following formula: (Sample OD – blank OD)/(sample volume × 1000), according to manufacturer's instructions. Results are given in activity units expressed relative to the activity level detected in MCF12A cells.
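The activity calculation and the normalization to MCF12A can be sketched as below. The function implements the formula exactly as written in the preceding paragraph, which is an assumption about its intended form; the kit manual should be consulted for the authoritative expression. Because every sample is divided by the MCF12A value, any constant scaling factor cancels in the relative result. All OD values shown are hypothetical.

```python
def dnmt_activity(sample_od, blank_od, sample_volume_ul):
    """DNMT activity as written in the text:
    (sample OD - blank OD) / (sample volume x 1000).
    """
    return (sample_od - blank_od) / (sample_volume_ul * 1000.0)


def relative_activity(sample_od, reference_od, blank_od, sample_volume_ul=3.0):
    """Activity of a sample relative to a reference extract (MCF12A in the text).

    The default of 3 ul matches the extract volume stated in the methods.
    """
    sample = dnmt_activity(sample_od, blank_od, sample_volume_ul)
    reference = dnmt_activity(reference_od, blank_od, sample_volume_ul)
    return sample / reference


if __name__ == "__main__":
    # Hypothetical absorbance readings at 450 nm.
    print(relative_activity(sample_od=0.85, reference_od=0.45, blank_od=0.05))
```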
Nuclear extracts were assayed for individual DNMT proteins of interest (DNMT1, DNMT3a, or DNMT3b) using the EpiQuik DNMT1, -3a, and -3b assay kits, respectively (Epigentek, Brooklyn, NY). Protein standards of known concentration (30 ng, 20 ng, 10 ng, and 2 ng) were included to generate a standard curve. The amount of DNMT protein was calculated according to the manufacturer's instructions as follows: DNMT protein (ng/ml) = [(Sample OD – blank OD)/standard slope] × sample dilution. Results are expressed relative to the protein levels of MCF12A cells.
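The standard-curve step can be illustrated with a short sketch. It assumes the standard slope is obtained by ordinary least squares regression of OD on the amount of standard protein, and that the blank-corrected sample OD is divided by that slope before multiplying by the dilution factor. The fitting method is an assumption, and all OD readings are hypothetical.

```python
def standard_slope(amounts_ng, ods):
    """Least-squares slope of OD versus standard amount (OD per ng)."""
    n = len(amounts_ng)
    mean_x = sum(amounts_ng) / n
    mean_y = sum(ods) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts_ng, ods))
    sxx = sum((x - mean_x) ** 2 for x in amounts_ng)
    return sxy / sxx


def dnmt_protein(sample_od, blank_od, slope, dilution=1.0):
    """DNMT protein = (sample OD - blank OD) / standard slope x dilution."""
    return (sample_od - blank_od) / slope * dilution


if __name__ == "__main__":
    # Standards named in the text, with hypothetical OD readings.
    standards_ng = [30.0, 20.0, 10.0, 2.0]
    standard_ods = [0.65, 0.45, 0.25, 0.09]
    slope = standard_slope(standards_ng, standard_ods)

    sample = dnmt_protein(sample_od=0.35, blank_od=0.05, slope=slope, dilution=2.0)
    reference = dnmt_protein(sample_od=0.15, blank_od=0.05, slope=slope, dilution=2.0)
    # Expressed relative to a reference extract, as done for MCF12A in the text.
    print(f"slope = {slope:.4f} OD/ng, relative protein = {sample / reference:.2f}")
```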
Cluster Analysis of Gene Expression
The publicly available microarray dataset utilized in this study is available online at the UNC Microarray Database [
60] and includes gene expression data for 92 primary breast tumors analyzed in previous studies [
61‐
64]. Clustering of transcripts was carried out with SAS (PROC CLUSTER) based on distance of the log ratio values using complete linkage with 5% trimming. The kernel density estimation for trimming used the 10 nearest neighbors.
Statistical Analysis
The values for the mean and S.E.M. were calculated using the statistical function of KaleidaGraph Version 3.5 (Synergy Software, Essex Junction, VT). Statistical significance was determined using an unpaired t-test (KaleidaGraph). Error bars depicted represent S.E.M. P values for correlation coefficients (R values) were calculated using the VassarStats Significance of Correlation Coefficient Calculator [
65]. The Bayesian analysis was performed as described previously [
66] and the percentage of correct assignments, as well as sensitivity, specificity, and positive and negative predictive values were calculated.
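The performance measures named above follow directly from a 2 × 2 table of predicted versus actual phenotype. The sketch below shows only that arithmetic; it does not reproduce the Bayesian analysis itself, and the counts in the example are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classification_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts.

    tp/fn: hypermethylator lines called correctly / missed.
    tn/fp: low-frequency methylator lines called correctly / miscalled.
    """
    total = tp + fp + tn + fn
    correct = _ratio(tp + tn, total)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "percent_correct": None if correct is None else 100.0 * correct,
    }


if __name__ == "__main__":
    # Hypothetical counts for a single predictor gene across 12 cell lines.
    for name, value in classification_metrics(tp=5, fp=1, tn=6, fn=0).items():
        print(f"{name}: {value:.3f}")
```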
Authors' contributions
JDR carried out the majority of expression, methylation, and DNMT experiments and analyses, and drafted the manuscript. AGR performed select DNA and RNA isolations from the breast cancer cell lines and performed methylation analyses of CST6. WDJ performed the unsupervised cluster analysis on RT-PCR expression data and provided support for additional statistical analyses. WBC conceived of and designed the study, participated in its experimental design and interpretation of results, and helped edit the manuscript. All authors read and approved the final manuscript.