Introduction
Triple-negative breast cancer (TNBC) has a higher rate of 5-year distant recurrence than other breast cancers, and despite adjuvant chemotherapy as the standard of care, 5-year recurrence rates are around 30 % [1]. Patients who achieve a pathological complete response (pCR) to neoadjuvant chemotherapy have significantly better overall survival [1, 2]. Furthermore, the correlation between pCR and distant recurrence is considerably stronger in TNBC patients than in ER+ patients [3], which led the Food and Drug Administration to allow pCR as a clinical endpoint for TNBC while strongly recommending against its use in ER+ patients [4]. Many studies have established that breast tumors are heterogeneous, both in histology and in clinical outcome, and these differences can serve as the basis for clinical classification and prognostication [5]. Molecular classification of cancer subtypes is also becoming an increasingly important tool in devising treatment plans. For example, KRAS mutation analysis in colorectal cancer [6] and EGFR mutation and ALK rearrangement detection in non-small cell lung cancer [7, 8] are now standard of care.
There are currently no clinically applied molecular subclassification tools for TNBC. The intrinsic breast cancer classification system [9], which has proven useful in assigning biological information to breast cancer subclasses, places the majority of TNBC cases in the basal subclass [10]. However, significant clinical and pathological heterogeneity exists within TNBC, and better subclassification tools are needed for clinical decision-making. To this end, Lehmann et al. used 21 breast cancer data sets containing 587 TNBC cases and applied cluster and gene expression analysis to identify a set of 2188 genes that classify TNBC into six subtypes: two basal-like (BL1 and BL2), immunomodulatory (IM), mesenchymal (M), mesenchymal stem-like (MSL), and luminal androgen receptor (LAR) [11]. The goal of this study is to translate the knowledge of these biologically distinct subtypes into the rational design of pre-clinical studies for TNBC clinical trials and to facilitate the identification of novel predictive markers. A previous study has already shown the potential clinical utility of subtyping by retrospectively classifying 130 TNBC patients who had received standard neoadjuvant chemotherapy consisting of an anthracycline, cyclophosphamide and a taxane. In that study, patients with BL1 tumors had an improved response (52 % achieving pCR), whereas patients with BL2 tumors responded poorly to standard chemotherapy (0 % pCR) [12].
In the derivation of the TNBCtype subclassification tool, the final set of 2188 classifying genes was identified from an initial set of approximately 13,000 genes by selecting those genes whose expression was significantly distinct from the median gene expression across the cluster-defined subclasses. Although a seminal advance for the TNBC field, this large classification panel needs further refinement before it can be best applied clinically. Such optimization would serve three purposes. First, a more limited set of genes would speed translation of the classifier into a cost-effective clinical tool. Second, although this is not necessarily the case [13], the genes most predictive of a subtype may include those most relevant to its regulation and function; a smaller gene set may therefore make it easier to assign biological meaning to the panel members and to compare TNBC subtype molecular profiles with other well-defined molecular prognostic and predictive tools. Third, and most importantly, a small set of classifying genes could improve the reproducibility of the TNBC subtyping panel.
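The gene selection criterion described above can be illustrated with a minimal sketch. It assumes a simple fixed cutoff on the absolute difference between a subclass median and the gene's overall median (the hypothetical parameter `min_shift`) as a stand-in for the formal significance test, which this section does not specify. The function name, data layout, and toy values are hypothetical and do not reproduce the TNBCtype derivation.

```python
from statistics import median


def select_subclass_genes(expression, subclass_of, min_shift):
    """Keep genes whose median in at least one subclass departs from the
    gene's overall median by more than min_shift.

    expression:  dict gene -> {sample_id: value}   (hypothetical layout)
    subclass_of: dict sample_id -> cluster-defined subclass label
    min_shift:   hypothetical cutoff standing in for a significance test
    """
    selected = []
    for gene, values in expression.items():
        overall = median(values.values())
        # Group this gene's values by the subclass of each sample.
        by_subclass = {}
        for sample, value in values.items():
            by_subclass.setdefault(subclass_of[sample], []).append(value)
        if any(abs(median(v) - overall) > min_shift for v in by_subclass.values()):
            selected.append(gene)
    return selected


# Toy data: g1 separates subclasses A and B, g2 does not.
expression = {
    "g1": {"s1": 5.0, "s2": 5.2, "s3": 0.1, "s4": 0.0},
    "g2": {"s1": 1.0, "s2": 1.1, "s3": 0.9, "s4": 1.0},
}
subclass_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
selected = select_subclass_genes(expression, subclass_of, min_shift=1.0)
```

Here only `g1` is kept, because its median in subclass A (5.1) lies well above its overall median (2.55), while `g2` stays within 0.05 of its overall median in both subclasses.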
Initial gene expression analyses often include genes that contribute little signal. This problem arises because a typical assay measures tens of thousands of genes, whereas the number of samples in which to assess these potential classifiers is considerably smaller. This statistical problem, coupled with the inherent noise of microarray platforms, makes it difficult to derive reproducible classification panels. One study estimated that achieving similar (i.e., overlapping) gene panels across multiple cohorts would require analyzing at least several thousand tumor samples [14]. It is well established that overfitting occurs with large-scale gene expression data when little or no feature selection or dimensionality reduction is performed [15, 16]. Careful reduction of the genes included in a TNBC classifier could therefore lead to a robust, clinically applicable tool for subtype identification.
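The small-sample problem described above can be demonstrated with simulated data. The sketch below is illustrative only and is not taken from the cited studies: it generates pure-noise expression values for 10,000 hypothetical genes in two groups of 10 samples, flags genes whose Welch t statistic exceeds an assumed cutoff of 3, and repeats this in a second, independent cohort. Because no gene carries real signal, every flagged gene is a false positive, and the two flagged sets are expected to overlap very little.

```python
import math
import random


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def noise_hits(n_genes, n_per_group, threshold, seed):
    """Simulate one cohort of pure-noise expression data and return the
    indices of genes that nevertheless pass |t| > threshold between groups."""
    rng = random.Random(seed)
    hits = set()
    for gene in range(n_genes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if abs(welch_t(group_a, group_b)) > threshold:
            hits.add(gene)
    return hits


# Two independent cohorts of 20 samples each, 10,000 genes, no true signal.
cohort_1 = noise_hits(10_000, 10, 3.0, seed=1)
cohort_2 = noise_hits(10_000, 10, 3.0, seed=2)
print(len(cohort_1), len(cohort_2), len(cohort_1 & cohort_2))
```

Each cohort typically yields dozens of "discriminating" genes purely by chance, yet almost none recur in the other cohort, which mirrors the reproducibility problem that motivates reducing the gene panel.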
Here we describe the development and validation of a new TNBC classification tool that uses only 101 genes, less than 5 % of the original 2188-gene TNBCtype model, yet reproduces its classification. Using this new model, the association of the BL1 and BL2 subtypes with pathological response was also reproduced in an independent patient cohort.
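This section does not describe how the 101-gene model assigns a sample to a subtype. A common approach for panel-based classifiers, assumed here rather than taken from the paper, is nearest-centroid classification by correlation: the sample's expression profile over the panel genes is correlated with a centroid profile for each subtype, and the most correlated subtype is reported. The function names and the 4-gene centroid values below are hypothetical toy numbers, not published TNBC centroids.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a vector has zero variance")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def assign_subtype(sample, centroids):
    """Return (subtype, r) for the centroid most correlated with the sample.

    sample:    expression values over the panel genes
    centroids: dict subtype -> centroid values in the same gene order
    """
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 4-gene centroids for three subtypes.
centroids = {
    "BL1": [2.0, 1.0, 0.0, -1.0],
    "BL2": [-1.0, 0.0, 1.0, 2.0],
    "M": [0.0, 2.0, -2.0, 0.0],
}
label, r = assign_subtype([1.8, 0.9, 0.1, -1.2], centroids)
print(label, round(r, 3))
```

Correlation rather than Euclidean distance makes the call insensitive to overall shifts and scaling of a sample's expression values, which is one reason centroid-correlation rules are popular for transferring a gene panel across assay platforms.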
Discussion
TNBC comprises up to 20 % of all breast cancers (as many as 40,000 women newly diagnosed in the US each year) and occurs more frequently in young and African-American women [1]. TNBC has higher rates of metastatic recurrence and a poorer prognosis than other breast cancers, with a 5-year survival of only ~70 % after treatment with the most aggressive conventional cytotoxic chemotherapies. This is due in large part to the heterogeneity of TNBC and to the still limited knowledge of therapeutic targets and of biomarkers that can predict the responsiveness of these cancers to standard-of-care or investigational therapies. Despite overall poor outcomes, approximately 30 % of TNBC patients respond to standard chemotherapy [1]. There is thus a critical unmet need for focused diagnostics that identify patients who would benefit from standard chemotherapy and that better align new therapeutic regimens with actionable targets expressed in TNBC patients. The TNBC subtype algorithm represents a major advance toward addressing the heterogeneity and therapeutic sensitivities of TNBC [11]. However, certain features of this original algorithm, such as the large number of genes it comprises (2188 in total), are not optimal for routine clinical application. The refinements described here are part of the optimization steps being performed to ultimately offer TNBC subtyping as a clinically useful test.
Bioinformatic refinement of the original, academic research-based TNBCtype algorithm reduced the expression signature representing all of the TNBC subtypes from 2188 to only 101 genes. Importantly, there was excellent agreement between the original 2188-gene subclassification model and the new “lean” 101-gene classifier in both the discovery and validation TNBC cohorts, as well as in an independent TNBC clinical trial cohort treated neoadjuvantly with anthracycline and cyclophosphamide (AC) followed by mitotic inhibitors [26]. The 101-gene model, obtained by pruning the original 2188-gene model with gene set enrichment analysis, showed comparable classification and predictive utility. The data suggest that in the 101-gene model, the genes that define each subclass have similar biological functions (Fig. 2). Further, from a practical standpoint, reducing the classifier to 101 genes, with each individual TNBC subtype defined by only 8 to 15 genes, will allow placement on assay platforms for which the 2188-gene signature would be technically challenging or impossible.
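The paragraph above reports excellent agreement between the two models without naming the statistic used. A common way to quantify agreement between two classifiers' calls on the same samples is overall concordance together with Cohen's kappa, which discounts the agreement expected by chance from each classifier's label frequencies. The sketch below assumes that choice; the function name and the six-sample input are hypothetical and are not data from this study.

```python
from collections import Counter


def concordance_and_kappa(calls_a, calls_b):
    """Overall agreement and Cohen's kappa between two classifiers' subtype
    calls on the same samples, listed in the same order."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    # Agreement expected by chance from the two sets of marginal frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1:
        return observed, 1.0  # both classifiers used one identical label
    return observed, (observed - expected) / (1 - expected)


# Hypothetical calls for six samples from a full and a reduced model.
full_model = ["BL1", "BL1", "BL2", "IM", "M", "BL1"]
lean_model = ["BL1", "BL2", "BL2", "IM", "M", "BL1"]
agreement, kappa = concordance_and_kappa(full_model, lean_model)
```

In this toy example the models agree on 5 of 6 samples (83 %), but because some agreement is expected by chance, kappa is lower, at 10/13 ≈ 0.77.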
Preliminary evidence of the clinical utility of TNBC subtyping has already been demonstrated for both the original 2188-gene and the optimized 101-gene models. In the clinical trial cohort [26] analyzed here with both models, the BL2 subtype was significantly associated with lack of tumor response to standard chemotherapy, whereas the BL1 subtype was significantly associated with pCR. Age was a significant predictor of pathological response in this cohort, but the associations of the BL1 and BL2 subtypes (as defined by the 101-gene model) with response were independent of age. To put these findings in context and emphasize their potential relevance to clinical management, historical data show that only approximately 25 % of TNBC patients achieve pCR with the conventional anthracycline/cyclophosphamide/mitotic inhibitor combination chemotherapy used as neoadjuvant treatment in the test cohort [28, 29]. By subclassifying a TNBC population with the 101-gene model, we found that 70 % of patients with BL1 tumors experienced pCR, in contrast to only 22 % of those with BL2 tumors. Our findings corroborate the independent study by Masuda et al., who applied the 2188-gene model to a cohort of patients from the MD Anderson Cancer Center treated with neoadjuvant chemotherapy containing sequential taxane and anthracycline-based regimens, and likewise found that BL1 TNBC patients had a high rate of pCR (52 %) and BL2 patients the lowest pCR rate (0 %) of all subtypes [12]. Collectively, these data support not only the ability of the gene expression models to classify TNBC into stable, homogeneous subtypes, but also the likely predictive utility of these subtypes for assessing therapeutic sensitivity.
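Statements such as "significantly associated with pCR" rest on comparing response rates between subtypes. This section does not name the test used; for a 2x2 table of subtype versus pCR, Fisher's exact test is one standard choice with small cohorts. The sketch below implements the two-sided version with exact integer arithmetic. It is illustrative, the function name is hypothetical, and it is not the study's analysis code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

        [[a, b],    e.g. subtype 1: pCR, no pCR
         [c, d]]         subtype 2: pCR, no pCR

    Sums the probabilities of all tables with the same row and column
    totals that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Hypergeometric numerator for a table with k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer weights keep the "no more likely" comparison exact.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / total
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70 ≈ 0.486, the classic tea-tasting table. Comparing integer weights instead of floating-point probabilities avoids tables of equal probability being misclassified by rounding error.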
In the original identification of the TNBC subtypes by Lehmann and co-workers, the BL1 subtype was noted to be typified by high expression of cell cycle and DNA damage response genes [11]. Additionally, TNBC cell lines that shared expression patterns with this subtype preferentially responded to cisplatin, and it was hypothesized that patients with BL1 tumors would have higher response rates to platinum compounds and PARP inhibitors [27] than patients with other subtypes [11]. The 101-gene model is being further refined into even smaller gene sets that individually classify each subtype. Clinical utility studies will then assess the ability of subtyping to guide therapeutic decisions regarding the use of platinum agents, PARP inhibitors, and other agents believed to be effective in subsets of TNBC patients (e.g., immune checkpoint inhibitors, androgen receptor antagonists, and anti-angiogenics such as bevacizumab). Previous trials of targeted therapies in unselected TNBC have largely been unsuccessful, as with VEGFR and EGFR inhibitors [30, 31]. However, aligning targeted therapies with select subsets of TNBC whose biology depends on a given target may accelerate the development of new, more efficacious therapeutics for patients with TNBC.
Competing interests
Robert S. Seitz, Kasey Lawrence, Daniel B. Bailey, Stephan W. Morris, David R. Hout, and Brock L. Schweitzer are employees of and hold stock in Insight Genetics, Inc. Brian Z. Ring is compensated as a consultant for Insight Genetics, Inc. Jennifer A. Pietenpol and Brian D. Lehmann are inventors of intellectual property for TNBCtype licensed by Insight Genetics, Inc.
Authors’ contributions
Conception and design: BZR, BDL, JAP, SWM and RSS. Development of methodology: BZR and RSS. Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): BZR, BDL, SWM, JAP and RSS. Writing, review, and/or revision of the manuscript: DRH, RSS, DBB, BLS, KL, BDL, and JAP. Study supervision: DRH and RSS. All authors have read and approved the manuscript.