Background
Breast cancer is the second leading cause of cancer-associated mortality in women [
1] and represents a serious global health concern. The WHO has estimated that > 1.7 million new breast cancer cases and 522,000 breast cancer-associated deaths occurred globally in 2012 [
1]. Breast cancer can be classified as either non-invasive ductal carcinoma in situ (DCIS) or invasive ductal carcinoma (IDC). Invasive tumors can be further categorized into different molecular subtypes, including luminal A, luminal B, HER2+, and basal-like/triple-negative breast cancers (TNBCs), based on expression profiling or protein surrogates [
2‐
4]. TNBC is defined as ER(−), PR(−), and HER2(−) and is typically more heterogeneous and aggressive than other subtypes, resulting in a relatively poor prognosis. Scant data and a lack of biomarkers limit our ability to build robust models that can stratify patients and predict clinical outcomes [
5‐
9]. Differences between ethnic groups and uneven reporting further complicate matters. For example, while the metabolic profile of African-American TNBC patients has been reported [
10], Asian women are subject to different risk factors due to lifestyle and genetic differences, which can modulate treatment responses and impact disease outcomes [
11‐
16]. Despite the association between breast stromal collagen density and invasive breast cancer, which is second only to deleterious germline BRCA1 and BRCA2 mutations [
17,
18], stromal biology remains poorly delineated compared with other cancer cell compartments and immune cell populations. Although some previous breast cancer studies incorporate Asian women, large-scale, in-depth, and specific collagen structural data from non-Caucasian women, especially those with TNBC, are lacking. In this work, our group sought to investigate the effect of collagen structure on TNBC outcomes in a cohort of Asian women.
PD-L1/PD-1-targeted treatment is promising, but the overall response rate remains suboptimal in several cancers, including breast cancer. Many factors may affect patient outcomes. Mesenchymal cells, immune cells, extracellular matrix (ECM) components, lymphatics, and vasculature are all present in breast cancer stroma. Immune cell infiltration, disruption of the basal membrane and myoepithelial cell layer, and remodeling of stromal collagen are among the earliest key events in the development of invasive breast cancer [
19,
20]. Recent studies have also demonstrated that stromal reaction influences the efficacy of particular treatments, including immunotherapy [
21,
22]. Furthermore, the importance of stromal and collagen biology in breast tumor progression is highlighted by differential immune, angiogenic, and fibroblastic responses, and an array of stromal genes may be used to predict the clinical outcome [
23,
24]. Collagen fibers represent the major structural ECM component in breast tumors, and increases in stromal collagen fibers have been demonstrated to facilitate breast tumor formation, invasion, and metastasis. Collagen is secreted by cancer-associated fibroblasts (CAFs), which are involved in tumor stromal activation, and may lead to tumor progression through multiple mechanisms, including neoangiogenesis, tumor cell proliferation, and invasion [
25‐
29]. CAFs also affect tumor progression by reprogramming the tumor microenvironment at both the metabolic and immune levels and by promoting adaptive resistance to chemotherapy [
30]. In this work, we focus only on quantifying collagen remodeling and exploring its impact on TNBC patient survival.
Stromal collagen remodeling is characterized by collagen realignment in the stromal compartment, and multiple recent studies have demonstrated that collagen structure, profiles, and patterns within the tumor stromal microenvironment have diagnostic and prognostic value [
31‐
40]. However, the role of collagen remodeling in tumor formation and metastasis is complex, and while correlations between breast tissue density and tumor formation have been reported [
31], the relationships between collagen density, structure, and cancer cell migration and invasion remain to be fully understood. Collagen in breast tissue was historically thought to form a physical barrier that prevents tumor cell migration, with collagen degradation and deposition being prerequisites for this process [
31,
33,
34,
38,
39]. Furthermore, three tumor-associated collagen signatures (TACs) have been reported [
40], and radial alignment of collagen fibers around the boundaries of tumors is associated with tumor cell invasion. As a result, this topic remains the subject of debate within the field, and no consensus has been reached.
This issue may, at least partially, be due to historical methodological limitations. The remodeling and quantification of collagen have traditionally been studied using biochemical staining techniques, such as Picrosirius Red (PSR) staining or Masson’s trichrome (MT) staining. However, this analysis is highly dependent on the staining protocol and color deconvolution pipeline in digital image analysis. Due to the limitations presented by these imaging techniques, only the total amount of collagen present in tissue samples is typically assessed, with the quantitative structure of collagen fibers remaining beyond their scope. However, the development of quantitative stain-free collagen imaging techniques, such as second harmonic generation (SHG), permits the quantification of collagen structure at a finer level of detail. In this study, we used two-photon excitation (TPE) and SHG to scan breast cancer tissue microarrays (TMAs). SHG is a multiphoton, laser-based, quantitative non-linear optical imaging technique used to identify fibrillary collagen in fixed tissues. Due to its physical principles, it is highly sensitive to changes in collagen fibril and fiber structure and also to the remodeling of connective tissue.
We developed a fully automatic digital collagen profiling platform based on TPE/SHG imaging techniques to quantify collagen using four parameter categories: intensity/area, textural, structural, and fiber distribution features. Numerical image features were extracted from each image. The extracted features were selected using feature selection algorithms, and associations between these features and the existing clinicopathological parameters were investigated across the whole patient cohort. Bioinformatics models were designed for the purpose of classification (diagnosis) and prediction (prognosis). Our results provide a computational solution to classify collagen into two distinct modes based on fiber SHG signal intensity, texture, and morphology: aggregated thick collagen (ATC) and dispersed thin collagen (DTC). Several imaging features were strongly correlated with clinicopathological characteristics, and ATC collagen fiber density (CFD) and DTC collagen fiber length (CFL) were revealed to be of prognostic value based on our patient cohort and their clinical outcomes. Separation of ATC and DTC provides a novel understanding of collagen remodeling during cancer progression, and our results may help to resolve the debate over whether collagen has a role in inhibiting or promoting patient survival. All extracted parameters are listed in Supp. Table 0
1, and they are quantified in the ATC and DTC regions separately.
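The idea of quantifying features separately per collagen mode can be illustrated with a toy sketch. It assumes, purely for illustration, that a plain intensity threshold separates the two modes; the study itself classifies ATC and DTC using SHG intensity, texture, and morphology together. The function names, the thresholds `collagen_min` and `atc_min`, and the `atc_fraction` summary are all hypothetical and are not the paper's definitions.

```python
from statistics import mean


def split_atc_dtc(image, collagen_min, atc_min):
    """Split pixel intensities into two lists (ATC-like, DTC-like).

    Assumption: intensity alone separates the modes, with atc_min >= collagen_min.
    Pixels below collagen_min are treated as background and ignored.
    """
    atc, dtc = [], []
    for row in image:
        for px in row:
            if px >= atc_min:
                atc.append(px)
            elif px >= collagen_min:
                dtc.append(px)
    return atc, dtc


def summarize(pixels):
    """Area (pixel count) and mean intensity for one region."""
    return {
        "area_px": len(pixels),
        "mean_intensity": mean(pixels) if pixels else 0.0,
    }


def profile(image, collagen_min, atc_min):
    """Per-region summaries plus the share of collagen pixels labeled ATC-like."""
    atc, dtc = split_atc_dtc(image, collagen_min, atc_min)
    total = len(atc) + len(dtc)
    return {
        "ATC": summarize(atc),
        "DTC": summarize(dtc),
        "atc_fraction": len(atc) / total if total else 0.0,
    }


# Tiny synthetic "image" (rows of intensities), not real SHG data.
toy_image = [[0, 10, 200], [5, 50, 220]]
result = profile(toy_image, collagen_min=8, atc_min=100)
```

Keeping the split and the summary as separate functions means a more realistic classifier could replace `split_atc_dtc` without touching the per-region feature code.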
Conclusion and discussion
Increased computational power and the integration of computational solutions with artificial intelligence (AI) and machine learning (ML) have resulted in substantial improvements to clinical diagnosis and prognosis. This has also had significant implications for the field of histopathology, where traditional assessments typically involve qualitative characterization expressed on a numerical scale. The resultant discrete numbers are often not sufficient to build robust mathematical models for clinical diagnosis or prognosis, particularly when they are generated from manual assessments that carry some subjective inconsistency. Thus, scanning and digitization of glass slides represent the future for AI-assisted pathology. Imaging techniques are also changing, shifting from basic H&E staining to other techniques, including immunohistochemistry, immunofluorescence, and stain-free imaging techniques, such as SHG. In this work, we applied TPE/SHG to investigate the effect of collagen structures on the prognosis of patients with TNBC, an aggressive but heterogeneous subtype. SHG imaging quantified the collagen component, including collagen types I and III. Our team developed an automatic image analysis solution to extract different collagen features from the SHG images, including collagen structural information.
The literature has reported that while collagen can potentially form a protective barrier and prevent cancer cells from escaping their original site, collagen fibers have also been found to serve as a “highway” that facilitates cancer cell migration to remote locations, impacting patient survival [
44‐
47]. Previously, the mechanisms underlying these contradictory reports were unclear [
48]. This historical lack of understanding was due to the paucity of methods to differentiate between collagen structures. As demonstrated by a number of previous studies, it is an oversimplification to characterize collagen remodeling as a simple increase or decrease in total ECM collagen content [
48]. It is more important to understand the structural differences in various regions under diverse conditions. For example, the orientation of collagen fibers, such as TACs, is a key parameter [
40]. Fibrils are packed to form collagen fibers. The organization and distribution of these fibers within the tissue are also important, for example, whether the collagen forms any aggregations. One unique parameter generated by SHG in the present study was CFD, which provides a strong indication of collagen fiber strength and fibril packing structure, as the laser power and system calibration generate a correlation between pixel brightness and collagen strength. In our study, we showed that some extracted parameters were strongly associated with certain clinicopathological features, including tumor size and DCIS associations. As we expected, however, tumor grade was not associated with the investigated collagen features. This is because grade assessment is mainly based on nuclear morphology and packing features and is not linked to collagen characteristics. Regardless, the results of our study suggest the importance of assessing collagen structure, as it appears to represent a potential key prognostic parameter. According to our data, tumor grade and tumor size are also poor predictors of prognosis, as shown in Supp. Fig.
3.
Furthermore, our results suggest that the collagen compartment has two distinct modes, named ATC and DTC, and differences between these modes underpin the different roles of collagen. Collagen remodeling can either prevent or promote cancer cell migration, and while the mechanism remains to be fully understood, we demonstrated that ATC CFD and DTC CFL together have strong prognostic value. Higher ATC CFD indicates the presence of stronger collagen fibers surrounding the tumor nest, through which it is difficult for cancer cells to escape (Fig.
5a). Therefore, this condition results in a more favorable prognosis. DTC CFL also affects patient survival in a complex way. If a strong protective layer is present to prevent the invasion of malignant cells, longer collagen fibers help immune cells to respond promptly and combat those malignant cells. On the other hand, if the first layer of protection is weak, leading to the accumulation of a large number of malignant cells in the stromal area, longer collagen fibers may provide a more effective “highway” that assists cancer cell migration. Once cancer cells attach to long collagen fibers, it is easier for them to “grasp” the fibers and subsequently migrate long distances. Figure
5 schematizes the four possible conditions, and survival curves for these groups are presented in Fig.
4.
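The four conditions can be sketched as a simple two-marker stratification. This is a minimal illustration, assuming each patient has one ATC CFD value and one DTC CFL value and that the (+)/(−) call is made against a cutoff. The median default used here is an assumption, not the cutoff used in the study, and the field names `atc_cfd` and `dtc_cfl` are hypothetical.

```python
from statistics import median


def stratify(patients, atc_cfd_cutoff=None, dtc_cfl_cutoff=None):
    """Assign each patient to one of four (ATC CFD, DTC CFL) sign groups.

    Assumption: values strictly above the cutoff are "+", all others "-".
    When no cutoff is given, the cohort median is used as a placeholder.
    """
    if atc_cfd_cutoff is None:
        atc_cfd_cutoff = median(p["atc_cfd"] for p in patients)
    if dtc_cfl_cutoff is None:
        dtc_cfl_cutoff = median(p["dtc_cfl"] for p in patients)

    # Pre-create all four groups so empty groups are still reported.
    groups = {(a, d): [] for a in "+-" for d in "+-"}
    for p in patients:
        atc_sign = "+" if p["atc_cfd"] > atc_cfd_cutoff else "-"
        dtc_sign = "+" if p["dtc_cfl"] > dtc_cfl_cutoff else "-"
        groups[(atc_sign, dtc_sign)].append(p["id"])
    return groups


# Synthetic values for illustration only, not study data.
cohort = [
    {"id": "P1", "atc_cfd": 1.0, "dtc_cfl": 10.0},
    {"id": "P2", "atc_cfd": 2.0, "dtc_cfl": 40.0},
    {"id": "P3", "atc_cfd": 3.0, "dtc_cfl": 20.0},
    {"id": "P4", "atc_cfd": 4.0, "dtc_cfl": 30.0},
]
groups = stratify(cohort)
```

Each resulting group could then be passed to a survival analysis of the reader's choice to compare outcomes between the four conditions.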
Furthermore, due to limitations of previous techniques, the quantification of collagen structure and investigation of the relationships between numerical collagen structure parameters and patient survival were not previously feasible. We demonstrated that patients with the best survival (ATC CFD(+) and DTC CFL(+)) had increased aggregated collagen area (Fig.
6b) and increased collagen brightness, representing density (Fig.
6c), compared with other groups. However, increases in collagen brightness may have been caused by collagen aggregation. The presence of more collagen in a given small area may also have resulted in a stronger signal. However, in Supp. Fig.
4A, high DTC CFD was observed in patients with the best survival. This suggests that in the DTC area, the collagen fiber was also brighter. Through combining the above data, we conclude that the collagen fiber packing structure differed in such a way that it caused high CFD in both the ATC and DTC regions. Potentially, the collagen fiber packing structure is also associated with the collagen aggregation patterns. In both the DTC and ATC regions, the CFT average value was the same. This further supports the observation that the CFD is not due to a simple clustering of nearby fibers but a real increase of intensity of the fibers (the denser the fibrils within the fibers are, the higher the SHG intensity is, directly linking the intensity of the fiber with the density of fibrils that compose it). As summarized in Supp. Table
3, the ATC is the main driver of the difference between the best and worst survival groups, whereas the DTC is key to highlighting differences among all four groups, notably the intermediate groups. The ATC area ratio is a key parameter for differentiating the best survival patients from the remaining groups, while other parameters help to further classify patients in other respects. Although the results of this study demonstrate that the aggregation of collagen and collagen brightness (represented by CFD and indicating different collagen packing structures) and the DTC CFL are two key collagen-associated parameters that may impact patient survival, the generic mechanism that controls the differences described in our data remains unclear and requires further investigation.
TNBC patients represent a heterogeneous group [
47]. Lymph node status is the only prognostic feature for TNBC patients, a finding also supported by our existing data, as shown in Supp. Fig.
5. The use of this prognostic factor is dependent on the detection of cancer cells in the lymph nodes. The percentage and number of patients with positive lymph node status in the four groups is presented in Supp. Fig.
6A and B. In total, 55% of ATC CFD(−) and DTC CFL(+) patients were lymph node positive (last row in Supp. Fig. 6A); the prognostic model presented in this study shows some correlation with lymph node status, i.e., 55% of the worst survival patients were lymph node positive, whereas only 67% of the best survival patients were lymph node negative. Theoretically, it would be possible to build a model based on the collagen parameters investigated in this study to predict patient survival even before metastasis.
The collagen structure parameters extracted in the present study were closely associated with certain pathological parameters, and the mechanism presented in Fig.
5 warrants further investigation. We demonstrated that collagen structure differs in three aspects: (i) the degree of aggregation, (ii) collagen fiber packing structure, and (iii) the length of the collagen in the DTC regions. Although this evidence suggests a potential answer to the historical debate on the role of collagen in breast cancer survival, causality has not been established. Other cellular factors may affect collagen structure, for example, the behavior of migrating cells, including immune cells and cancer cells that acquire migratory capability following epithelial-mesenchymal transition (EMT). For example, NK cells have antifibrotic properties which decrease with progression of fibrosis [
49]. Tumor necrosis factor α (TNF-α) [
50], transforming growth factor β (TGF-β) [
51‐
53], IL-11 [
54,
55], and OSM [
56,
57] are some of the known cytokines involved in the fibrogenesis pathways. However, whether any of these factors affects the three aspects described above, and thereby TNBC patient survival, still requires further investigation.
Finally, we compared our results with the benign samples and DCIS samples. The data of ATC area ratio, ATC CFD, and DTC CFL are presented in Supp. Fig.
7, 8, and 9, respectively. The original data points are presented in panel A, and the mean differences, taking benign as a reference, are presented in panel B. As mentioned in the “Results” section, ATC CFD(+) and DTC CFL(+) patients have the best survival. For the ATC area ratio, ATC CFD, and DTC CFL, the original data of this best survival population show a pattern very similar to that of patients diagnosed with DCIS, and the mean differences are also very limited. Although other groups may have one or two parameters closer to those of DCIS patients, they have worse survival, as discussed for Fig.
5.
Although the finding in this work is promising for better stratification of TNBC patients in the healthcare system, we must emphasize that it is still preliminary and has not yet been replicated using whole slide imaging data from an independent cohort. Well-designed, rigorous validation must be conducted to confirm its performance before application in clinical work. The model presented in this work is based on ATC CFD and DTC CFL, as highlighted by the blue and green colors in Supp. Table 3; however, other parameters may also contribute to patient stratification, indicating that the underlying mechanism governing the role of collagen in TNBC patients might be more complicated. This requires further elucidation and is worthy of additional large-scale study. On the other hand, the ATC area ratio, highlighted in red in Supp. Table 3, is able to differentiate the best survival patients from the other three groups. In practice, this simplified model may provide some immediate clinical insight. Tumor progression is an outcome of mutations in multiple different genes, and the relationships between collagen characteristics and other critical tumor compartments, such as the immune response, EMT, adipose, and angiogenesis, should be investigated in a more systematic study.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.