Background
Multiplex immunohistochemistry (mIHC) and multiplex immunofluorescence (mIF) detect multiple targets in a single histologic section using differently colored chromogens (e.g. DAB, AEC, TMB, BCIP) or fluorophores, respectively. Traditional IHC employs a single antibody per tissue section, so multiple markers must be assessed in consecutive serial sections. In contrast, mIHC increases our ability to observe direct interactions between cells within the appropriate histological context in a single tissue section and maximizes the number of markers that can be assessed with limited tissue. Fully automated mIHC and mIF platforms are being deployed as high-throughput assays for future use in CLIA/CAP certified laboratory settings.
We used an mIHC platform to visualize inflammatory responses in the tumor microenvironment of pancreatic ductal adenocarcinoma (PDAC). We chose this model system since PDAC is one of the deadliest types of cancer, known to be poorly immunogenic and unresponsive to currently available immunotherapeutic treatment options [1, 2]. Investigation of the relationship between PDAC and the inflammatory microenvironment could be further advanced by the development of methods that quantify cell populations and their distribution within the tumor microenvironment in an automated and reproducible fashion. We utilized mIHC rather than mIF because fluorophores decay over time, mIF is difficult to interpret without histologic context, and mIF requires specialized fluorescence or spectral imaging instrumentation that is labor intensive, expensive, and demands expertise [3–20].
The analysis of inflammatory responses in the tumor microenvironment (TME) is increasingly significant as the development and deployment of immunotherapeutic protocols continues to increase for many types of cancer [2, 16, 18, 21–30]. Investigations of tumor-immune interactions in the TME using mIHC may help improve clinical outcomes through the discovery of predictive and prognostic biomarkers [14, 26, 27, 30–64]. Since tumor-immune interactions are exquisitely complex and diverse across different types and subtypes of cancer, meaningful analysis of the TME requires the detection and classification of tumor cells and immune cell subtypes to (1) characterize the functional immune status of the TME, (2) identify potential intrinsic immune biomarkers, and (3) provide insight into the expression of known immunotherapeutic drug targets. In order to clinically implement mIHC, pathologists have to be able to meaningfully interpret multicolored tissue sections that contain several types of labeled cells.
Thus, computational methods are being explored to augment traditional histologic examination in an effort to help reliably detect and classify multiple distinct cell populations in digital whole slide images (WSIs) of mIHC-stained tissue sections [4, 6, 7, 9, 11, 13–15, 19, 65–67]. We developed a suite of algorithms that leverage deep learning to overcome the need to use specialized multispectral imaging instrumentation for quantitative analysis of mIHC WSIs containing six or more distinctly colored chromogens. Our methods utilize computationally inexpensive deep learning convolutional neural networks (CNNs) that are trained to separate colors and classify cells in a time-efficient and comprehensive manner with limited training data. The success of each of the methods demonstrates the value of using deep learning-based image analysis methods for automated analysis of mIHC WSIs. Therefore, we also present an application of our methods to quantitatively describe the spatial relationships between tumor and immune cells in PDAC as an example of the types of insights that can be gained from such analysis.
We report our efforts to develop and test complementary color deconvolution and immune cell classification methods by using deep learning CNNs. We developed a suite of deep learning tools with two distinct algorithmic approaches and combinations of these methods. Our suite of deep learning tools includes (1) a deep autoencoder for color decomposition, (2) a U-Net based approach for cell segmentation, and (3) multiple ensemble approaches intended to increase the positive predictive value (PPV) of cell detection and classification. This manuscript reports the development of these methods in a specific use case to quantitatively analyze the expression of six biomarkers to study tumor immune interactions in PDAC. Our goal was to develop these methods to build robust and scalable analytic pipelines that can be easily configured and deployed to analyze mIHC WSIs for a wide array of research and clinical applications.
Discussion
The methods described for image analysis of mIHC-stained slides were designed to be robust, reliable, and easily customizable for future clinical research applications. We developed our suite of analytic methods in an effort to make a clear and significant advancement in the ability to survey the immune landscape of PDAC using deep learning to help unravel the complexity of tumor immune interactions in the TME. Our goals were to develop a scalable suite of methods to analyze PDAC mIHC WSIs in a uniform manner, where we can (1) reliably detect, classify, and enumerate different cell types labeled with different colored biomarkers, (2) calculate the distances between the boundaries of tumor and immune cells in mIHC WSIs, and (3) perform spatial analyses to quantitatively describe a large number of diverse tumor immune interactions in multicolored mIHC WSIs without needing expensive multispectral imaging instrumentation.
Our models leverage CNNs trained with this ground truth data to perform pattern recognition functions with statistical multivariate algorithms to predict color and classify all of the different types of labeled cells in the PDAC mIHC WSIs. The methods described leverage relatively inexpensive seed labels (dots) that can be used to generate training sets. Importantly, the ability to use this form of annotation significantly decreases the effort required for pathologists to generate training data, since placing seed labels at the center of each cell is much quicker than manually segmenting all of the different types of cells by hand. The resulting reduction in time, labor, and cost makes it possible to quickly customize analytic pipelines and improves the scalability of our methods.
After training, our models, which are sophisticated statistical algorithms, iteratively improve by learning additional features in successive cycles. These deep learning models perform non-linear regression in large data sets to make predictions that can be used to quantitatively analyze the features of the uniquely colored cell types in mIHC WSIs. However, evaluating these algorithms in terms of their ability to correctly identify and classify six distinct cell populations with variable spatial distributions simultaneously in mIHC WSIs requires many considerations.
The variability of shapes and sizes of cells, along with the variable expression of each of the biomarkers in individual cells within the different labeled cell populations, leads to formidable challenges for any pathologist and algorithm. Furthermore, subtle differences in staining patterns coupled with overlapping color spectra of the chromogens introduce difficulty in color decomposition from the very beginning. For example, intense yellow and light black can both appear brown. This is further complicated in cases where a cell class may be labeled with more than one biomarker, e.g., localization of yellow and purple within the same cell can appear red. Thus, we need digital pathology and image analysis tools that can accurately distinguish different cell classes based on the variability of color that depends on how each type of cell is labeled with a particular biomarker in WSIs of mIHC tissue sections. Despite the technical challenges, the proposed ColorAE method generates color decomposition results that are generally consistent with Vahadane’s method (as shown in supplemental Fig. 1).
However, ColorAE was designed to analyze mIHC WSIs with more than three colors. ColorAE generally performed better than U-Net at correctly detecting and classifying multicolored immune cells, since ColorAE was able to detect lighter colored immune cells that U-Net failed to detect. We also observed that ColorAE captured fine geometric details that U-Net could not, which is particularly evident when comparing CD8 purple masks. There were also very few B-cells in the tissue sections, which resulted in sample bias: CD20 red B-cells were often misclassified as CD8 purple T-cells, as reflected by the low F1-score. CD16 black myeloid cells and K17 brown PDAC cells were also sometimes difficult to distinguish. Both ColorAE and U-Net sometimes misclassified CD16 black as K17 brown and vice versa. Importantly, this seemed to be related in part to the choice of chromogen: the black chromogen, coupled with the diffuse staining pattern in subsets of myeloid cells, appeared brown to the human eye, and such cells can only be distinguished from K17 brown PDAC cells by morphology.
U-Net outperformed ColorAE in detecting and classifying K17 brown PDAC cells that were counterstained with hematoxylin. Both U-Net and ColorAE can fail to include cell nuclei in the mask since the algorithms generally classify hematoxylin as part of the background. The nuclei of PDAC cells are large and euchromatic with cytoplasmic K17 staining, so it is likely that the algorithms cannot distinguish the nuclei of tumor cells from the background in this use case. Overall, U-Net generally performs better than ColorAE at identifying tumor cells. It is important to note that while the tumor cells (and the total tumor mask area) may be underestimated because some nuclei are excluded, the boundaries of tumor nests were preserved. Thus, the data on tumor nest locations remained reliable enough to be used for downstream spatial analyses.
Furthermore, we show that the methods are complementary: U-Net had worse recall than ColorAE in detecting tumor, but demonstrated significantly better precision. We also observed that ColorAE predicted very detailed masks but was too sensitive in terms of picking up non-specific and background staining. This can be addressed in post-processing by filtering out predicted objects whose areas are too small to be considered cells. In comparison, the U-Net model produced reasonably conservative predictions, predicting areas of the cell with high intensity staining. However, cells with irregular extensions and low staining intensity were sometimes not detected (Fig. 4). Overall, U-Net performance was limited by the quantity of superpixel labels for training.
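The size-based filtering mentioned above can be illustrated with a minimal sketch. It assumes a binary mask stored as a list of rows, 4-connectivity, and a hypothetical `min_area` threshold in pixels; none of these choices are taken from the study, and a production pipeline would more likely rely on an image-processing library.

```python
from collections import deque

def remove_small_objects(mask, min_area):
    """Return a copy of a binary mask without connected components
    (4-connectivity) that cover fewer than min_area pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects every pixel of this object.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the object only if it is large enough to be a cell.
            if len(component) >= min_area:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned

# Toy mask: one 2x2 object and two isolated single-pixel detections.
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
filtered = remove_small_objects(toy_mask, min_area=3)
```

With a threshold of three pixels, only the 2x2 object survives in `filtered`; the two single-pixel detections are discarded as likely background staining.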
In order to address these issues and limitations, we developed the suite of four ColorAE:U-Net ensemble models to detect (1) intersections, where a given pixel is predicted to contain a specific color if the pixel is within both of the ColorAE and U-Net masks, and (2) unions, where a pixel is predicted to contain a specific color if the pixel is within either the ColorAE or U-Net masks. We recognize that if each cell class is considered independently, the same pixel may be classified as one class by ColorAE and a different class by U-Net (Fig. 4), so we consider both labels in these scenarios. While sometimes this may be a false positive, in other cases this may be reflective of expression of multiple markers on a single cell (e.g. CD3+CD4+ cells) that results in compound colors. By treating both of the prediction labels assigned to a given pixel as valid, we can capture this phenomenon to some extent.
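The two combination rules described above amount to a pixelwise logical AND and OR of the per-color masks. The sketch below shows only these two rules; the function name `ensemble_masks`, the list-of-rows mask layout, and the toy masks are illustrative assumptions rather than the implementation used in this study, and the remaining ensemble variants are not sketched here.

```python
def ensemble_masks(colorae_mask, unet_mask, mode):
    """Combine two binary masks for one color, pixel by pixel.

    mode="intersection": positive only where both models agree.
    mode="union": positive where at least one model is positive.
    """
    if mode == "intersection":
        combine = lambda a, b: bool(a) and bool(b)
    elif mode == "union":
        combine = lambda a, b: bool(a) or bool(b)
    else:
        raise ValueError("mode must be 'intersection' or 'union'")
    return [
        [1 if combine(a, b) else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(colorae_mask, unet_mask)
    ]

# Hypothetical predictions for a single color on a 2x3 image patch.
colorae_pred = [[1, 1, 0],
                [0, 1, 0]]
unet_pred = [[1, 0, 0],
             [0, 1, 1]]

intersection = ensemble_masks(colorae_pred, unet_pred, "intersection")
union = ensemble_masks(colorae_pred, unet_pred, "union")
```

Because each color is combined independently, a pixel can end up positive in the ensemble masks of two different colors, which is how both labels are retained for a pixel that the two models classify differently.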
Even though the qualitative results from ColorAE, U-Net, and the ensemble methods are all generally acceptable, the Union ensemble demonstrated the best sensitivity (recall), as shown in Table 3. This is to be expected, as the Union ensemble considers pixels positive for each color if the colored label is predicted by at least one model. The Intersection ensemble demonstrated the best overall positive predictive value (precision), as shown in Table 4, whereas the Union anchor AE ensemble demonstrated the best overall F1 score, as shown in Table 2, even though the F1 scores are not directly applicable as a performance metric due to intrinsic variability in the intensity and staining patterns of biomarkers in cells. Although we report considerable progress in developing methods that measure six or more different colored biomarkers in mIHC WSIs, we have to note that these models were trained with a limited dataset and were trained to achieve reasonably good overall performance.
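The trade-off between the Union and Intersection ensembles follows directly from the definitions of precision, recall, and F1. The sketch below computes them pixelwise for one color; whether evaluation in this study was carried out per pixel or per cell is not restated here, so the pixelwise choice, the function name `pixel_metrics`, and the toy masks are assumptions made for illustration.

```python
def pixel_metrics(pred, truth):
    """Return (precision, recall, f1) for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # predicted positive, truly positive
            elif p:
                fp += 1      # predicted positive, truly negative
            elif t:
                fn += 1      # missed positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy one-row example: two true pixels, a permissive and a strict prediction.
truth = [[1, 1, 0, 0]]
union_like = [[1, 1, 1, 0]]         # finds both true pixels plus one extra
intersection_like = [[1, 0, 0, 0]]  # finds one true pixel, nothing extra

union_scores = pixel_metrics(union_like, truth)
intersection_scores = pixel_metrics(intersection_like, truth)
```

In this toy case the permissive prediction reaches a recall of 1.0 with a precision of 2/3, while the strict prediction reaches a precision of 1.0 with a recall of 0.5, mirroring the pattern reported for the Union and Intersection ensembles.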
Our results indicate that (1) no single method is best across all of the performance metrics for every one of the colored IHC markers and (2) multiple complementary methods can be utilized in analytic pipelines to improve the overall reliability of using computational analysis for mIHC WSIs. In our current use case, we used these novel methods to evaluate the tumor microenvironment in PDAC mIHC WSIs. While our focus was to create and evaluate methods for the accurate automated detection of the immune cells in mIHC WSIs, we wanted to demonstrate the types of downstream analyses that can be done to investigate spatial relationships between cell subsets. The nearest neighbor and proximity analyses are based on the spatial positions of all masks across the entirety of the tumor region from a representative PDAC mIHC WSI. We provide this concrete example as a proof of concept that our methods can be used to comprehensively analyze collections of mIHC WSIs.
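A nearest neighbor and proximity analysis of this kind can be sketched as follows. For simplicity the sketch reduces each mask to a centroid and measures Euclidean distances in pixels; the use of centroids rather than mask boundaries, the function names, the coordinates, and the 10-pixel radius are all hypothetical choices for illustration and are not taken from the study.

```python
import math

def nearest_neighbor_distances(sources, targets):
    """For each source point, return the distance to its closest target."""
    if not targets:
        raise ValueError("at least one target point is required")
    return [min(math.dist(s, t) for t in targets) for s in sources]

def fraction_within(distances, radius):
    """Return the fraction of distances that do not exceed radius."""
    if not distances:
        raise ValueError("at least one distance is required")
    return sum(d <= radius for d in distances) / len(distances)

# Hypothetical centroids (x, y) in pixels.
tumor_centroids = [(0, 0), (10, 0)]
immune_centroids = [(3, 4), (10, 5), (30, 0)]

distances = nearest_neighbor_distances(immune_centroids, tumor_centroids)
close_fraction = fraction_within(distances, radius=10)
```

Here two of the three immune cells lie within 10 pixels of a tumor centroid. Repeating the calculation per immune cell class would give one distance distribution per class, which is one simple way to compare how closely each population sits to the tumor.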
We emphasize that these methods are still experimental, are being refined, and require further comprehensive testing and validation in additional mIHC studies. For example, we observed that segmentation of the boundaries of large PDAC tumor nuclei was occasionally suboptimal, and nuclei were sometimes not detected because of tumor morphology, overlapping nuclei, and nuclear boundaries obscured by intense staining. Even though this limitation can potentially pose a problem with respect to accurately counting every tumor cell, it may not be a significant issue in terms of downstream analyses, including nearest neighbor spatial analyses, since the overall edges of the tumor nests are accurate enough to determine the center point and perimeters of the masks. Moreover, the area of K17 brown staining or the number of pixels belonging to K17 masks can still be calculated in order to provide a reasonable estimate of tumor area.
During the microscopic examination of multicolored PDAC mIHC WSIs, one commonly observes a fascinating distribution of classical DAB brown-stained K17+ PDAC cells in close proximity to an abundance of black-silver colored CD16+ myeloid cells (e.g. macrophages) with variably interspersed purple colored CD3+CD8+ T-cells, teal colored CD3+CD4+ T-cells, and yellow CD3+CD4-CD8- T-cells. We have also observed that red colored CD20+ B-cells are usually rare in the immune infiltrate associated with PDAC tumor cells, but present in lymphoid aggregates much further away. After histologic review, we utilized our suite of methods to perform spatial analyses in an effort to evaluate the feasibility of quantitatively describing the immune landscape in our PDAC mIHC study. The spatial analyses show how the TME of these PDACs is rich in myeloid cells with a relative dearth of T-cells and B-cells. We also gained insight into patterns of distribution of the three different populations of T-cells. Interestingly, we observed that a significant proportion of the yellow CD3+CD4-CD8- T-cells may actually represent NK/T-cells, gamma-delta T-cells, or immature T-cells, which can be used to guide other studies.
We are eager to explore whether increasing the size of the cohort will allow us to determine if these patterns are conserved across different cases of PDAC. Furthermore, we are examining the relationship between the spatial distribution patterns of the different immune cell types and survival data to identify potential prognostic biomarkers. We are also engaged in ongoing studies that apply these deep learning analytic methods across a much larger cohort of PDAC mIHC WSIs. Future work will also evaluate the relationships between different types of immune cells beyond tumor immune interactions in an effort to better understand cancer immunology.