In this study we quantified CD3+ and CD8+ TILs using stereology and image analysis in adenocarcinomas of the colon, and we investigated the intra-tumoral heterogeneity of TILs. We found an excellent correlation between estimates obtained by the manual, stereological technique and the image analysis. Additionally, we found that the intra-tumoral variation is considerably lower than the biological variation among tumors.
Correlation between image analysis and stereology
The slight variation between the stereology count and the image analysis was mainly a result of either weak immunohistochemical staining of the T-cells, where the image analysis may miss lymphocytes, or non-specific background staining (i.e. noise), which the image analysis software may misinterpret as lymphocytes. Thus, the image analysis does not systematically over- or underestimate TILs but rather depends on the staining quality of the individual section. In some sections we saw an almost perfect correlation between image analysis and stereology in the CA; in the IA, however, the correlation was weaker due to background noise in this particular compartment. The image analysis algorithm was designed using images with different staining intensities, and it is not possible to completely avoid misinterpretation of the stained objects. Inclusion of weakly stained lymphocytes may lead to a higher sensitivity for background noise and vice versa. The aim of an image analysis algorithm is to remove disturbing features and to enhance the structures of interest, and this can be achieved in many different ways. In the present setting background noise was removed using a mean filter, and for enhancing the lymphocytes we used a polynomial blob filter followed by classification by a Bayesian classifier.
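The noise-removal step described above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of lists and a 3 × 3 window. The polynomial blob filter and the Bayesian classifier are not reproduced here; a plain intensity threshold stands in for them. The image, the intensities, and the threshold are hypothetical values chosen for illustration only.

```python
def mean_filter(image, radius=1):
    """Replace each pixel by the mean of its neighbourhood (window clipped at borders)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def above_threshold(image, threshold):
    """Return coordinates of pixels whose value exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]


# Hypothetical 9 x 9 image: a weakly stained 3 x 3 "cell" (intensity 3)
# and a single bright noise pixel (intensity 9).
image = [[0] * 9 for _ in range(9)]
for r in range(1, 4):
    for c in range(1, 4):
        image[r][c] = 3
image[6][6] = 9

THRESHOLD = 2.5  # hypothetical cut-off, stands in for the classifier
raw_hits = above_threshold(image, THRESHOLD)
filtered_hits = above_threshold(mean_filter(image), THRESHOLD)
```

In this toy case the unfiltered threshold picks up the noise pixel along with the cell, whereas after mean filtering only the cell centre remains above the cut-off. Lowering the threshold to capture more of the weak cell would bring the smoothed noise back in, which is the trade-off between sensitivity for weak staining and sensitivity for background noise.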
To avoid misclassification the quality of the immunostaining is crucial. Our definition of a positive cell profile for the stereological estimation was a clear-cut immunostaining of the cytoplasm/membrane for CD3 or CD8, and a discernible nucleus. To obtain an optimal immunostaining, we only used validated antibodies recommended by NordiQC [26]. However, we are aware that we had no influence on pre-analytical confounders, such as tissue fixation and processing, which may have considerable impact on the quality of the immunohistochemical staining. This is a weakness associated with all studies using a retrospective design, and a prospective study would be warranted. Section thickness may vary slightly, even when using state-of-the-art microtomes, and this variation in thickness is an argument for not including intensity in the evaluation process. In some cases the image analysis software failed to completely delete areas of mucin or necrosis, but most often the areas were excluded correctly. Furthermore, we investigated the impact of the mucinous component in the six mucinous adenocarcinomas included in our series of tumors by performing a sensitivity analysis excluding the sections of mucinous CC (n = 18). This resulted in almost unchanged correlation coefficients varying from 0.9497 to 0.9633 (p < 0.0001; data not shown). The image analysis algorithm did not handle exclusion of muscle tissue and vessels, which may partly explain the discrepancy with the stereological estimation. However, this accounts for very small areas, since these structures only represent a minor part of the tumors.
Our results are in agreement with a study comparing image analysis with manual, stereological estimates of TILs in early stage cervical cancer [27], but to our knowledge the present study is the first to comparatively investigate these techniques in CC. Carus et al. [27] found a significant but lower correlation between the two techniques in obtaining estimates of CD8+ lymphocytes. This might be explained by the fact that their image analysis yielded an area fraction, whereas their stereological approach was based on numerical density. Thus, the two estimates obtained in their study were not directly comparable.
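The difference between the two measures can be made concrete with a small sketch. The function names and all input numbers below are hypothetical and only illustrate the definitions: numerical density is a count of cell profiles per unit area, whereas area fraction is the proportion of the reference area covered by stained tissue.

```python
def numerical_density(profile_count, reference_area_mm2):
    """Cell profiles per mm^2 of reference area."""
    return profile_count / reference_area_mm2


def area_fraction(stained_area_mm2, reference_area_mm2):
    """Proportion of the reference area that is positively stained (0 to 1)."""
    return stained_area_mm2 / reference_area_mm2


# Two hypothetical regions with the same number of profiles but different
# stained areas (e.g. larger or partly overlapping cell profiles).
density_a = numerical_density(500, 2.0)
density_b = numerical_density(500, 2.0)
fraction_a = area_fraction(0.03, 2.0)
fraction_b = area_fraction(0.05, 2.0)
```

Here both regions have the same numerical density (250 profiles/mm²) but different area fractions, which shows why an area fraction from one technique and a numerical density from another are not expected to agree perfectly.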
Väyrynen et al. [14] performed manual, semi-quantitative estimation of CD3+ lymphocytes on captured images in 34 randomly selected cases of colorectal cancer and found numerical densities varying from 1.8 to 2243 cells/mm² (median 471 cells/mm²). They compared with computer-assisted image analysis and found correlation coefficients very similar to our results, varying from 0.960 to 0.987. Their manual counts were not based on stereology, and moreover, they were performed in a limited number of fields of vision without a clearly stated sampling approach. The counts were, however, performed by four different observers. We only had one observer, but using strict stereological counting rules our results were reproducible by both techniques. We investigated the correlation between the numerical density estimates and the area fractions, as obtained separately by the stereological technique or image analysis, and found excellent correlation coefficients for these “intra-technical reproducibility tests”.
Stereology is considered the gold standard for histopathological quantification; however, we did not perform unbiased stereology. Unbiased estimation of cell quantity would require a 3D probe, e.g. using a disector, where the third dimension is taken into account [28]. Such an approach would be very time consuming, and since our aim was to compare the stereological estimates with counts obtained by image analysis, we counted TILs only in 2D, i.e. profile counting. A lymphocyte has an average diameter of 8–10 μm, and to avoid missing cells we used 4 μm thick tissue sections. With thicker sections, overlapping cells may be difficult to count by both the stereological technique and image analysis, especially in tumors with high numbers of TILs. Halama et al. [29] focused their study on the count of lymphocytes in conglomerates and used 2 μm thick sections to minimize the possible overlap between cells. They found a high inter-observer variation for manual counts, especially for tumors with high numbers of TILs. Our concern with 2 μm sections would be problems with a clear nuclear definition. Due to the issue of overlapping cells we also estimated TILs by area fractions. When comparing the stereological estimates of area fraction with those obtained by image analysis, we found a slightly lower correlation compared to the numerical density estimates, but overall the correlation was near optimal.
Image analysis provides objective quantitative measurements and is known to have a high reproducibility (i.e. precision), which makes it highly valuable in terms of standardization. By design, the algorithm produces the same result each time it is run on the same image. However, accuracy of the measurement, which is defined as the closeness of agreement between a measured value and the true value, is also important to take into consideration. Image analysis can provide precise and reproducible results, which may, however, be biased. Generally, image analysis is validated by semi-quantitative, manual evaluations that may be associated with considerable inter- and intra-observer variability, even among trained pathologists. Such validations could therefore lead to a systematic skewness or bias of the results, and this is the main reason why we chose to compare our image analysis algorithm with stereology, which has both a high precision and accuracy. This is especially important when considering cut-off levels for the triage of therapy. Moreover, most papers [6–8, 10–13] do not report validation, and thus the obtained estimates of TILs might be biased and difficult to compare to other studies. Stereology and image analysis both produce numerical data, which have the advantage of easy comparability with results obtained by the use of other software. Use of semi-quantitative approaches might make comparisons with other studies difficult.
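The distinction between precision and accuracy drawn above can be sketched numerically. The measurement values and the "true" value below are hypothetical; bias is taken as the mean deviation from the true value and precision as the sample standard deviation of repeated measurements.

```python
from statistics import mean, stdev


def bias_and_spread(measurements, true_value):
    """Return (bias, spread): mean deviation from the true value and sample SD."""
    return mean(measurements) - true_value, stdev(measurements)


TRUE_DENSITY = 100  # hypothetical true value, cells/mm^2

# Method A: highly reproducible but systematically too high (precise, biased).
bias_a, spread_a = bias_and_spread([120, 120, 121, 119], TRUE_DENSITY)

# Method B: scattered around the true value (unbiased, less precise).
bias_b, spread_b = bias_and_spread([90, 110, 95, 105], TRUE_DENSITY)
```

Method A has the smaller spread but a bias of 20 cells/mm², whereas method B is unbiased but more variable. A validation against a reference that is itself variable or skewed cannot reveal a bias of the first kind.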
The stereological approach has an inherent subjectivity, in that the investigator needs to decide what to count or not to count, but the use of well-defined counting rules and immunohistochemical stains reduces this subjectivity to a minimum. Image analysis is non-subjective. Both methods are considered robust and reproducible. The only observer bias might be associated with the subjective outlining of the region of interest; however, our clear definition minimizes this source of bias.
The ongoing Immunoscore project aims to standardize the procedure for quantifying TILs, and the scientific research group advocating the Immunoscore also recommends automatic quantification of TILs [15, 16]. Similar to the Immunoscore project, we analyzed CD3 and CD8 positive TILs in two different areas of the tumor. However, a difference between the studies might be the definition of the sampling areas, and this is highlighted as the time-consuming part for the pathologist [30]. Moreover, we have not been able to find an exact description of the sampling approach (i.e., area selected) used in the Immunoscore project. Although stereological methods have evolved over the years to be more efficient, they are still laborious and time consuming. In our study the stereological counts required an average of 20 min per section. In contrast, the image analysis method required less than one minute of investigator time. The subsequent automatic image analysis took 10–15 min per slide (analyzing 100% of the region of interest), but the analysis could be performed at any time, day or night, without investigator assistance. Thus, automated digital image analysis requires fewer human resources than the manual stereological approach, which is in agreement with Ong et al., who found the use of computer-assisted, pathological immunohistochemical scoring time-saving compared to conventional visual semi-quantitative scoring [31].
Heterogeneity
We investigated the heterogeneity of TILs in both the CA and the IA of the adenocarcinomas, well aware that this only represents a minor part of the whole tumor. Dealing with a retrospective design, it was not possible to overcome sampling bias, since the investigated tissue had already been sampled and prepared for diagnostic purposes. Many studies on heterogeneity have used tissue micro-arrays (TMAs), and depending on the core diameter, the analyzed tumor area varies from 0.28 mm² [8] to 3.14 mm² [9]. We used three whole sections from each tumor and analyzed a considerably larger tumor bulk than TMA-based studies.
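The relation between core diameter and analyzed area follows from the area of a circle, A = π(d/2)². The sketch below assumes core diameters of 0.6 mm and 2.0 mm. These diameters are not stated in the text; they are inferred here because they reproduce the two quoted areas.

```python
import math


def core_area_mm2(diameter_mm):
    """Area of a circular TMA core of the given diameter, in mm^2."""
    return math.pi * (diameter_mm / 2) ** 2


# Assumed core diameters (hypothetical; chosen to match the quoted areas).
small_core = round(core_area_mm2(0.6), 2)
large_core = round(core_area_mm2(2.0), 2)
```

Under these assumptions the two cores cover about 0.28 mm² and 3.14 mm², so even the larger core samples only a few square millimetres of tumor per spot.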
Intra-tumoral heterogeneity may lead to sampling bias, and it is important to take this into account in estimating TILs, especially with the perspective of clinical, diagnostic implementation. Some studies evaluate TILs by hot spot sampling [6, 7, 12], while others evaluate in randomly selected tumor areas [10]. To overcome sampling bias due to intra-tumoral heterogeneity, several studies investigating TILs have focused on different tumor compartments, e.g. the Immunoscore, which combines analysis of TILs in the tumor center and at the invasive tumor margin [5–7]. This may overcome the heterogeneity between the tumor center and the invasive front but does not take into account the heterogeneity found within the central part and/or within the invasive front of the tumor. Galon et al. [8] measured CD3+ and CD8+ TILs in duplicates of spots representative of the tumor center and invasive tumor front. They documented a high level of homogeneity in each tumor region, but it was not reported whether the sampling was based on hot spots or randomly selected FOV. Also, Nosho et al. [11] investigated heterogeneity by taking two to four TMA cores from each tumor, but it was not stated from which part of the tumor these cores were taken. Despite the investigation of several cores from each patient, none of these studies present data on intra-tumoral heterogeneity, and the reported results on the prognostic impact of TILs are inconsistent [8, 11].
Overall, the heterogeneity of TILs in CC has only been sparsely investigated. The study most similar to ours was performed by Laghi et al. [13], who measured CD3+ T-cells in three random and non-contiguous microscopic areas representing the deep front of tumor invasion. Their investigation was restricted to one tissue section from each tumor and TILs were quantified solely as area fractions. However, they found homogeneous results in 66% of the tumors, which is in agreement with our results. In the invasive tumor front we found ICC for CD3+, quantified as area fractions, to be 0.702 by the image analysis method and 0.746 by the stereological technique.
In summary, we demonstrated that biological inter-tumoral variation contributes to the overall variation of TIL estimates to a much higher degree than intra-tumoral heterogeneity. It may therefore be rational to quantify TILs in only one section. For the purpose of reproducibility and comparability we recommend investigating the section representing the deepest invasive tumor front.
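The comparison of inter- and intra-tumoral variation can be illustrated with an intraclass correlation coefficient. The sketch below uses the one-way random-effects form, ICC = (MSB − MSW) / (MSB + (k − 1)·MSW), with tumors as groups and sections as repeated measurements. This particular ICC form is an assumption, since the variant used in the study is not specified in this section, and the data are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC for equally sized groups (one group per tumor)."""
    n = len(groups)      # number of tumors
    k = len(groups[0])   # sections per tumor
    grand_mean = mean(v for g in groups for v in g)
    # Between-tumor mean square.
    msb = k * sum((mean(g) - grand_mean) ** 2 for g in groups) / (n - 1)
    # Within-tumor mean square.
    msw = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical TIL densities (cells/mm^2), three sections per tumor.
# Sections agree within each tumor, tumors differ from each other.
consistent = [[100, 110, 90], [300, 310, 290], [500, 520, 480]]
icc_consistent = icc_oneway(consistent)

# Sections differ widely within each tumor, tumor means are identical.
heterogeneous = [[100, 300, 500], [110, 310, 480], [90, 290, 520]]
icc_heterogeneous = icc_oneway(heterogeneous)
```

An ICC close to 1, as in the first data set, means that most of the variation lies between tumors, so a single section represents the tumor well. A low ICC, as in the second data set, would argue for sampling several sections.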