Introduction
- State-of-the-art review articles typically address a single aspect of the problem, such as segmentation of potential regions of interest or classification. In contrast, we summarize all the steps of BCHI analysis: pre-processing, segmentation, feature extraction, and classification. We also cover both traditional and Deep Learning (DL) based methods for processing BCHIs.
- Developing DL models depends on the availability of large annotated datasets. In this regard, we summarize various publicly available datasets.
- We provide a summary of recent trends and popular choices of methods for the various steps in processing BCHIs.
- Finally, we summarize our observations and the challenges of processing these images, along with future directions.
Publicly available datasets
Database name [Ref] | Total no. of images | Magnification | Image details (image size and format) |
---|---|---|---|
BreakHis [24] | 7909 | 40X, 100X, 200X, 400X | Benign=2480, Malignant=5429; 700×460 pixels; PNG format |
IDC [30] | 162 | 40X | 198,73=IDC negative, 78=IDC positive patches from 162 slides; 1360×1024 pixels; TIFF format |
BACH [25] | 430 | - | 400=microscopy images (2048×1536 pixels), image-wise label; 30=whole-slide images (42113×62625 pixels), pixel-wise label; TIFF format for microscopy images, .svs format for WSIs |
TUPAC-2016 [27] | 821 | 40X | 500=training, 321=testing |
Camelyon-2016 [28] | 400 | 40X, 10X, 1X | WSIs of sentinel lymph node of breast cancer |
Camelyon-2017 [31] | 200 | 40X | WSIs of sentinel lymph node of breast cancer |
MITOS-ATYPIA-14 [26] | - | 20X, 40X | 284 frames at 20X magnification, 1136 frames at 40X magnification; TIFF format |
Bioimaging 2015 [29] | - | 200X | 249=training, 20=testing and 16 extended test datasets; 2048×1536 pixels |
BreCaHAD [32] | 162 | - | 1360×1024 pixels; TIFF format |
Breast cancer semantic segmentation [33] | 151 | - | WSI images of breast cancer |
NuCLS [34] | 151 | - | WSI images of breast cancer |
Overview of the review articles on automation of histopathological image analysis
Image processing approaches
Color normalization
Segmentation
ROI detection and segmentation using a traditional approach
ROI detection and segmentation using DL approach
Segmentation method (general category) | Segmentation method (specific method) | ROI | No. of images | Evaluation metrics |
---|---|---|---|---|
Threshold-based method | Adaptive thresholding [76] | Cancer nuclei | 24 H&E images | NA |
Region-based method | Marker-controlled watershed-based [79] | Nuclei | 39 images | PP=90%, Sen=83%, DC=0.9 |
Region-based method | Watershed-based [78] | Nuclei | 26 cells | F1-score=0.93 |
Clustering-based | Graph-based clustering [88] | Epithelial areas in WSIs | 75=benign | NA |
Clustering-based | Density-based spatial clustering [89] | Neoplastic epithelium | 75=DCIS | F1-score=0.88 |
Clustering-based | K-means clustering [87] | Nuclei | 100 H&E images | Mean Jaccard index=0.84, Accuracy=85% |
Clustering-based | K-means clustering [91] | Tubule | 10 H&E WSIs, 29 H&E images | Accuracy=90% |
Fusion-based method | Gradient driven voting mechanism + Markov Random Field loop backpropagation [81] | Nuclei | 8 H&E WSI | Precision=93%, Recall=96%, DC=0.9 |
Fusion-based method | Wavelet decomposition + multi-scale region-growing [82] | Nuclei | 32=Normal cell, 22=Cancer cell | Accuracy=91% |
Fusion-based method | Expectation–maximization (EM) driven geodesic active contour + overlap resolution [80] | Lymphocytes | 100 images | Sen=86% |
Fusion-based method | Clustering + watershed-based [77] | Nuclei | 149 cells | Accuracy=87% |
Fusion-based method | AdaBoost + active contour [83] | Nuclei | NA | Accuracy=95% |
Fusion-based method | Adaptive thresholding + clustering [76] | Nuclei | 24 H&E images | NA |
DL based | DNN=Pang Net, Fully Convolutional Net, Decon Net [96] | Nuclei | 2754 annotated nuclei | Accuracy=95%, Recall=90%, IU=81%, Precision=86%, F1-score=80% |
DL based | Stacked Sparse Autoencoder [99] | Nuclei | 3500 nuclei from 500 images | F1-score=84%, Precision-Recall Curve=78% |
DL based | Encoder and decoder model [95] | Tissue labels | 240 biopsy images | Accuracy=93% |
DL based | Mask R-CNN [94] | Nuclei | 33 images of 512×512 | Precision=91%, F1-score=0.86 |
DL based | Bending loss regularization network [97] | Nuclei | 21000 nuclei (4 breasts) | DC=0.81 |
DL based | DCNN + encoder and decoder [111] | Tissues | 12 breast cancer WSI | FWIoU=95% |
DL based | CNN [100] | IDC | 162 WSI slides | F-score=71%, Accuracy=84% |
DL based | CNN + active contour + adaptive ellipse fitting [104] | Nuclei | 204 WSIs | F1-score=80-85%, AveP=74-82% |
DL based | Residual-inception-channel attention U-net [105] | Nuclei | TCGA dataset | F1-score=0.82 |
DL based | Atrous spatial pyramid pooling U-net [102] | Nuclei | NA | NA |
DL based | Conditional generative adversarial network [106] | Nuclei | NA | F1-score=0.86 |
DL based | Transfer learning based deep CNN [110] | Mitosis cell | NA | F1-score=73%, Precision_recall=76% |
DL based | DCNN [107] | Mitosis cell | 920 mitosis cells | Precision=0.84%, Recall=0.83, F1-score=85.05 |
Others | Level set information [84] | Nuclei | 18=Benign, 36=Malignant | Accuracy=81% |
Others | Hybrid level set information [56] | Nuclei | 4000 nuclei | NA |
Others | Color-based [62] | NA | TCGD dataset | Accuracy=85% |
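
Two of the overlap metrics reported in the table, the Dice coefficient (DC) and the Jaccard index, can be computed directly from a predicted mask and a ground-truth mask. The following is a minimal sketch, assuming binary masks stored as nested lists of 0/1; the function name `overlap_metrics` and the toy masks are illustrative and do not come from any of the cited works.

```python
def overlap_metrics(pred, truth):
    """Return (dice, jaccard) for two binary masks given as nested lists of 0/1."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled as nucleus in both masks.
    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    pred_area = sum(flat_pred)
    truth_area = sum(flat_truth)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    jaccard = intersection / union
    return dice, jaccard


# Toy 3x4 masks (hypothetical data, for illustration only).
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
truth = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 1, 0, 0]]

dice, jaccard = overlap_metrics(pred, truth)
print(f"Dice = {dice:.2f}, Jaccard = {jaccard:.2f}")
```

The two scores are monotonically related (DC = 2J / (1 + J)), so values reported under one metric in the table can be converted to the other when comparing methods.
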
Classification
Classification using traditional image processing approach
Classification using DL approach
Classification using hybrid approach
Year | Pre-processing | Segmentation | Feature Extraction | Classification | Evaluation Metrics | Ref |
---|---|---|---|---|---|---|
2016 | NP | NP | Curvelet, LBP | SVM, Random forest, Decision tree, Polynomial classifiers | Acc=91% (Polynomial classifier) | [115] |
2017 | Color deconvolution | NP | LBP | Random Decision Tree | Acc=84% | [113] |
2017 | Macenko, Nonlinear transformation | Thresholding | Color, texture, Shape | SVM | F-score=88% | [185] |
2017 | Nonlinear mapping | Hybrid active contour | Pixel, Object, semantic level | SVM | Acc=92% | [170] |
2017 | Macenko | NP | Color, shape, Nuclear density | CNN, SVM | Sen=95% | [29] |
2018 | Macenko | NP | CNN | FCN | Acc=87% | [156] |
2018 | Gaussian Blur Filters | K-means, Watershed | Morphology, Geometric | Rule-based, Decision Tree | Acc=70-86% | [121] |
2019 | Macenko | NP | VGG16 | FCN | Acc=94-97% | [161] |
2019 | Color deconvolution | NP | VGGNet | Random forest, FCN | Sen=90%, Pre=87%, F1-score=88% | [183] |
2019 | Macenko | NP | Inception network | Gradient Boosting Tree | Acc=91-95% [BreakHis] | [178] |
2019 | Quantile normalization | Hybrid level set | CNN | SVM | Acc=90% | [56] |
2019 | Macenko | NP | GoogleNet, VGGNet, ResNet | FCN | Acc=97% | [142] |
2020 | Image rescaling | NP | VGG16, VGG19, Xception, ResNet50 | SVM, Logistic regression | Acc=83-93% | [129] |
2020 | Macenko | Laplacian of Gaussian | AlexNet, ResNet-18, ResNet50, ResNet-101, GoogleNet | SVM | Acc=96%, Sen=97% | [174] |
2020 | Color enhancement | NP | ResNet-50, DenseNet-121, ML-InceptionV3, ML-VGG16 | E-SVM | Acc=97% | [177] |
2020 | NP | NP | ResNet50, DenseNet-161 | FCN | Acc=91% | [148] |
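
The evaluation metrics abbreviated in the table (Acc, Sen, Pre, F1-score) all derive from the same confusion-matrix counts. The sketch below shows one way to compute them for a binary benign/malignant task; the function name `classification_metrics`, the label encoding (1 = malignant), and the sample labels are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (recall), precision and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def safe_div(num, den):
        # Return 0.0 when a metric is undefined (empty denominator).
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, len(pairs))
    sensitivity = safe_div(tp, tp + fn)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"Acc": accuracy, "Sen": sensitivity, "Pre": precision, "F1": f1}


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because the studies in the table report different subsets of these metrics on different datasets, figures such as Acc=91% and Sen=95% are not directly comparable without the underlying counts.
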
Discussion
Dataset
Magnification factors
Color normalization
ROI segmentation
Traditional methods vs DL
Challenges
- The lack of standard datasets makes it difficult to evaluate and compare methods. A standard dataset would give researchers a common platform for fair comparison.
- Creating annotations for nuclei segmentation is tedious, time-consuming, and challenging.
- Segmentation of nuclei at 400x magnification is still a challenge due to overlapping and clustered nuclei. Segmentation of nuclei at 100x is also challenging due to the small size, varying structure, and random distribution of nuclei.
- There are no standard metrics to evaluate the performance of color normalization methods.
- There is scope for developing a unified algorithm that handles both nuclei segmentation and histopathological image classification across varying magnifications.
- The heterogeneous characteristics of malignant samples make it difficult to model the patterns that differentiate them from benign samples.
- CNN-based methods for histopathological image classification extract features from the entire image and may fail to focus on regions of interest such as nuclei, glands, and mitotic cells, which contribute largely to the decision to classify an image as malignant or benign. Hence, there is scope for incorporating attention mechanisms into CNNs so that the model can focus on potential ROIs.
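
To make the idea of an attention mechanism concrete, the sketch below shows softmax attention pooling over per-patch feature vectors: each patch receives a weight, and the image-level representation is the weighted sum, so high-scoring patches (for example, nuclei-rich regions) dominate the result. This is a simplified illustration, not the method of any cited work; `attention_pool`, `score_weights`, and the feature values are hypothetical, and in a real model the scoring weights would be learned jointly with the CNN.

```python
import math


def attention_pool(patch_features, score_weights):
    """Pool patch feature vectors with softmax attention weights.

    patch_features: list of equal-length feature vectors, one per image patch.
    score_weights:  vector used to score each patch (assumed learned elsewhere).
    Returns (pooled_vector, attention_weights).
    """
    # Linear relevance score for every patch.
    scores = [sum(w * x for w, x in zip(score_weights, feat))
              for feat in patch_features]

    # Numerically stable softmax over the patch scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]

    # Attention-weighted sum of the patch features.
    dim = len(patch_features[0])
    pooled = [sum(a * feat[i] for a, feat in zip(attention, patch_features))
              for i in range(dim)]
    return pooled, attention


# Three hypothetical patches with 2-D features; the second scores highest.
patch_features = [[0.1, 0.0], [0.9, 0.2], [0.2, 0.1]]
score_weights = [5.0, 0.0]

pooled, attention = attention_pool(patch_features, score_weights)
print("attention weights:", [round(a, 3) for a in attention])
print("pooled feature:", [round(v, 3) for v in pooled])
```

The attention weights also serve as a rough saliency map, which can help check whether the model is attending to the ROIs a pathologist would consider relevant.
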