Introduction
Radiological diagnosis of fibrosis and cirrhosis
Radiological diagnosis of HCC
Radiological findings of cirrhosis:

- Nodular or irregular surface
- Coarsened liver edge
- Increased echogenicity
- Atrophy and multinodularity (typically atrophy of the right lobe and hypertrophy of the caudate or left lobes) in advanced disease
Radiological findings of portal hypertension:

- Increased portal vein diameter
- Presence of porto-systemic collateral circulation
- Reversal of portal vein flow
- Splenomegaly
- Ascites
Radiological findings of HCC:

- Focal liver lesion with hypervascularity in the late arterial phase and washout in the portal venous and/or delayed phases
Artificial intelligence, machine learning, and deep learning
Machine learning based CAD systems
Image pre-processing method | Description |
---|---|
Mean filter [40] | The mean filter replaces each pixel value in an image with the mean value of its neighbouring pixels, including itself |
Median filter [40] | The median filter replaces each pixel value in an image with the median of neighbouring pixels, including itself |
Wiener filter [41] | The Wiener filter is based on statistical properties to filter out the noise that has corrupted the original signal |
Bilateral filter [42] | It is a non-linear, edge-preserving, noise-reducing smoothing filter that performs spatial averaging without smoothing across edges |
Gaussian filter [43] | It is a linear smoothing filter whose kernel weights are chosen according to the shape of the Gaussian function |
Unsharp masking [44] | The unsharp masking technique sharpens an image by computing the difference between the original image and its blurred version. It increases the contrast of small details in the magnified texture |
Histogram equalisation [44] | It is a technique for adjusting image intensities to enhance contrast, achieved by stretching out the most frequent intensities so that low-contrast regions gain contrast. Histogram equalisation improves the global contrast of the image |
Adaptive histogram equalisation [44] | It is an adaptive method that computes several histograms, each corresponding to a distinct region of the image, and uses them to redistribute the intensity values. Adaptive histogram equalisation is suitable for improving local contrast in the image |
CLAHE [45] | Unlike plain histogram equalisation, the Contrast Limited Adaptive Histogram Equalisation (CLAHE) performs local contrast enhancement while limiting noise amplification. It has been widely adopted for improving low contrast in ultrasound imaging |
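To make the first two filters in the table concrete, here is a minimal, loop-based NumPy sketch (illustrative only, not an optimised implementation; function names are mine):

```python
import numpy as np

def mean_filter(img, k=3):
    """Replace each pixel with the mean of its k-by-k neighbourhood (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def median_filter(img, k=3):
    """Replace each pixel with the median of its k-by-k neighbourhood (edge-padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

# A 5x5 patch with one speckle-like outlier: the median filter removes it,
# while the mean filter only smears it across the neighbourhood.
patch = np.full((5, 5), 10.0)
patch[2, 2] = 255.0
print(median_filter(patch)[2, 2])  # 10.0 — outlier removed
print(mean_filter(patch)[2, 2])    # ~37.2 — outlier spread out
```

This behaviour is why the median filter is often preferred for speckle-type noise in ultrasound images.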
Method | Description |
---|---|
First Order Statistics (FOS) | Average gray level (Mean), standard deviation, variance, skewness, kurtosis, uniformity, energy, entropy |
Statistical feature matrix (SFM) | Coarseness, Contrast, Periodicity, and Roughness |
Law’s Texture Energy Measures | Law’s texture energy measures are based on five coefficient vectors representing level (L), edge (E), spot (S), ripple (R), and wave (W). In total, 18 texture features can be extracted |
Fourier Power Spectrum (FPS) | Radial sum and angular sum of the discrete Fourier transform |
Fractal | Hurst exponent, fractal dimension |
Gray-Level Difference Statistics (GLDS) | Contrast, differential mean, difference entropy, inverse difference moment, angular second moment |
Gray-level Co-occurrence Matrix (GLCM) | Energy, Entropy, Dissimilarity, Contrast, Correlation, Homogeneity, Autocorrelation, Cluster shade, Cluster prominence, Maximum probability, Sum of Squares, Sum Average, Sum Variance, Sum Entropy, Difference Variance, Difference Entropy, Information measure of Correlation, Inverse Difference moment-Normalized |
Moment Invariant (MI) | A set of moments invariant to rotation, scaling, and translation derived from second and third normalised central moments |
Gradient based features | Mean, Variance, Kurtosis, Skewness, and percentage of pixels with non-zero gradient |
Gray-level run-length matrix (GLRLM) | Short run emphasis, Long run emphasis, Gray-level non-uniformity, Run-length non-uniformity, Run percentage, Low gray-level run emphasis, High gray-level run emphasis, Short run high gray-level emphasis, Long run low gray-level emphasis, Long run high gray-level emphasis |
Gabor Wavelet Transform (GWT) | Mean and standard deviation of Gabor output images obtained by using a set of Gabor wavelets at different scales and orientations |
Geometric | Centre of gravity x, Centre of gravity y, Height, Width, Area, Perimeter, Roundness, Euler number, Major axis length, Minor axis length, Orientation, Solidity, Extent, Eccentricity, Convex area, Danielsson factor, Filled area |
Frequency-domain | Discrete Cosine Transform (DCT) features, Discrete Wavelet Transform (DWT) features, Wavelet Packet Transform (WPT) features, Curvelet Transform (CT) features, Stationary Wavelet Transform (SWT) |
Phase congruency | Variance, contrast, covariance |
Gabor texture | Multiple Gabor filters with different frequencies and orientations can be used to extract specific features from an image |
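The first-order statistics (FOS) in the table above are the simplest of these feature sets; a minimal NumPy sketch of their computation over a gray-level ROI might look as follows (bin count and function name are my choices, not from any specific surveyed study):

```python
import numpy as np

def first_order_statistics(patch, bins=32):
    """First Order Statistics (FOS) texture features from a gray-level ROI."""
    x = patch.astype(float).ravel()
    mu, sigma = x.mean(), x.std()
    # Normalised gray-level histogram for energy (uniformity) and entropy
    hist, _ = np.histogram(x, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    nz = p[p > 0]
    return {
        "mean": mu,
        "variance": sigma ** 2,
        "skewness": ((x - mu) ** 3).mean() / sigma ** 3 if sigma else 0.0,
        "kurtosis": ((x - mu) ** 4).mean() / sigma ** 4 if sigma else 0.0,
        "energy": float((p ** 2).sum()),          # a.k.a. uniformity
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }

# A synthetic ROI with uniformly random gray levels: entropy is near its
# maximum (log2 of the bin count), energy near its minimum (1 / bins).
rng = np.random.default_rng(0)
roi = rng.integers(0, 256, size=(64, 64))
feats = first_order_statistics(roi)
```

The richer descriptors in the table (GLCM, GLRLM, Gabor, wavelets) follow the same pattern: a deterministic transform of the ROI followed by scalar summary statistics.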
Algorithm | Description |
---|---|
Principal Component Analysis (PCA) | It is a statistical technique that converts high-dimensional data to low-dimensional data by selecting the most important features that capture maximum information about the dataset. The most relevant features are selected according to the amount of variance they explain in the original dataset |
Pearson’s Correlation Coefficient | It measures the correlation between features to find out which features are highly correlated and which are not. Based on this analysis, the features that are redundant and do not add value to the final prediction are dropped |
Analysis of Variance (ANOVA) | The ANOVA is a statistical method that computes the differences and their variations among the given classes in the data. Based on the statistical analysis, p-value and F-value are computed, based on which significant features are selected |
Mutual Information | The mutual information (MI) quantifies the amount of information obtained about one variable through a second variable. Using higher-order statistics calculated via MI, we can select features that maximise the MI between the subset of selected features and the target variable |
Fisher score | The Fisher score selects each feature independently based on their scores under the Fisher criterion, providing a subset of most representative features |
Locality Sensitive Discriminant Analysis (LSDA) | The LSDA is a feature reduction technique based on analysing the relationships between data points. The LSDA is effective because it preserves both the discriminant and the local geometrical structure of the data |
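As an illustration of the first entry in the table, PCA can be sketched with an eigendecomposition of the feature covariance matrix (a minimal NumPy sketch with synthetic data; the function name is mine):

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature vectors onto the top principal components (max variance)."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()   # variance explained per component
    return Xc @ components, explained

# 100 samples of 10 synthetic texture features; most variance lies in feature 0,
# so the first principal component should capture most of the dataset's variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
X[:, 0] *= 10.0
Z, ratio = pca_reduce(X, n_components=2)
```

In a CAD pipeline, the reduced matrix `Z` would replace the original high-dimensional texture feature vectors before classification.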
Algorithm | Description |
---|---|
Naive Bayes (NB) | The Naive Bayes is a probabilistic classifier based on Bayes’ Theorem. It predicts the class of a given sample by computing the posterior probability from the prior probability and the likelihood observed in the training set, assigning the sample to the class with the highest posterior probability |
K-Nearest Neighbour (KNN) | The K-Nearest Neighbour classifier is a lazy statistical learning algorithm. The training data act as the feature space; during testing, the test sample is compared to all training samples using a distance metric. To improve robustness, the labels of the K nearest neighbours are combined by majority vote to decide the label of the test sample |
Logistic Regression (LR) | The Logistic Regression is one of the powerful and baseline methods of supervised classification. The ordinary regression is extended to give the probability of outcome between 0 and 1. To use logistic regression as a binary classifier, a threshold is set based on which a sample is discriminated between two classes |
Decision Tree (DT) | A decision tree is a tree-based classifier where an internal node represents feature, the branch represents a decision rule, and each leaf node represents the outcome. The decision tree classifier provides the benefits of easy interpretation and efficient handling of outliers |
Support Vector Machine (SVM) | The SVM classifier aims to find the optimal hyperplane with the largest margin between positive and negative samples in the high-dimensional feature space. Kernel functions such as Gaussian and Radial Basis Function are used for non-linear mapping of the training data from input space to higher-dimension feature space. The SVM classifier is suitable for complex datasets and shows good generalisation ability on unseen test set |
Random Forest (RF) | The RF classifier is an ensemble learning method in which the predictions of multiple classifiers are voted to form the final prediction. In general, ensemble learning methods are robust and provide superior performance compared with a single classifier |
Extreme Learning Machine (ELM) | The ELM is a single-layered feed-forward neural network which can be trained in a single pass, making it faster than conventional machine learning algorithms. The ELM has three layers (input, hidden, and output). The weights from input to hidden are randomly initialised and are fixed. During a single pass, the weights from hidden to output layer are learnt by the classifier |
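To make the distance-based idea behind KNN concrete, here is a minimal NumPy sketch with a toy two-class feature space (data and function name are illustrative, not from any surveyed study):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Label x by majority vote among the k nearest training samples (Euclidean)."""
    d = np.linalg.norm(X_train - x, axis=1)      # distance to every training sample
    nearest = np.argsort(d)[:k]                  # indices of the k closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority-vote label

# Toy 2-D feature space: class 0 clusters near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5])))  # 0
print(knn_predict(X, y, np.array([5.5, 5.5])))  # 1
```

In the surveyed CAD systems, `X_train` would hold the texture feature vectors extracted from ultrasound ROIs rather than toy coordinates.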
Deep learning
Convolutional neural networks
- Convolutional layer: The convolutional layer is the core building block of a CNN, using the convolution operation in place of general matrix multiplication. Its parameters consist of a set of learnable filters, also known as kernels. The main task of the convolutional layer is to detect features within local regions of the input image that are common throughout the dataset and map their appearance to a feature map. The output of each convolutional layer is fed to an activation function to introduce non-linearity; common choices include the Rectified Linear Unit (ReLU) and sigmoid.
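The convolution-plus-ReLU computation described above can be sketched in a few lines of NumPy ('valid' cross-correlation, as deep learning frameworks implement convolution; a minimal illustration, not a framework implementation):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation followed by a ReLU non-linearity."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the kernel with one local region of the image
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0)  # ReLU activation on the feature map

# A simple edge-detecting kernel responds where intensity changes column-wise
img = np.array([[0., 0., 1., 1.]] * 4)
k = np.array([[-1., 1.]])
fmap = conv2d(img, k)   # 4x3 feature map peaking at the edge column
```

In a trained CNN the kernel weights are learned from data instead of being hand-designed as here.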
- Sub-sampling (pooling) layer: In CNNs, a convolutional layer is typically followed by a pooling layer, which reduces the spatial size of the input and thus the number of parameters of the network. A pooling layer takes each feature map output by the convolutional layer and down-samples it; in other words, it summarises a region of neurons in the convolutional layer. The most common pooling techniques are max pooling and average pooling: max pooling takes the largest value from each patch of the feature map, whereas average pooling takes the average of each patch.
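Both pooling variants reduce to a block-wise reduction over the feature map, as in this minimal NumPy sketch (function name and toy feature map are mine):

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping size-by-size pooling, shrinking each spatial dimension."""
    h, w = fmap.shape
    # Reshape into a grid of size-by-size blocks, then reduce over each block
    blocks = fmap[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 0., 1., 1.],
                 [0., 4., 1., 1.]])
print(pool2d(fmap, mode="max"))   # [[4. 8.] [4. 1.]]
print(pool2d(fmap, mode="avg"))   # [[2.5 6.5] [1.  1. ]]
```

Each 2x2 block of the feature map collapses to one value, halving both spatial dimensions while keeping the strongest (or average) response.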
- Activation function: An activation function maps neuron activations through a non-linear function, enabling the network to solve non-linear problems. Common activation functions include sigmoid, tanh, ReLU, and softmax. ReLU is one of the most widely used activation functions as it mitigates the vanishing gradient problem in deep neural networks.
- Batch normalisation: Batch normalisation addresses internal covariate shift within feature maps, i.e. a change in the distribution of hidden units’ values that slows convergence and requires careful parameter initialisation. Batch normalisation normalises the distribution of feature maps by setting them to zero mean and unit variance. It also smooths the flow of gradients and acts as a form of regularisation, improving the generalisation power of the network.
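The training-time transform described above can be sketched in NumPy; in a real network, gamma and beta are learnable per-feature parameters, fixed here as scalars for illustration:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a batch to zero mean / unit variance per feature, then apply
    the learnable scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                    # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta

# Two features on very different scales are brought to a common distribution
batch = np.array([[1., 100.], [2., 200.], [3., 300.]])
out = batch_norm(batch)
# Each feature column of `out` now has roughly zero mean and unit variance
```

At inference time, frameworks replace the batch statistics with running averages accumulated during training; that bookkeeping is omitted here.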
- Dropout: Dropout is a regularisation technique heavily used in convolutional neural networks. In dropout, some units or connections are randomly dropped (skipped) with a certain probability. Due to its many connections, a neural network co-adapts by learning non-linear relations; dropout counteracts this co-adaptation by randomly dropping some of the connections or units, preventing the network from overfitting on the training data.
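A minimal sketch of the common "inverted dropout" variant, where surviving activations are rescaled so their expected value is unchanged (function name and toy activations are mine):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero units with probability p during training and
    rescale survivors so the expected activation is unchanged."""
    if not training or p == 0:
        return x                            # identity at inference time
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p         # keep each unit with prob. 1 - p
    return x * mask / (1.0 - p)

rng = np.random.default_rng(0)
acts = np.ones((4, 8))
dropped = dropout(acts, p=0.5, rng=rng)
# Roughly half the units are zeroed; the survivors are scaled up to 2.0
```

Because of the rescaling, no change to the forward pass is needed at test time, which is why frameworks implement dropout this way.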
- Fully connected layer: In fully connected layers, each neuron from the previous layer is connected to every neuron in the next layer, and every value contributes to predicting the class of the test sample. The output of the last fully connected layer is passed through an activation function, generally softmax, which outputs the class scores. Fully connected layers are mostly used at the end of the CNN for the classification task.
Representative convolutional neural networks
CNN architecture | Description |
---|---|
AlexNet [51] | The first CNN model to win the ImageNet challenge (2012), ushering in the deep learning revolution. Compared to LeNet, AlexNet used the ReLU activation function, dropout for regularisation, data augmentation during training, and split computation across multiple GPUs |
VGG [55] | A popular deep CNN model from the University of Oxford. The VGG network popularised the use of small filter kernels and training deeper networks via pre-training on shallower versions. The two popular variants are VGG-16 (16 layers) and VGG-19 (19 layers) |
GoogLeNet [56] | Winner of the 2014 ImageNet challenge. The model contains multiple inception modules, introducing multi-scale processing that allows modules to extract features at different levels of detail simultaneously. By stacking inception modules the network becomes deep, yet it has relatively few parameters. A popular GoogLeNet variant is Inception-v3 |
ResNet [57] | Winner of the 2015 ImageNet challenge. ResNet networks contain skip connections that preserve information by copying activations from lower layers to higher layers. Stacking multiple ResNet blocks made it possible to build much deeper networks with fewer parameters. The skip connections, in addition to the standard pathway, let the network preserve information selectively and learn residuals. Major variants include ResNet-18, ResNet-50, ResNet-101, and ResNet-152 |
DenseNet [58] | The DenseNet model concatenates the activations of all previous layers with those of the current layer. Reusing the feature maps of all previous layers enables feature reuse and reduces the number of trainable parameters. Concatenating activations from previous layers preserves global state, making DenseNets particularly well suited to smaller datasets, especially medical imaging datasets. An important DenseNet model adopted by the medical imaging community is DenseNet-121 |
CNN training strategies
- Transfer learning: Transfer learning [59] refers to sharing and transferring knowledge from a source task to a target task. Convolutional neural networks learn features in a hierarchical manner: early layers learn generic image features such as edges and corners, whereas later layers learn features specific to the dataset. Since it is challenging to obtain large-scale annotated datasets in the medical domain due to cost and time constraints, transfer learning leverages models trained on large-scale datasets such as ImageNet [60].
- Data augmentation: Current state-of-the-art CNNs need large-scale annotated data for supervised training. Given the complexity of CNN models, they easily overfit on small medical imaging datasets. Data augmentation [61] generates synthetic data, for example by applying affine transformations such as rotation, scaling, translation, and flipping, or by adding noise. Data augmentation not only increases the dataset size during training but also adds diversity to the data, making the model more robust on unseen data.
Evaluation measures
- True Positive (TP): a person with cirrhosis is correctly detected as having cirrhosis.
- True Negative (TN): a person without cirrhosis is correctly detected as not having cirrhosis.
- False Positive (FP): a healthy person is incorrectly detected as having cirrhosis.
- False Negative (FN): a person with cirrhosis is incorrectly detected as healthy.
- Precision is the fraction of positive detections of cirrhosis that are correct: Precision = TP / (TP + FP).
- Recall, also called sensitivity, is the fraction of actual cirrhosis cases that the model correctly classifies: Recall = TP / (TP + FN).
- F1-measure is the harmonic mean of precision and recall: F1 = 2 × Precision × Recall / (Precision + Recall).
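The metrics defined above follow directly from the four confusion-matrix counts, as in this short sketch (the example counts are invented for illustration):

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall (sensitivity), specificity, accuracy, and F1
    from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}

# e.g. 80 cirrhosis cases detected, 15 healthy people flagged, 5 cases missed
m = classification_metrics(tp=80, tn=100, fp=15, fn=5)
```

Reporting several of these together matters: a model can reach high accuracy on an imbalanced dataset while having poor recall for the minority (diseased) class.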
Contributions
Study | Focus | PRISMA | Methods | Datasets | Data |
---|---|---|---|---|---|
[62] | Diffuse liver diseases | ✗ | ML | ✗ | US |
[63] | Liver fibrosis, cirrhosis, and cirrhosis-related nodules | ✗ | ✗ | ✗ | US;MR;CT |
[64] | Liver cancer | ✗ | ML | ✗ | US |
[65] | Applications of Ultrasound imaging | ✗ | ML | ✗ | US |
[66] | Applications of Ultrasound imaging | ✗ | ML;DL | ✗ | US |
[67] | Chronic liver diseases | ✓ | ML | ✗ | US;CT;MR;ES |
This Study | Liver diseases | ✓ | ML;DL | ✓ | US |
- Sect. 7 describes the search strategy: the selected databases, the inclusion and exclusion criteria, and the keywords forming the search query.
- Sect. 8 provides a systematic review of diagnosing liver diseases using ultrasound imaging.
- Sect. 9 provides an overview of the public datasets available for the diagnosis of liver diseases.
- Sect. 10 discusses current limitations and future research directions.
Search strategy
Data sources and search queries
S.No. | Inclusion Criteria |
---|---|
1 | Study must be published between January 2010 and December 2021 |
2 | Study must be peer-reviewed journal articles or conference proceedings and written in English |
3 | Study should have clinical focus on diagnosis of liver diseases using computational techniques |
4 | Study must have used ultrasound as medical imaging modality |
5 | Technical studies diagnosing multiple diseases, including liver diseases, are also considered |
6 | Study should have performed automated diagnosis of liver diseases using computer applications such as computer vision, machine learning, and deep learning |
7 | Study must have evaluated the performance of the proposed system using standard evaluation metrics |
S.No. | Exclusion Criteria |
---|---|
1 | Study must not be a systematic review, meta-analysis, or survey paper |
2 | Study must not diagnose liver diseases using other modalities such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), serum biomarkers, liver biopsy, MRI-derived Proton Density Fat Fraction (MRI-PDFF), etc. |
3 | Studies without a technical contribution, such as white papers, case studies, letters, and abstract-only publications |
Concepts | Keywords |
---|---|
Concept 1: Keywords related to diagnosis | chronic liver disease(s) OR acute liver disease(s) OR focal liver disease(s) OR diffuse liver disease(s) OR liver lesion(s) OR hepatic disease(s) OR fibrosis OR steatosis OR fatty liver disease OR nonalcoholic steatohepatitis OR NASH OR nonalcoholic fatty liver disease OR NAFLD OR hepatocellular carcinoma OR HCC |
Concept 2: Keyword related to tasks | classification OR detection OR localization OR segmentation OR registration OR tracking OR temporal analysis OR severity scoring |
Concept 3: Keywords related to imaging modalities | ultrasound OR contrast-enhanced ultrasound OR CEUS OR computed tomography OR CT OR magnetic resonance imaging OR MRI |
Concept 4: Keywords related to computer applications | computer-aided OR computer-aided detection OR computer-aided diagnosis OR automated analysis OR artificial intelligence OR machine learning OR deep learning OR deep neural network OR convolutional neural network OR cnn OR dnn OR deep-cnn |
Search query | (Concept 1) AND (Concept 2) AND (Concept 3) AND (Concept 4) |
Article selection
S.No. | Question |
---|---|
1 | Are research objectives clearly defined? |
2 | Is research methodology well-defined? |
3 | Is the train and test data source clearly defined? |
4 | Are the data pre-processing techniques clearly defined and their selection justified? |
5 | Are the feature extraction or feature engineering techniques clearly described and justified? |
6 | Are the learning algorithms clearly described? |
7 | Does the study perform the comparison with the existing baseline models? |
8 | Is the performance of the proposed system evaluated and results properly interpreted and discussed? |
9 | Does the conclusion reflect the research findings? |
Review of studies on the diagnosis and staging of liver diseases
Study | Dataset(s) | Data Pre-processing | Feature extraction and Selection | Learning method(s) | Results | Main finding(s) |
---|---|---|---|---|---|---|
[72] | 55 patients having fibrosis stages (F1: 17; F2: 12; F3: 5; F4: 21) | ROI selection | 11 elastic parameters obtained from RTE by PCA. Blood biomarkers as additional features | Spearman’s correlation coefficient | AUC: 0.93 | RTE images along with histology results is effective in diagnosing liver fibrosis |
[73] | 279 US images | Data augmentation | Features extracted from VGGNet fc7 layer | RF, SVM, GBDT, MLP, FCNet | Accuracy: 96.06% | Features extracted from a pretrained CNN model with FCNet as classifier outperform conventional ML classifiers |
[74] | 513 patients US images | Encoding diagnostic and demographic parameters | Textural features | NB, RF, KNN, SVM | Sensitivity (77.0%), and Specificity (77.3%) | ML classifiers showed superior performance for liver fibrosis staging than liver fibrosis index (LFI) method |
[75] | 144 patients suffering from Hepatitis B | Multi-parametric features; Spearman’s correlation coefficient for feature selection | DT, LR, ANN, RF, SVM | Mean AUC of 0.85 | Multi-parametric features improve the staging of liver fibrosis compared to mono- or dual-modality features | |
[76] | 229 patients providing 10 US images each | Multi-stream feature extraction using VGG-16 | ANN | Overall accuracy: 65.6% | An indicator-guided ML framework provides better results for fibrosis diagnosis | |
[70] | 466 patients (Normal: 64; Fibrosis: 401) | Automatic ROI selection made on gray-scale and 2D SWE images | Inception-V3 model to extract 4096-dimensional embeddings from B-mode and SWE images | Softmax classifier | AUC of 0.950, 0.932, and 0.930 achieved for classifying fibrosis of stage S4, S3, and S2, respectively | Combination of gray-scale images and 2D SWE images for better results than a single modality |
[77] | 13,608 US images | Liver US images without a focal hepatic lesion were used | VGGNet | Accuracy of 4-class classification: internal test data (83.5%) and external test data (76.4%) | CNNs achieved performance on par with radiologists in determining the METAVIR score from US images |
[78] | 157 subjects having fibrosis stages (F0: 44; F1: 31; F2: 35; F3: 20; F4: 27) | Speckle noise reduction, ROI selection and data augmentation | Image features extracted along Glisson’s line | MLNN and CNN | Accuracy (five class): 84.38% | Focusing on region along the Glisson’s line improves fibrosis classification |
[79] | 550 B-mode US images from 55 participants (NAFLD: 38; Normal: 17) | ROI selection and data augmentation | Features extracted from pre-trained ResNet-V2, GoogleNet, AlexNet, and ResNet-101 | SVM | Overall accuracy: 98.64%, Sensitivity: 97.20%, and Specificity: 100% | Concatenation of features from diverse CNN models is helpful in improving diagnostic performance of NAFLD classification |
[69] | 214 subjects providing US images | Multi-modal fusion network with active learning | AUC: 0.897 and accuracy: 70.59% | Fusion of multiple US modalities along with active learning demonstrated better performance | ||
[80] | 286 US images | ROI selection, data augmentation | CNN framework | Accuracy: 95.66% | Multi-scale information and local attention mechanism can provide accurate liver fibrosis classification | |
[81] | 700 US images | Contour detection and CLAHE | CNN features | Softmax, SVM | Accuracy: 98.59% | Deep features extracted by CNN are robust for fibrosis classification |
[82] | 640 US images | Textural features along with patient’s age and gender | AlexNet, VGG-16, VGG-19, GoogleNet | Accuracy: 95.29% | Combination of patient characteristics along with image features improved classification accuracy |
Study | Dataset(s) | Data Pre-processing | Feature extraction and Selection | Learning method(s) | Results | Main finding(s) |
---|---|---|---|---|---|---|
[83] | 60 US images (Normal: 20; Cirrhosis: 40) | ROI selection | Uniform LBP and GLCM for feature extraction, PCA for feature selection | SVM | Overall accuracy: 87.0% | Uniform-LBP features can better describe the cirrhotic features in US images |
[84] | 91 US images (Normal: 44; Cirrhosis: 47) | Data normalisation | Liver capsule detection using sliding window and dynamic programming based linking | CNN | Mean accuracy of 89.2%, AUC of 0.968 | Liver capsule detection based features are effective in diagnosing cirrhosis from liver US images |
[85] | 147 US images (Normal: 75; Cirrhosis: 72) | Manual ROI selection | Textural features. The correlation-based filter (CFS) for feature selection | KNN, SVM | Overall accuracy: 99.31% | Combination of Wavelet and Curvelet features provided superior performance than other feature extraction methods |
[86] | 110 US images (30 Normal and 80 Cirrhosis) | Manual ROI selection | Intensity difference technique | SVM | Overall accuracy of 98.18%, sensitivity of 98.75%, and specificity of 96.67% | Intensity difference technique provides discriminative features for cirrhosis and normal liver |
[87] | 500 US images (Normal: 200; Cirrhosis: 300) | Data augmentation | 2048-dimensional CNN embeddings from each ROI | ANN | Overall accuracy: 68.1% | CNN using transfer learning and data augmentation outperforms conventional methods |
[88] | 681 US images having B-mode and color doppler data | Automatic ROI selection | LiverTL having pre-trained VGG-16 as the backbone model | Overall AUC of 0.948 | Transfer learning applied on ROI images improves cirrhosis diagnosis | |
[89] | 69 patients providing B-mode US images | Radiologist delineated boundaries of liver surface | Pre-trained GoogLeNet | AUC score of 0.992 | Liver shape information acts as a strong indicator for cirrhosis diagnosis | |
[90] | 189 US images (70 normal, 94 steatotic, 25 cirrhotic) | ROI selection | Textural features | Accuracy: 85.3% | Feature selection and adding liver length as parameter improved classification accuracy |
Study | Dataset(s) | Data Pre-processing | Feature extraction and Selection | Learning method(s) | Results | Main finding(s) |
---|---|---|---|---|---|---|
[91] | 20 US images (Normal: 10; Fatty: 10) | Manual ROI selection | Textural features | Bayesian classifier | Overall accuracy: 95% | RF and speckle images can accurately capture textural features relevant for diagnosis of fatty liver disease |
[92] | 100 US images (Normal: 42; NAFLD: 58) | ROI selection | Textural features | DT, Fuzzy classifier | Overall accuracy: 93.3% | Combination of texture and DWT-based features improves diagnostic accuracy of the model |
[93] | US images (NAFLD: 30; Normal: 30; Heterogeneous: 19) | Manual ROI selection | Wavelet Packet Transform (WPT) | SVM | Accuracy (3-class classification): 95.4% | Multi-scale analysis using WPT is suitable for assisting experts in accurate FLD diagnosis |
[94] | 180 US images (Normal: 80; Fatty: 100) | Manual ROI selection | Textural features, LDA for feature selection | Linear model with information fusion | Overall accuracy: 95.0% | Information fusion based classification can provide superior performance for NAFLD diagnosis |
[95] | 53 US images (Normal: 12; Mild: 14; Moderate: 14; Severe fatty liver: 13) | Liver regions-of-interest (LROIs) and diaphragm regions-of-interest (DROIs) from each US image | Textural features; Differential evolution feature selection (DEFS) algorithm | SVM | Overall classification accuracy: 84.9±3.2 | Features extracted from the liver parenchyma (LROIs) together with features extracted from the diaphragm ROIs (DROIs) help improve overall classification performance |
[96] | 394 subjects providing US images, radio-frequency data | Textural, backscattering, and attenuation features | Statistical analysis | AUC of 0.73 for NAFLD and 0.81 for severe NAFLD on test set | Quantitative diagnostic index can distinguish mild and severe NAFLD from normal liver | |
[97] | 100 US images (Normal: 50; Fatty: 50) | Manual ROI selection | GIST descriptors as features. Marginal Fisher Analysis for dimensionality reduction | KNN, DT, SVM, AdaBoost, PNN | Accuracy: 98%, Sensitivity: 96%, Specificity: 100%, AUC: 0.9674 | GIST features are significant characteristics of fatty liver disease |
[98] | 100 US images (Normal: 50; Fatty: 50) | Image normalisation, CLAHE | Radon Transform (RT) and 2D-DCT coefficients as features. LSDA for dimensionality reduction | DT, KNN, PNN, SVM, AdaBoost, Fuzzy Sugeno classifier | Accuracy: 100%, Sensitivity: 100%, Specificity: 100% | Combination of RT and 2D-DCT features are good discriminators for fatty liver disease |
[99] | 63 subjects (NAFLD: 36; Normal:27) | ROI selection | Textural and Gabor directional features | SVM, ELM | Accuracy: ELM (97.75%) vs SVM (89.01%); AUC: ELM (0.97) vs SVM (0.91) | ELM based classifier provided superior results compared to SVM for characterisation and stratification of NAFLD |
[100] | 650 patients (Normal: 196; Grade-I: 173; Grade-II: 157; Grade-III: 124) | Images cropped to form 1,000 texture patches | Textural features, Gabor filter, and curvelet transform | KNN, SVM | Overall accuracy: 96.9% | The curvelet transform features in combination with SVM classifier gave highest accuracy in diagnosing various grades of fatty liver disease |
[101] | 90 US images (Normal: 45; Fatty: 50) | Manual ROI selection by radiologist | Textural and Fractal features. Mutual Information (MI) for feature selection | SVM, KNN, DT, AdaBoost | Accuracy: 95.55%, Sensitivity: 97.77% | Mutual information feature selection is an effective technique to select best features for diagnosing fatty liver disease |
[102] | 63 subjects (NAFLD: 36; Normal: 27) | ROI selection | Textural features | SVM, ELM, and CNN (Inception model) | Accuracy: SVM (82%), ELM (92%), and CNN (100%) | CNN model outperformed ML-based system for ultrasound tissue characterisation |
[103] | 55 subjects (NAFLD: 38; Normal: 17) providing 550 US images | Data cleaning, images resizing | GLCM and CNN-based features | SVM | AUC: CNN-based (0.977), HI-based (0.959), and GLCM-based (0.892) | CNN based features from B-mode US images can be used for NAFLD diagnosis |
[104] | 57 US images (Normal: 25; Fatty: 32) | Manual ROI selection | Textural features. PCA for dimensionality reduction | KNN, MLP-kernel SVM | Overall accuracy: 98.78% | Wavelet features are computationally efficient and manufacturer independent, and are suitable for NAFLD diagnosis |
[105] | 1,000 images (Normal:250; Mild:250; Moderate:250; Severe:250) | No pre-processing applied | Textural and transfer scattering coefficients (SC) features | KNN, SVM, EKNN | Overall accuracy: 98.8% | Compressed transfer SC features proved to be effective in representing the texture of fatty liver and providing good classification accuracy |
[106] | 577 subjects (Normal: 200; Fatty liver: 377) | SMOTE method to generate synthetic samples to balance classes | Each variable weighted by the information gain ranking process | RF, LR, ANN, NB | Accuracy: 86.48%, Sensitivity: 87.16%, Specificity: 85.89% | RF classifier showed superior performance compared to NB, ANN, and LR classifiers for NAFLD diagnosis |
[107] | 204 subjects (Normal: 64; Fatty: 140) | Manual ROI selection | | 1-D CNN | Accuracy: 96%, Sensitivity: 97%, Specificity: 94% | DL algorithms using RF data can accurately diagnose NAFLD and quantify hepatic fat fraction
[108] | 240 subjects (Normal: 106; Mild: 57; Moderate: 67; Severe: 10) | Manual ROI selection | | Customised CNN model | AUC: 0.933 | DL algorithms showed better diagnostic ability than gray-scale values in moderate and severe NAFLD
[109] | 55 subjects (NAFLD: 38; Normal: 17) providing 550 US images | Manual ROI selection, data augmentation | Local phase and radial symmetry features | Multi-scale CNN | Overall accuracy: 97.8% | Local phase-based image enhancement and feature representation helps to improve NAFLD diagnosis |
[79] | 55 subjects (NAFLD: 38; Normal: 17) providing 550 US images | ROI selection, data augmentation | Features extracted from ResNet-V2, GoogleNet, AlexNet, and ResNet-101 | SVM | Overall accuracy: 98.64%, Sensitivity: 97.20%, and Specificity: 100% | Concatenation of features from diverse CNN models is helpful in improving diagnostic performance of NAFLD classification |
[110] | 2,070 patients providing 21,885 US images | Contrast enhancement and noise removal using CLAHE and Gaussian filter | | Various CNN models | AUC scores for mild steatosis vs others: 0.974 | DL can predict early-stage steatosis level with good performance
[111] | 90 subjects | Image resizing, data augmentation and data normalisation | | VGG-19 | Overall accuracy: 80.1%, precision: 86.2%, and specificity: 80.5% | Multi-view US images and DL can effectively classify fatty liver disease and measure fat fraction values
[112] | 300 US images (155 normal and 145 fatty cases) | Multiple ROI selection | Textural features | Genetic Algorithm | Accuracy: 95.71%, F1-score: 95.64% | Ensemble algorithms can improve NAFLD classification accuracy |
[113] | 1,119 US images from 106 patients | | | Customised ML model in AutoML | Precision: 88.98%, Recall: 88.24% | AutoML tool has potential for aiding a physician in diagnosing NAFLD on US images
[114] | 235 participants providing 245 US images | Manual ROI selection | Textural features | XGBoost, Random Forest, SVM | AUC:0.8 | ML model provides at par performance to CAP model |
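Many of the pipelines in the table above rely on GLCM texture features (e.g., the studies using GLCM in [103] and [114]). As an illustration of what such features capture — not any particular study's implementation — here is a minimal pure-Python sketch that builds a normalised grey-level co-occurrence matrix for a single pixel offset and derives four common Haralick-style statistics:

```python
import math
from collections import Counter

def glcm_features(roi, dx=1, dy=0):
    """Build a normalised grey-level co-occurrence matrix (GLCM) for one
    pixel offset (dx, dy) and derive four Haralick-style texture features."""
    h, w = len(roi), len(roi[0])
    counts = Counter()
    for y in range(h - dy):
        for x in range(w - dx):
            counts[(roi[y][x], roi[y + dy][x + dx])] += 1
    total = sum(counts.values())
    p = {pair: c / total for pair, c in counts.items()}
    return {
        "contrast": sum((i - j) ** 2 * pij for (i, j), pij in p.items()),
        "energy": sum(pij ** 2 for pij in p.values()),
        "homogeneity": sum(pij / (1 + abs(i - j)) for (i, j), pij in p.items()),
        "entropy": -sum(pij * math.log2(pij) for pij in p.values()),
    }

# A perfectly uniform ROI has zero contrast/entropy and maximal energy/homogeneity.
uniform = [[3] * 4 for _ in range(4)]
features = glcm_features(uniform)
```

Real CAD systems typically average such features over several offsets and angles and quantise intensities to a small number of grey levels first; this sketch keeps a single horizontal offset for clarity.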
Study | Dataset(s) | Data Pre-processing | Feature extraction and Selection | Learning method(s) | Results | Main finding(s) |
---|---|---|---|---|---|---|
[115] | 680 US images (Normal: 200; Fatty: 160; Cirrhosis: 160; Liver cancer: 160) | Two ROIs of 64 x 64 pixels selected | Mean, variance, skewness, and kurtosis | ANN | Overall accuracy: 96.125% | Liver tissue can be automatically characterised from US images for the diagnosis of liver diseases
[71] | 97 US images (Normal: 30; Steatosis: 4; Chronic hepatitis without cirrhosis: 9; Compensated cirrhosis: 35; Decompensated cirrhosis: 35; and HCC: 6) | ROI of 128 x 128 pixels along the medial axis | Acoustic attenuation coefficients, FOS, GLCM | DT, SVM, KNN | Overall accuracy: 73.20% using SVM | Non-invasive methods provide reliable information about the staging of chronic liver diseases |
[116] | 150 US images (Normal: 50; NAFLD: 50; Cirrhosis: 50) | CCA to generate bounding box around liver region, image cropping, CLAHE for contrast enhancement | Curvelet Transform for feature extraction, LSDA for feature reduction | DT, SVM, KNN, LDA, QDA, NB | Accuracy: 97.33%, Specificity: 100.0%, Sensitivity: 96.0% | The liver disease index (LDI), a factor derived from LSDA coefficients, can be used to differentiate fibrosis and cirrhosis
[117] | 129 US images (Normal: 29; Steatosis: 47; Fibrosis: 42; Cirrhosis: 12) | Coordinates conversion and automatic ROI selection | Textural features both in spatial domain and transform coefficients (FPS, DWT, WPT) | KNN, ANN, SVM | Accuracy: 94.91%; Sensitivity (normal): 100; Sensitivity (steatosis): 100; Sensitivity (fibrosis): 87.5; Sensitivity (cirrhosis): 100 | Textural features in combination with hierarchical classification can diagnose liver diseases, avoiding invasive method of liver biopsy |
[118] | 79 CLD cases (Liver cancer: 44 and liver abscess: 35) | Manual ROI selection | Textural features, Sequential forward selection and sequential backward selection for feature selection | SVM | Overall highest classification accuracy of 89.25% | Features selected by sequential forward selection gave highest classification performance |
[119] | 279 US images (Normal: 95; Steatosis: 105; Cirrhosis: 79) | Multiple ROIs selected from each US image by an expert | Correlation, homogeneity, entropy, variance, energy, contrast, standard deviation, and run percentage. Fisher discriminant for feature selection | Majority Voting | Accuracy scores: Normal/Steatosis (95%), Normal/Cirrhosis (95.74%), Steatosis/Cirrhosis (94.23%) | A combination of different feature extraction methods along with a voting classifier is superior in diagnosing diffuse liver diseases
[120] | 216 US images (Normal: 72; Hepatitis: 72; Cirrhosis: 72) | ROI selection, data augmentation | | AlexNet, ResNet | Overall accuracy: 86.4% | Fine-tuning pre-trained CNN models improves diagnostic performance
[121] | 264 US images (Normal: 128; Diffuse liver disease: 136) | ROI selection made by radiologist | Textural features | Self-organisation feature maps (SOFM) | Sensitivity: 98.6%, Specificity: 98.2% | Textural features are good discriminators for diagnosing liver pathology
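One recurring fusion strategy in the table above is majority voting over several classifiers' predictions (e.g., [119]). A minimal sketch of label-level voting, with hypothetical class labels:

```python
from collections import Counter

def majority_vote(per_classifier_preds):
    """Fuse predictions from several classifiers by majority voting.
    per_classifier_preds: one list of labels per classifier, all the same length."""
    n_samples = len(per_classifier_preds[0])
    fused = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in per_classifier_preds)
        fused.append(votes.most_common(1)[0][0])  # most frequent label wins
    return fused

# Three hypothetical classifiers voting on two US images:
preds = [
    ["normal", "cirrhosis"],    # classifier 1
    ["normal", "normal"],       # classifier 2
    ["cirrhosis", "cirrhosis"], # classifier 3
]
fused = majority_vote(preds)
```

Ties are broken here by `Counter`'s insertion order; a production voting scheme would weight votes by each classifier's validation performance or break ties explicitly.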
Study | Dataset(s) | Data Pre-processing | Feature extraction and Selection | Learning method(s) | Results | Main finding(s) |
---|---|---|---|---|---|---|
[122] | 450 US images (Liver cancer: 50; Hepatocellular adenoma: 150; HEM: 35; Focal nodular hyperplasia: 145; Lipomas: 70) | Manual ROI selection by sonographer | Texture features including energy, contrast, correlation, entropy, and homogeneity | SVM, Fuzzy-SVM | AUC (5-class classification): 0.971±0.012 | Combination of GLCM textural features with the Fuzzy-SVM classifier provides superior results
[123] | 111 B-mode US images (Normal: 16; Cyst: 17; HCC: 15; HEM: 18, MET: 45) | Speckle noise removal and an ROI selection of 25 x 25 pixels | FOS, GLDM, GLRLM, Law’s TEM, and GWT | MLP | Overall accuracy (5-class classification): 86.4% | Two-step neural network classifier training showed superior performance in classifying focal liver lesions from US images |
Virmani et al. (2013) | 108 US images (Normal: 21; Cyst: 12; HEM: 15; HCC: 28; MET: 32) | Manual ROI selection by radiologist | Texture and Gabor features, PCA for dimensionality reduction | SVM | Overall accuracy: 87.2% | Texture features in combination with PCA and SVM classifier gave better results in diagnosing liver lesions |
[124] | 51 US images (HCC: 27; MET: 24) | Manual ROI selection by an experienced radiologist of size 32 x 32 pixels | GLCM, GLRLM, FPS, and Law’s TEM. GA-SVM for feature selection | SVM | Overall classification accuracy of 91.6% with sensitivity of 90% and 93.3% for HCC and MET cases | ML-based CAD systems can assist radiologists in diagnosing liver malignancies and facilitating better disease management |
[125] | 150 US images (Cyst: 50; HEM: 50; Malignancies: 50) | 7 ROIs representing echo, morphology, edge, echogenicity, and posterior echo enhancement | GLCM, FOS, algebraic moment invariant (AMI), auto-correlation (AC), Laws’ TEM, and Gabor Wavelet features for each ROI | SVM | Accuracy for Cyst vs. HEM: 93.77%, Cysts vs. Malignancies: 92.13%, HEM vs. Malignancies: 69.33% | Multiple ROIs representing varied characteristics of liver US can provide enhanced and stable classification performance compared to single ROI for each US image |
[126] | 56 B-mode US images (Normal: 15; Cirrhotic: 16; HCC: 25) | ROI selection by an experienced radiologist of size 32 x 32 pixels | DWT, SWT, and WPT features. GA-SVM for feature selection | SVM | Overall accuracy: 88.8%; Sensitivity: 90.0% for normal and cirrhotic liver and 86.6% for HCC | Non-invasive imaging methodologies can be used for diagnosing liver diseases, in turn avoiding liver biopsies
[127] | 108 B-mode liver US images | Two ROI sets, inside-lesion ROIs and surrounding ROIs, selected by an experienced radiologist | GLCM, GLRLM, FPS, Laws’ TEM, and Gabor features. PCA for feature selection | PCA-NN | Overall 5-class classification accuracy: 95.0% | Incorporating texture ratio features along with texture features computed from the region surrounding the lesion improves diagnosis of FLLs
[128] | 60 US images (Normal: 30; Fatty: 10; Cirrhosis: 10; Hepatomegaly: 10) | ROI selection using active snake contour model | Intensity histogram, GLCM, GLRLM, MI, and mixed features | ANN | Overall accuracy: 95% | GLRLM features show better results for focal liver lesion classification
[129] | 26 CEUS videos (HCC: 6; HEM: 10; Abscesses: 4; MET: 3; Localised fat sparings: 3) | Salient frames from each video are selected, image correction techniques applied | Time intensity curve (TIC) features extracted by sparse non-negative matrix factorization | LDA, KNN, SVM, BPN, Deep Belief Networks (DBN) | Accuracy: 86.36%, Sensitivity: 83.33%, Specificity: 87.50% | Deep Belief Networks trained using TIC features outperform conventional ML algorithms for focal liver lesion classification
[130] | 94 US images (Normal, Cyst, HEM, HCC) | Noise removal using bilateral filtering. Automatic ROI selection | A total of 94 features containing 6 histogram features and 88 GLCM based texture features | KNN, Multi-SVM | Overall accuracy: 96.11%, sensitivity: 97.08%, Specificity: 91.83% | Multi-SVM gave superior results compared to KNN classifier for the staging of focal liver lesions |
[131] | 52 CEUS video sequences (FNH: 13; HEM: 17; HCC: 16; MET: 6) | Salient frames selected from each video. Liver lesion ROI selection by a radiologist | Time intensity curve (TIC) and morphology features | SVM | Overall accuracy: 90.3%, sensitivity: 93.1%, and specificity: 86.9% | The proposed pipeline accurately detects and classifies focal liver lesions from CEUS images
[132] | 99 US images (Cyst: 29; HEM: 37; Malignancies: 33) | Manual ROI selection by an experienced radiologist | FOS, GLCM, Law’s TEM, and echogenicity. PCA for dimensionality reduction | ANN | Accuracy (Cyst vs. HEM): 99.7%, Cyst vs. Malignant: 98.72%, and HEM vs. Malignant: 96.13% | ANN showed superior performance compared to other ML algorithms such as SVM in diagnosing focal liver lesions |
[133] | 88 patients providing 111 US images (95 FLLs and 16 normal) | Manual ROI selection and image enhancement | Textural features | SVM | Overall one-against-one gives accuracy of 93.1% | One-against-one approach using multi-class SVM provides better results compared to tree structured approach |
[134] | 110 US images (Cyst: 44; HEM: 18; HCC: 32; Normal: 16) | ROI selection using level set method and Fuzzy c-means clustering | Stacked Sparse Auto-encoders (SSAE) | NB, KNN, Multi-SVM, Softmax classifier | Overall accuracy: 97.20%, Sensitivity: 98%, Specificity: 95.70% | SSAE features can capture high-level feature representations for diagnosis of focal liver lesions |
[135] | 364 US images (Abscess: 48; Cirrhosis: 40; Cyst: 30; Echinococcosis: 40; Fatty: 34; HEM: 34; Hepatitis: 27; Hepatomegaly: 38; MET: 42) | Active contour segmentation (semi-automatic method) for ROI selection | Textural features in spatial and frequency domains | RF | Overall accuracy for 10-class classification: 91% | Wavelet filtering on US images helps to overcome brightness and contrast variations, in turn improving overall classification performance
[136] | 140 US images (Normal: 78; Benign: 26; Malignant: 36) | CLAHE for noise removal and contrast enhancement | Bi-dimensional empirical mode decomposition (BEMD) based features, ANOVA for feature selection | SVM, LDA, KNN, RF | Accuracy: 92.95%, Sensitivity: 90.80%, and Specificity: 97.44% | Proposed model can accurately diagnose FLLs without manually selecting a region of interest in liver images
[137] | 177 patients with FLLs | Manual ROI selection of lesion areas | Sparse representation features and iterative approach for feature selection | SVM | AUC of 0.94 for classifying benign vs malignant FLLs | Multi-modal US images improves FLL diagnosis |
[138] | 93 patients providing 47 FLL cases and 46 benign cases | Manual ROI selection across three phases: arterial phase, portal venous phase, and delayed phase | Textural features extracted from each ROI. Deep Canonical Correlation Analysis (DCCA) for feature selection | Multi-kernel learning | Accuracy: 90.41%±5.80 | Deep Canonical Correlation Analysis helps to learn better feature representation and explore the correlated information between various views of data, in turn providing superior performance |
[139] | 4,420 CEUS videos (HCC: 2,110; FNH: 2,310) | Manual ROI selection by an experienced radiologist | | 3D CNN | Overall accuracy: 93.1%, Sensitivity: 94.5%, Specificity: 93.6% | Extending 2D CNN models to 3D CEUS videos can further improve diagnostic performance for FLLs
[140] | 367 US images (Homogeneous: 258; Angioma: 17; MET: 48; HCC: 6; Cyst: 30; HNF: 8) | | ResNet-50 and DenseNet-121 models for feature extraction | FCNet | Mean AUC score for 5 lesion types: 0.916 | Supervised attention helps the model focus its attention for prediction as well as improve interpretability of results
[141] | 2,143 patients providing 24,343 US images | Manual ROI selection, image resizing | | ResNet-18 | AUC for FLLs: 0.924, Sensitivity: 86.5%, Specificity: 85.5% | Proposed model provided results superior to those of skilled radiologists
[142] | 15,296 US images (10,687: normal and 4,609: FLL) | ROI selection, image resizing, noise filtering | GWT, LBP and CNN features | SVM | Overall accuracy of 98.40% | Fusing traditional features with CNN features improves FLL diagnosis performance |
[143] | 20,432 US images containing HCC, cysts, HEM, and focal fatty sparing | Fan-shaped ROI selection, data augmentation | | RetinaNet with ResNet-50 as a backbone | Overall detection rate of 87.0%, sensitivity of 83.9%, and specificity of 97.1% | CNNs have shown good performance in the detection and diagnosis of FLLs in US images
[144] | 574 patients providing CEUS images | Image resizing, data augmentation | | ResNet-152 | Overall AUC of 0.934 and accuracy of 91.0% | DL applied to multi-phase CEUS images overcomes interobserver subjectivity
[145] | 91 patients providing CEUS videos for FLLs | ROI selection | | CNN | Overall accuracy: 88% | DNN showed superior performance for diagnosing FLLs
[146] | 3,873 patients (Cyst: 1,214; HEM: 1,220; MET: 1,001; HCC: 874) | Semi-automatic segmentation to delineate lesion areas | | Customised CNN model | Overall accuracy for joint learning: 82.2%, accuracy for classification-only system: 79.8% | Joint classification and segmentation system gave better performance than segmentation-only and classification-only systems
[147] | CEUS videos from 145 participants during arterial phase | ROI selection | CNN features | 3D-CNN, CNN-LSTM | Accuracy: 98% | Learning the change of enhancement patterns for CEUS videos is an effective approach to classify HCC and FNH |
[148] | 87 B-mode and CEUS images (13 benign and 74 malignant) | ROI selection | Textural and spatiotemporal features | ML classifiers | Balanced accuracy: 84% | Combining spatiotemporal and texture features is important to aid accurate FLL classification
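Most results in these tables are reported as sensitivity (recall on the diseased class) and specificity (recall on the healthy/benign class). For reference, a small sketch computing both from paired label lists; the label names are illustrative, not taken from any study:

```python
def sensitivity_specificity(y_true, y_pred, positive="malignant"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical ground truth vs. classifier output for four lesions:
y_true = ["malignant", "malignant", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign"]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

The example classifier misses one of two malignant lesions (sensitivity 0.5) while raising no false alarms (specificity 1.0), which is exactly the trade-off the AUC figures in these tables summarise across thresholds.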
Study | Dataset(s) | Data Pre-processing | Feature extraction and Selection | Learning method(s) | Results | Main finding(s) |
---|---|---|---|---|---|---|
[149] | A prospective study involving 442 patients with Child A or B cirrhosis | | Patient demographics, clinical data, and laboratory values | Multivariate Cox regression model, Random Forest | The regression model had a c-statistic of 0.61 (95% CI 0.56-0.67), whereas the ML model (RF) had a c-statistic of 0.64 (95% CI 0.60-0.69) | Machine learning algorithms are superior in accurately identifying patients at high risk of developing HCC
[150] | 268 HCC patients from a centre in Romania | | | CNN | Overall AUC of 0.935, accuracy of 91%, sensitivity of 0.944, and specificity of 0.884 | DL outperforms classical ML methods for HCC prediction
[151] | 434 Hepatitis B patients from a centre in China | | | CNN | AUC of 0.900 on the test set | DL-based approach can accurately predict 5-year HCC development risk
[152] | CEUS images of 318 patients | | | CNN | AUC of 0.84 | Proposed model based on dynamic CEUS radiomics performed well in predicting early HCC recurrence
[153] | B-mode and CEUS images of 48 patients affected by HCC | Automatic ROI selection | | CNN | AUC of 0.982, accuracy of 98.25%, sensitivity of 98.16%, and specificity of 98.24% | The fusion of B-mode and CEUS images at decision level improves HCC diagnosis performance
[154] | US images of 60 patients with 61 other malignancies and 112 patients with 120 HCC | | Clinical features were also given | CNN | Sensitivity of 78.6% and specificity of 82.6% | US combined with clinical features is valuable in differentiating HCC from OM in the setting of cirrhosis
[155] | 1,241 CEUS videos (667 HCC, 574 non-HCC) | ROI selection, data augmentation | Time-intensity curve features | ResNet-50 | Accuracy: 83%, AUC: 0.89 | Integrating features from different perfusion stages is significant for HCC prediction |
[156] | 200 US images | Manual ROI selection, data augmentation | Textural features | ML classifiers, CNN | Accuracy: 98.9%, AUC: 0.99 | Fusion of CNN and ML classifiers leads to improved results
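Several pipelines above enhance contrast before feature extraction (e.g., CLAHE in [110] and [116], and the histogram equalisation described earlier). As a simpler illustration of the underlying idea, here is a sketch of plain global histogram equalisation, remapping each grey level through the normalised cumulative histogram; the 2 x 2 "image" is a toy input, not ultrasound data:

```python
def equalise(image, levels=256):
    """Global histogram equalisation: remap each grey level through the
    normalised cumulative histogram (CDF) to stretch contrast."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF value of the darkest occupied bin
    n = len(flat)
    if n == cdf_min:                         # constant image: nothing to stretch
        return [row[:] for row in image]
    return [[round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
             for p in row] for row in image]

# Four grey levels clustered in the lower range are spread to the full range:
out = equalise([[0, 64], [128, 192]])
```

CLAHE differs by applying this mapping per tile with a clip limit on the histogram, which boosts local contrast while limiting noise amplification — important for speckle-laden US images.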
Public datasets and online initiatives for the diagnosis of liver diseases
-
B-mode fatty liver ultrasound: [103] released a B-mode US dataset for NAFLD steatosis assessment using ultrasound images. It contains 550 B-mode ultrasound scans and the corresponding liver biopsy results. The dataset was collected from 55 subjects admitted for bariatric surgery in the Department of Internal Medicine, Hypertension and Vascular Diseases, Medical University of Warsaw, Poland.
-
SYSU-CEUS: The SYSU-CEUS dataset [157] contains 353 CEUS videos of three types of focal liver lesions: 186 instances of hepatocellular carcinoma (HCC), 109 instances of hemangioma (HEM), and 58 instances of focal nodular hyperplasia (FNH). Datasets specific to liver tumours have also been made available through online challenges.
-
LiTS: The Liver Tumor Segmentation Challenge (LiTS) [158] dataset provides 201 contrast-enhanced 3D abdominal CT scans and segmentation labels for liver and tumour regions. Each slice of a volume has a resolution of 512 x 512 pixels. Out of the 201 volumes, 131 carry their respective annotations, whereas no ground-truth labels are provided for the test set containing 70 volumes. The in-plane resolution ranges from 0.60 mm to 0.98 mm, and the slice spacing from 0.45 mm to 5.0 mm.
-
SLIVER07: The Segmentation of the Liver 2007 (SLIVER07) [159] dataset is part of the grand challenge organised in conjunction with MICCAI 2007 for liver tumour segmentation. The training data consist of 10 tumours from 4 patients with their ground-truth segmentations. For the test set, consisting of 10 tumours from 6 patients, the ground truth was not made available to the public by the task organisers. The dataset contains liver tumour CT images corresponding to the portal phase of a standard four-phase contrast-enhanced imaging protocol.
-
3D-IRCADb: The 3D Image Reconstruction for Comparison of Algorithm Database (3D-IRCADb-01) consists of 3D CT scans of 10 men and 10 women, with liver tumours in 15 of the cases. The anonymised patient images, labelled images corresponding to the segmented ROIs, and mask images are given in DICOM format. The in-plane resolution ranges from 0.57 mm to 0.87 mm, and the slice spacing from 1.6 mm to 4.0 mm.
-
CHAOS: The Combined (CT-MR) Healthy Abdominal Organ Segmentation Challenge (CHAOS) [160] is an IEEE ISBI 2019 challenge dataset focused on segmentation of healthy abdominal organs from CT and/or MRI. The CHAOS dataset contains abdominal CT of 40 subjects having healthy liver. Each slice has a resolution of 512 x 512 pixels.
-
Multi-organ abdominal CT reference standard segmentation: The Multi-organ Abdominal CT Reference Standard Segmentation dataset [161] comprises 90 abdominal CT images delineating multiple organs such as the spleen, left kidney, gallbladder, esophagus, liver, stomach, pancreas, and duodenum. The abdominal CT images and some of the reference segmentations come from two datasets: The Cancer Imaging Archive (TCIA) Pancreas-CT dataset [162] and the Beyond the Cranial Vault (BTCV) abdominal dataset [163]. The segmentation of the various organs across these CT volumes was performed by two experienced undergraduate students and verified by a radiologist on a volumetric basis.
-
DeepLesion: The DeepLesion [164] dataset, released by the National Institutes of Health (NIH), consists of more than 32,000 annotated lesions identified on CT images, collected from 4,400 unique patients. Each of the 2D CT scans is annotated with lesion type, bounding box, and metadata. Each image has a resolution of 512 x 512 pixels.
-
MIDAS: The MIDAS liver tumor dataset from the National Library of Medicine (NLM)’s Imaging Methods Assessment and Reporting project provides 4 liver tumors from 4 patients with five expert hand segmentations. The dataset was made available by Dr. Kevin Cleary at the Imaging Science and Information Systems, Georgetown University Medical Center.
-
CLUST: The Challenge on Liver Ultrasound Tracking (CLUST) [165] provides a dataset for automatic tracking of the liver in ultrasound volumes. The dataset consists of 86 independent studies, with 64 (2D + t) and 22 (3D + t) studies. The dataset was split into a training set (40% of all sequences) and a test set (60%). Annotations were provided for the training set, but no ground truth was provided for the test set.
Limitations and future directions
Limitations
-
Focus on classification: Most of the studies focused on the classification task, i.e., binary classification (e.g., Normal vs. Fatty, Normal vs. Fibrosis, Normal vs. Cirrhosis) or multi-class classification (e.g., Normal vs. Fibrosis vs. Cirrhosis, Normal vs. Hepatocellular Carcinoma vs. Metastasis vs. Hemangioma). However, very little work has been done on modelling disease progression and severity scoring.
-
Small in-house datasets: Although there has been a lot of work on diagnosing liver diseases using various imaging modalities such as ultrasound, CT, and MRI, there are only a few publicly available datasets. Most studies in our literature review worked on in-house data that are often small. These datasets also suffer from a class-imbalance problem. As the accuracy score is not a reliable metric on imbalanced datasets, the lack of publicly available benchmark datasets often limits the true assessment of the algorithms proposed in these studies.
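The accuracy pitfall can be made concrete: on an imbalanced dataset, a degenerate classifier that always predicts the majority class scores a high plain accuracy but only chance-level balanced accuracy (mean per-class recall). A small illustrative sketch with hypothetical class proportions:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally regardless of size."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 95 normal vs. 5 fatty cases; a degenerate model always predicts "normal":
y_true = ["normal"] * 95 + ["fatty"] * 5
y_pred = ["normal"] * 100
plain = accuracy(y_true, y_pred)              # looks impressive
balanced = balanced_accuracy(y_true, y_pred)  # exposes the failure on the minority class
```

This is why studies reporting only overall accuracy on imbalanced in-house data are hard to compare; sensitivity, specificity, balanced accuracy, or AUC on a shared benchmark give a fairer picture.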
-
Classical CAD systems still prevalent: Deep learning has shown tremendous performance improvements across various fields such as computer vision, natural language processing, robotics, and biomedical image processing. In biomedical image computing specifically, deep learning has produced superior results on tasks such as classification, segmentation, and tracking. Due to the lack of publicly available large-scale annotated datasets of liver US studies, however, the classical machine learning pipeline remains prominent in the community.
Future research direction
-
Need for a multidisciplinary approach: The management of HCC encompasses multiple disciplines, including hepatologists, diagnostic radiologists, pathologists, transplant surgeons, surgical oncologists, interventional radiologists, nurses, and palliative care professionals [166]. A study by [167] showed that the development of a true multidisciplinary clinic with dedicated tumour board review for HCC patients increased survival, owing to improved staging and diagnostic accuracy, more efficient treatment times, and increased adherence to clinical diagnostic and therapeutic guidelines. The AASLD therefore recommends referring HCC patients to a centre with a multidisciplinary clinic.
-
Make use of multi-modal data: Current state-of-the-art deep learning models, when trained on multi-modal data such as B-mode images, Doppler images, contrast-enhanced ultrasound images, and SWE images, could improve the early staging and diagnosis of HCC. Multi-modal data provide complementary information, in turn helping models improve.
-
Need for benchmark datasets: To push the community's efforts towards improving diagnostic performance through novel methods, a benchmark environment should be established by releasing a large-scale annotated dataset in the public domain. As in challenge benchmarks, the task organisers can release annotated training and validation data while withholding the test-set labels. Once participants have fine-tuned their methods, they can submit their test-set predictions to the challenge evaluation server.