Introduction
The early detection and accurate staging of significant liver fibrosis are crucial for antiviral therapy. Shear wave elastography (SWE), an elasticity-based ultrasound (US) technique, has shown good accuracy in detecting fibrosis [1]. However, the applicability of SWE is substantially limited in cases of obesity, ascites or necroinflammatory activity (up to 15.8%) [2, 3]. Thus, assessment with a single imaging modality provides only limited information and can be affected by steatosis and necroinflammatory activity [4, 5].
Radiomics, a term that includes the suffix “-omics,” generates high-throughput data from medical images [6, 7], which contain information on prognosis, response to treatment and monitoring of disease status [8, 9]. As one important modality of medical imaging, US can provide not only morphological information but also stiffness and perfusion assessments, which may not be obtainable with other imaging methods [5, 10–12]. We have applied the “-omics” concept to quantitative US imaging, a field we refer to as “ultrasomics.” In our opinion, big imaging data on liver fibrosis, covering heterogeneity, tissue texture, stiffness and vascular perfusion, should be taken into consideration when staging fibrosis.
In addition to multimodality data, machine learning is another powerful tool for improving clinical decision-making [13]. Recent advances in data analysis from the field of machine learning have greatly extended researchers’ ability to make meaningful discoveries. Machine learning can enable accurate and reliable prediction from data with very large numbers of variables and small sample sizes, although performance may vary between algorithms. Therefore, the optimal machine-learning model for ultrasomics studies with small sample sizes should be determined. To our knowledge, comparative studies on the effectiveness of machine learning-based decision support systems are lacking [14].
In this study, we present the concept of multiparametric ultrasomics, a machine learning-based clinical decision support system that uses US imaging big data. We extracted a set of ultrasomic features that capture the morphological and hemodynamic changes associated with liver fibrosis to (1) develop a robust, noninvasive technique to predict the liver fibrosis stage using routine US data that can be easily obtained in the clinical setting and (2) investigate the optimal machine-learning model for a study with a small sample size.
Discussion
In the current study, we propose the use of multiparametric ultrasomics as a decision support tool for liver fibrosis staging. In addition to conventional radiomics features from digital images, we acquired radiofrequency signal and dynamic perfusion information to construct ultrasomics; these ultrasound parameters are unique yet convenient to acquire [10, 11, 20]. These mineable data for the evaluation of fibrosis staging were tested and compared using different machine-learning algorithms. Multiparametric ultrasomics using AdaBoost, random forest (RF) and support vector machine (SVM) provided the highest performance in this study with a small sample size.
In the construction of ultrasomics, we used unsupervised machine learning to explore the data characteristics of the parameters. A higher correlation was found between ORF and conventional radiomics features. An ORF signal is post-beam-formed data from a transducer, and it can provide intact information without signal processing [22]. Radiomics parameters are conventional features: quantitative descriptors mathematically extracted from digitally transformed images [23]. Although the digital images were signals post-processed with a digital scan converter, both signals reflected the morphological homogeneity of liver tissue. Notably, the ORF data, which retained the original, unprocessed information, performed better in the assessment of fibrosis and steatosis. Some papers have reported results on the tissue characterization of hepatic fibrosis or steatosis via quantitative ultrasound examination using statistical data on B-mode ultrasound and radiofrequency echo signals [24–26]. However, dynamic perfusion parameters, which showed a lower correlation with both morphological parameters, demonstrated the highest diagnostic value for activity stages, which are correlated with liver microcirculation [27]. Therefore, our results suggest that these three signals can be divided into two categories, morphology and hemodynamics, which reflect the fibrosis and steatosis stages and the activity stages, respectively.
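The grouping of signals by correlation described above can be illustrated with a minimal sketch. It assumes one summary value per patient for each signal type; the feature names and all numbers are made up for illustration and are not data from this study, and the Pearson coefficient is used only as one simple way to quantify correlation, not necessarily the method applied here.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical per-patient feature values (illustrative only, not study data).
features = {
    "orf_texture": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "radiomics_texture": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "perfusion_arrival": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}

# Pairwise correlations: strongly correlated pairs are candidates for the
# same category (here, morphology), weakly correlated ones for another.
names = list(features)
correlations = {
    (p, q): pearson(features[p], features[q])
    for i, p in enumerate(names)
    for q in names[i + 1:]
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

In this toy example the two texture features are almost collinear, whereas the perfusion feature is only weakly related to either, which mirrors the morphology versus hemodynamics split in spirit only.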
For liver fibrosis staging with ultrasomics, even after the optimized machine-learning algorithms had been selected, a clinical model that used single-modality parameters still provided unsatisfactory AUC values for staging (AUC < 0.8). In addition, the models that used duplicate morphological parameters (the highly correlated features of ORF and conventional radiomics) displayed the lowest AUCs in the validation groups, which may be due to the redundant information shared by the two morphological parameters. By contrast, the models using combined morphological and hemodynamic features demonstrated better performance. For the evaluation of fibrosis stage, the accompanying activity of liver tissue should not be ignored [28, 29]. Our results also agree with the principle that models constructed with two categories should achieve higher AUCs regardless of the machine-learning algorithm used. This finding shows that the use of multiparametric ultrasomics reflecting different pathophysiological processes can enhance the performance of our clinical decision support system [30].
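Because the comparison above rests on AUC values, a minimal sketch of how an AUC can be computed from model scores may be useful. It uses the pairwise (Mann-Whitney) formulation; the labels and scores below are hypothetical and chosen only to show a single-category score and a two-category score being compared, not to reproduce any result of this study.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney U).

    labels: 1 = positive case, 0 = negative case. Tied scores count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked in the correct order.
    return wins / (len(positives) * len(negatives))

# Hypothetical data: 1 = significant fibrosis, 0 = no significant fibrosis.
labels = [0, 0, 0, 1, 1, 1]
single_category_score = [0.2, 0.6, 0.4, 0.5, 0.3, 0.9]  # made-up model output
two_category_score = [0.1, 0.55, 0.3, 0.6, 0.5, 0.9]    # made-up model output

print(round(auc(single_category_score, labels), 3))
print(round(auc(two_category_score, labels), 3))
```

The AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so it can be compared across models regardless of the algorithm that produced the scores.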
With these big data, machine learning is driving great changes in medical disciplines that are based on pattern recognition (e.g., radiology and pathology) [31, 32]. Generally, machine learning with a larger sample size produces more accurate classification [32–34]. However, the ultrasound images that qualified for computational analysis in the present work were limited by their dependence on the operator and software, and our study included only 144 cases as a result. Therefore, determining the optimal machine-learning algorithm for a small sample size is of great value. A few recent studies have investigated the effects of different machine-learning classification methods on single-modality radiomics-based clinical predictions [35, 36]. Our study showed that the three machine-learning methods of AdaBoost, RF and SVM performed better with any category of parameters. The principle of SVM is to map the input parameters into a high-dimensional feature space via a preselected nonlinear mapping [33, 37]. In this space, an optimal classification hyperplane is constructed so as to maximize the margin separating the two categories. The use of a margin between the hyperplane and the two categories reduces the requirements on the size and distribution of the data. RF combines predictions from several weak classifiers to generate a more accurate and stable prediction. Random sampling of cases and features provides robustness to noise in the data with few tuning parameters and a small sample size [32, 38, 39]. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of instances misclassified by previous classifiers [13, 36]. AdaBoost is therefore sensitive to noisy data and outliers, although in some problems it can be less susceptible to overfitting than other learning algorithms.
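The adaptive reweighting that characterizes AdaBoost can be sketched in a few lines. The example below is a simplified illustration, assuming a single numeric feature, labels coded as +1/-1 and decision stumps as weak learners; the function names and the toy data are hypothetical, and this is not the implementation used in this study.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: predict `polarity` if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    values = sorted(set(xs))
    # Candidates: one threshold below all points, then midpoints between neighbours.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` stumps; return the ensemble and the final case weights."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified cases, decrease the others.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, weights

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data (illustrative only): no single stump separates the classes.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]

_, weights_after_one = adaboost(xs, ys, rounds=1)
ensemble, _ = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

After the first round, the two cases that the first stump misclassifies carry more weight than the others, so the next stump concentrates on them. The same mechanism explains the sensitivity to outliers: a mislabeled case keeps gaining weight in every round in which it is misclassified.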
The first and most important limitation of this study was that multiparametric ultrasomics could not include parameters of liver stiffness. Many single-center studies have already shown that shear wave elastography achieved a good AUC of 0.89 in staging F2 fibrosis patients [17]. However, considering patient compliance, only one US machine was used to acquire the ultrasound data in this prospective study. Additionally, the primary purpose of this study was to optimize the model of fibrosis staging based on multiparametric ultrasomics using machine learning. Our present work has demonstrated that (1) data exploration correlating with pathophysiology and (2) model construction using machine learning could improve the robustness of fibrosis staging models. Nevertheless, the reported performance of shear wave elastography in staging significant fibrosis (AUC = 0.69–0.92, sensitivity = 0.77–0.90 and specificity = 0.70–0.87) was similar to that of our multiparametric model. This finding reminds us that, although the data mining process can enhance the performance of clinical decision systems, the upper limit of model performance is determined by the data. In a subsequent study, we will collect shear wave elastography data to produce a better model.
Our study also had other limitations. Second, our study only included a cohort of patients from our hospital. It is necessary to establish an independent validation cohort to test the generalizability of our ultrasomics model. A third limitation is that several patients enrolled in the study had focal liver lesions, which may also affect the association between US parameters and pathology. To eliminate the potential effect of tumors on the adjacent liver parenchyma, we created strict exclusion criteria. Fourth, the study population consisted of Chinese patients with chronic HBV, which led to a low BMI of 20.2 kg/m² and a low proportion of liver steatosis. Fifth, owing to image acquisition requirements, the CEMF images covered only a short time (15–20-s clips), which may not show the whole perfusion process in the liver. However, we attempted to analyze the blood flow arrival time to the liver and kidney based on a time-intensity curve, and the wash-in curve contains more information on the arrival time. Sixth, the analysis of the liver (conventional radiomics and ORF) was performed on a small 2-cm ROI placed in segment 6, which shares the limitation of a biopsy in not reflecting the potential heterogeneity of the liver disease.
In summary, we have demonstrated that expert knowledge on data acquisition and analysis can optimize the robustness of clinical decision support systems. Additionally, the three machine-learning methods of AdaBoost, RF and SVM are optimal algorithms for studies with a small sample size. The application of this framework in future studies will facilitate data mining in the era of ultrasomics.