Aims and introduction
Nonalcoholic fatty liver disease (NAFLD) is common worldwide. The estimated global prevalence of NAFLD was 25.2% in 2018 [1], and it has been reported that 30% of NAFLD patients go on to develop nonalcoholic steatohepatitis (NASH) [2]. NAFLD is asymptomatic, and the biggest concern is that few treatment options are available. Identifying the intermediate/high-risk group of NAFLD patients and intervening in associated risk factors to prevent cirrhosis is therefore crucial. The clinical care pathway recommends blood tests followed by elastography to detect fibrosis in NAFLD patients at an early stage [3]. Elastographic imaging techniques are thus becoming pivotal tools for the non-invasive quantitative assessment of fibrosis.
Transient elastography (TE) and shear wave elastography (SWE) are used in clinical settings to detect liver fibrosis more easily than liver biopsy (LB). TE is recommended in some guidelines [4, 5]. However, point SWE (pSWE) and 2-dimensional SWE (2D-SWE) were marketed later than TE, and so far only 2 primary studies [6, 7] have directly compared the diagnostic performance of all three (TE, pSWE and 2D-SWE) in a hospital setting. As such, evidence synthesis using meta-analysis (MA) alone would be inadequate for clinicians to understand differences in diagnostic performance among these non-invasive methods. Considering the American Association for the Study of Liver Diseases (AASLD) guidance [8], elastographic imaging techniques are key to detecting ‘at risk’ NASH patients in hospitals after patients with suspected NAFLD have been identified in primary/non-hepatology care. If SWE can be used as an alternative to TE, it could improve timely patient access to liver fibrosis assessment.
Network meta-analysis (NMA) enables, under certain assumptions, comparisons that were not made within any previous study by using indirect evidence, as opposed to direct evidence that contrasts tests within 1 study. This method has been used to compare the diagnostic accuracy of urinary biological tests for non-invasive bladder cancer [9], biomarkers for pancreatic cancer [10], and imaging methods for ischemic stroke [11]. However, our database search found no such study of ultrasonographic elastography in NAFLD patients.
The aim of this study was to clarify, by quantifying differences using NMA, whether the diagnostic accuracy of SWE, particularly for significant liver fibrosis, was similar to that of TE in adults with NAFLD in the hospital setting.
Methods
This study was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement (Supplementary information [SI]1) [12], and the protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO) (CRD42022327249).
Study selection and bias assessment
A systematic literature search was conducted in electronic bibliographic databases including MEDLINE, the Cochrane Library, and Web of Science, covering studies published from January 2010 to May 2022. The search used medical subject headings and their combinations for the target diseases (nonalcoholic fatty liver disease, NAFLD, nonalcoholic steatohepatitis and NASH) and for elastographic methods (elastography, transient elastography, TE, magnetic resonance elastography (MRE), MR elastography, MRE, shear wave elastography, SWE, acoustic radiation force impulse imaging and ARFI) (SI.2). Two reviewers (RY and TO) independently applied the eligibility criteria and selected studies following the PRISMA flow diagram [13]. When their decisions differed, they discussed whether the study should be included until agreement was reached. When selected papers included patients from the same clinics and hospitals, papers were chosen so as to include as many studies as possible conducted in a single country.
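The search combined a disease block with an elastographic-method block. A minimal sketch of how such a Boolean query can be assembled from the terms listed above follows; the full strategy is in SI.2, so the phrase quoting, the OR-within/AND-between structure, and the helper names `or_block` and `build_query` are our assumptions, not the registered query or any database's own syntax.

```python
# Hypothetical sketch: the terms come from the Methods, but the query syntax
# (quoted phrases, OR within a block, AND between blocks) is an assumption.
DISEASE_TERMS = [
    "nonalcoholic fatty liver disease", "NAFLD",
    "nonalcoholic steatohepatitis", "NASH",
]
ELASTOGRAPHY_TERMS = [
    "elastography", "transient elastography", "TE",
    "magnetic resonance elastography", "MR elastography", "MRE",
    "shear wave elastography", "SWE",
    "acoustic radiation force impulse imaging", "ARFI",
]


def or_block(terms):
    """Join terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*blocks):
    """Combine OR-blocks with AND so a record must match every block."""
    return " AND ".join(or_block(b) for b in blocks)


query = build_query(DISEASE_TERMS, ELASTOGRAPHY_TERMS)
print(query)
```

Date limits (January 2010 to May 2022) and database-specific field tags would be added on top of this in practice.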
The 2 reviewers independently assessed the risk of bias of each study in the same way using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool (http://www.bris.ac.uk/quadas/). Each question was answered yes, no, or unclear after discussion.
Inclusion and exclusion criteria
For the purpose of our study, we included studies that met all of the following criteria: (a) adults with NAFLD diagnosed by LB as the gold standard; (b) use of elastographic imaging; (c) primary papers reporting cohort or case–control studies; (d) an interval of no longer than 3 months between elastographic imaging and LB, considering that lifestyle changes might affect the examination results; and (e) data from which a 2-by-2 table could be constructed. Exclusion criteria were (a) age under 18 years; (b) NAFLD patients with other causes of liver disease; (c) NAFLD not diagnosed by LB, or patients pathologically diagnosed as having a normal liver; (d) studies written in languages other than English and Japanese, unpublished studies, reviews, case reports, gray literature, or letters; (e) studies from the same hospital or clinic with overlapping study periods; and (f) studies that analyzed chronic liver disease, or NAFLD patients together with patients with a normal liver, from which NAFLD subgroup data could not be obtained. We removed 1 inclusion criterion registered in PROSPERO, namely primary papers published by middle- or high-income countries, because no study was excluded by it.
Data were extracted as follows: patient characteristics (age, sex, body mass index [BMI], and diabetes mellitus [DM]), liver biopsy fibrosis stages, success rate and the median/interquartile range of each elastographic method, fibrosis stages with the cutoffs, accuracy measures (sensitivity and specificity), and the numbers of patients in 2-by-2 tables (true positives, false positives, false negatives, and true negatives, in addition to the overall sample sizes). Fibrosis was divided into 5 stages according to the Brunt/Kleiner classification [14, 15]: F0, no fibrosis; F1, perisinusoidal or portal; F2, perisinusoidal and portal or periportal; F3, septal or bridging fibrosis; and F4, cirrhosis. Additional information about study design, methodology, and the prevalence of NAFLD in each study was also collected.
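Because each outcome dichotomizes the biopsy stage, one study contributes a different split of diseased and non-diseased patients for ≥ F1, ≥ F2 and ≥ F3. A minimal sketch, assuming per-stage patient counts are available; the helper `split_by_threshold` and the counts below are hypothetical, not data from the included studies.

```python
# Brunt/Kleiner fibrosis stages, in order of severity.
STAGES = ["F0", "F1", "F2", "F3", "F4"]


def split_by_threshold(stage_counts, threshold):
    """Return (n_diseased, n_non_diseased) for the target condition '>= threshold'.

    stage_counts maps each biopsy stage to the number of patients at that stage.
    """
    cut = STAGES.index(threshold)
    diseased = sum(stage_counts.get(s, 0) for s in STAGES[cut:])
    non_diseased = sum(stage_counts.get(s, 0) for s in STAGES[:cut])
    return diseased, non_diseased


# Hypothetical biopsy stage distribution for one study.
counts = {"F0": 20, "F1": 35, "F2": 25, "F3": 15, "F4": 5}
for target in ("F1", "F2", "F3"):
    print(f">= {target}:", split_by_threshold(counts, target))
# >= F1: (80, 20)   >= F2: (45, 55)   >= F3: (20, 80)
```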
The count data used in the MA were the cell counts of the 2-by-2 tables. Some counts were calculated from the extracted sensitivity and specificity because some papers did not report, or misreported, all of the essential values. The 95% confidence intervals for the accuracy measures of the included studies were recalculated from the 2-by-2 tables using the exact binomial (Clopper-Pearson) method.
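When only sensitivity and specificity are reported, the cells can be back-calculated from the numbers of diseased and non-diseased patients, and the exact (Clopper-Pearson) interval can then be computed from the counts. A stdlib-only sketch under stated assumptions: rounding to the nearest count is our assumption about the back-calculation, and the function names and example values are hypothetical.

```python
from math import comb


def cells_from_accuracy(sens, spec, n_diseased, n_non_diseased):
    """Back-calculate 2-by-2 cells from reported sensitivity/specificity.

    Rounds to the nearest count (an assumption); this can be off by one
    if the reported percentages were themselves rounded.
    """
    tp = round(sens * n_diseased)
    tn = round(spec * n_non_diseased)
    return {"TP": tp, "FN": n_diseased - tp, "TN": tn, "FP": n_non_diseased - tn}


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def _root_decreasing(f, iters=100):
    """Bisection for the root of a function that decreases in p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # f still positive, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""
    # Lower bound solves P(X >= x | p) = alpha/2, i.e. CDF(x - 1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _root_decreasing(
        lambda p: binom_cdf(x - 1, n, p) - (1 - alpha / 2))
    # Upper bound solves P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else _root_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


cells = cells_from_accuracy(0.80, 0.75, 45, 55)
print(cells)  # {'TP': 36, 'FN': 9, 'TN': 41, 'FP': 14}
print("sensitivity CI:", clopper_pearson(cells["TP"], cells["TP"] + cells["FN"]))
print("specificity CI:", clopper_pearson(cells["TN"], cells["TN"] + cells["FP"]))
```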
Outcome measures
The pre-specified primary outcomes were the sensitivity and specificity for diagnosing ≥ F2. As secondary outcomes, the same measures for ≥ F3 and ≥ F1 were also synthesized; these were added after the protocol was registered because of the large number of studies and their clinical importance. Cirrhosis was outside the scope of this study.
Statistical analysis
A Bayesian random-effects bivariate normal model was first fitted to the data for each ultrasonographic method separately, and a mixed-effects bivariate normal model for NMA was then fitted to all elastographic methods, including MRE, to estimate pooled sensitivity and specificity with corresponding 95% credible intervals (CrIs) and prediction intervals (PIs), as well as 95% credible and prediction regions. Note that 1 result per study was included in each separate MA, whereas multiple results with varying thresholds from 1 study were included in the NMA when available; this was possible because the hierarchical NMA model includes study-level random effects. Results with a cutoff chosen to achieve 90% sensitivity or specificity were excluded, which resulted in the exclusion of 1 study from the MA and NMA [6]. This exclusion was needed because such fixed sensitivity or specificity values would have biased our pooled estimates upwards, even though the bivariate normal models treat the two measures jointly. Hierarchical summary receiver operating characteristic (HSROC) curves were also drawn (see SI.2 for detailed statistical methods, and SI.3 and SI.4 for fitting/convergence results of our main and inconsistency models, respectively). Posterior and prediction distributions of differences in sensitivity and specificity were sampled to calculate posterior/prediction probabilities that the differences were equal to or greater than 0%. The same set of analyses was repeated using a −5% margin.
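The posterior and prediction probabilities reduce to simple summaries of Monte Carlo draws. The sketch below uses simulated draws in place of the Stan output, a single shared between-study SD `tau`, and independent study-level deviations for a new study; all three are simplifying assumptions, not the authors' model. It reports medians with equal-tailed 95% intervals, the probabilities P(difference ≥ 0) and P(difference ≥ −5%), and a plug-in variant that fixes `tau` at its posterior mean, as in the post-hoc check.

```python
import math
import random
import statistics


def expit(x):
    """Inverse logit: maps a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def summarize(draws):
    """Median and equal-tailed 95% interval (2.5th and 97.5th percentiles)."""
    q = statistics.quantiles(draws, n=40)  # 39 cut points at 2.5% steps
    return statistics.median(draws), q[0], q[-1]


def prob_at_least(draws, margin):
    """Share of draws whose difference is >= margin (e.g. 0.0 or -0.05)."""
    return sum(d >= margin for d in draws) / len(draws)


rng = random.Random(2024)
N = 4000
# Hypothetical posterior draws on the logit scale; NOT results of this study.
mu_a = [rng.gauss(1.6, 0.15) for _ in range(N)]     # pooled logit sensitivity, method A
mu_b = [rng.gauss(1.4, 0.20) for _ in range(N)]     # pooled logit sensitivity, method B
tau = [abs(rng.gauss(0.5, 0.1)) for _ in range(N)]  # between-study SD (assumed shared)

# Posterior distribution of the difference in pooled sensitivity (A minus B).
post_diff = [expit(a) - expit(b) for a, b in zip(mu_a, mu_b)]

# Prediction distribution for a new study: add study-level deviations
# (independent here, a simplification of a hierarchical NMA model).
pred_diff = [
    expit(a + t * rng.gauss(0.0, 1.0)) - expit(b + t * rng.gauss(0.0, 1.0))
    for a, b, t in zip(mu_a, mu_b, tau)
]

# Plug-in variant: fix tau at its posterior mean instead of integrating over it.
tau_bar = statistics.fmean(tau)
plugin_diff = [
    expit(a + tau_bar * rng.gauss(0.0, 1.0)) - expit(b + tau_bar * rng.gauss(0.0, 1.0))
    for a, b in zip(mu_a, mu_b)
]

for label, draws in [("posterior", post_diff), ("predictive", pred_diff),
                     ("plug-in", plugin_diff)]:
    med, lo, hi = summarize(draws)
    print(f"{label}: median={med:.3f}, 95% interval=({lo:.3f}, {hi:.3f}), "
          f"P(diff>=0)={prob_at_least(draws, 0.0):.2f}, "
          f"P(diff>=-5%)={prob_at_least(draws, -0.05):.2f}")
```

The −5% margin can only raise the probability relative to a 0% threshold, because every draw at or above 0 is also at or above −0.05.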
Model fit was assessed visually by plotting prediction distributions against the observed data points. Heterogeneity was assessed using prediction distributions and a pre-specified subgroup analysis, in which a meta-regression included a covariate for the country of publication with a common effect across the different methods. A leave-one-out analysis was done to identify any influential study and to check the robustness of the results. To assess the consistency assumption, the design-by-treatment-interaction model was used, adding inconsistency parameters to the main model for ≥ F2 [16]. Characteristics that might influence accuracy measures were evaluated visually to check the homogeneity and transitivity assumptions. Deeks’ funnel plots were used to assess publication bias [17]. Post hoc, prediction intervals were generated by plugging in the posterior means of the variance parameters, to see whether the prediction intervals from the main analysis were influenced by posterior uncertainty in these parameters. Posterior medians were used as point estimates because the accuracy measures were anticipated to have skewed distributions, and equal-tailed intervals were chosen for the 95% CrIs and PIs. All analyses were done in R (version 4.1.2) and Stan (version 2.21.0) via the rstan package. The datasets and the Stan and R code for model fitting are available (https://github.com/tetsuroda/dta_nma_nafld_2024).
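Deeks’ test regresses the log diagnostic odds ratio (DOR) on 1/√ESS, weighting each study by its effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of diseased and non-diseased patients; a non-zero slope suggests funnel-plot asymmetry [17]. A minimal stdlib-only sketch; the 0.5 continuity correction, returning a t statistic (to be compared against a t distribution with k − 2 degrees of freedom) rather than a p-value, and the example tables are our assumptions, not the authors' R code.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; adds 0.5 to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(tp, fp, fn, tn):
    """ESS = 4 * n1 * n2 / (n1 + n2), n1 diseased and n2 non-diseased."""
    n1, n2 = tp + fn, fp + tn
    return 4 * n1 * n2 / (n1 + n2)


def deeks_test(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS) with weights ESS.

    tables: list of (TP, FP, FN, TN) tuples, one per study (k >= 3).
    Returns the slope, its standard error, the t statistic and df = k - 2.
    """
    w = [effective_sample_size(*t) for t in tables]
    x = [1 / math.sqrt(e) for e in w]
    y = [log_dor(*t) for t in tables]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    df = len(tables) - 2
    # Residual variance estimated from the weighted residuals.
    sigma2 = sum(wi * (yi - intercept - slope * xi) ** 2
                 for wi, xi, yi in zip(w, x, y)) / df
    se = math.sqrt(sigma2 / sxx)
    t_stat = slope / se if se > 0 else math.nan
    return {"slope": slope, "se": se, "t": t_stat, "df": df}


# Hypothetical 2-by-2 tables (TP, FP, FN, TN); not data from the included studies.
studies = [(36, 14, 9, 41), (20, 8, 6, 30), (55, 20, 10, 70),
           (12, 6, 2, 15), (80, 25, 20, 120)]
print(deeks_test(studies))
```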
Discussion
This study compared the sensitivity and specificity of 4 elastographic methods for liver fibrosis in adult NAFLD patients by NMA. For ≥ F2, the diagnostic accuracy of TE and 2D-SWE was comparable given their 95% PIs and prediction regions; however, the sensitivity of pSWE was slightly lower than that of the other 2 methods, while its specificity was similar. For ≥ F3 and ≥ F1, TE was slightly more accurate than the SWEs, and pSWE had the lowest accuracy. Nonetheless, with a −5% margin, the prediction probabilities indicated that the SWEs had accuracy relatively similar to that of TE for all outcomes, except for the sensitivity of pSWE.
The intended role of ultrasonographic elastography is to identify patients with “at risk” NASH [8] and to reduce unnecessary LB in patients without fibrosis in hepatology care, after NAFLD patients in the general population have been screened with blood tests and/or abdominal ultrasonography in primary/non-hepatology care. Two-dimensional SWE could be recommended more formally as a diagnostic tool for ≥ F2. Considering the increasing number of patients with NAFLD, it is reasonable to offer both TE and 2D-SWE as diagnostic options for liver fibrosis, because more facilities would then be able to assess fibrosis at an earlier stage. This would result in better management of NAFLD, even though MRE, which has the best accuracy, is available only in a limited number of facilities.
Our results were consistent with some existing MAs that compared only some of the 4 elastographic methods: an MA of the diagnostic accuracy of TE and pSWE for fibrosis stages [38], an MA of TE and SWE [39], and an individual participant data meta-analysis (IPDMA) comparing the diagnostic accuracy of TE and MRE [40]. Selvaraj’s review [41] compared TE, pSWE, 2D-SWE and MRE by MA; the pooled diagnostic accuracy of TE was similar to ours, but that of 2D-SWE was lower and that of pSWE was higher. These differences might stem from the study population of the present study, which was restricted to adults with NAFLD and did not include non-NAFLD populations or other etiologies of liver disease, and from our requirement of an interval within 3 months between LB and elastography, which reduces the influence of lifestyle changes, whereas 6 months is often used. The stringent inclusion criteria of this study should reduce spectrum bias. The publication years of the included studies also differed, and only 1 study [7] was included in both analyses. We therefore expected the studies included in our analysis to be more homogeneous and more relevant to current practice.
The present study is the first NMA comparing the diagnostic accuracy of all available ultrasonographic methods for liver fibrosis. The NMA enabled more precise estimation than the separate MAs, although the number of included studies was relatively small because of the stringent inclusion criteria. Our analysis also estimated prediction intervals/regions, which show the range of plausible accuracy values in a future study; these are more useful than posterior estimates when clinicians consider what level of accuracy to expect in the next diagnosis. The wide prediction intervals/regions indicate that measures to reduce heterogeneity are needed. In addition, the probabilities that differences are equal to or greater than 0 or the margin allowed a more intuitive evaluation of accuracy than the dichotomous notion of null hypothesis testing, because a probability closer to 50% suggests that the difference between the 2 diagnostic methods compared could be attributed to random chance. Although the results need careful interpretation, no obvious concerns were found: the characteristics of the included studies’ samples appeared relatively homogeneous within and between studies and across the diagnostic methods, and the consistency and inconsistency models provided similar results, as did the separate MAs and a set of other sensitivity analyses.
This study has some limitations. First, it was not possible to examine patient characteristics that might affect diagnostic accuracy, including the stages of steatosis and inflammation, DM, and obesity. Further, the included studies varied, or had unknown status (Table 1), in some characteristics, including the type of probe (M and/or XL) [42] and the transaminase level in TE studies [43]. Therefore, the homogeneity and transitivity assumptions might not hold. However, the observed estimates of sensitivity and specificity plotted against various factors (DM, cirrhosis, NASH, fasting hours, and needle gauge) showed no qualitatively obvious trends (Figs. S2, S15 and S26 in SI.2). We could not conduct a subgroup analysis for obesity because of missing data, and we considered that the meta-regression comparing Asian and non-Asian countries would serve as a proxy for an analysis of obesity. However, this proxy turned out to be inadequate because the average BMI values in the studies conducted in Malaysia were high and close to those in non-Asian countries (Table S4 in SI.2). Given the worldwide increase in obese patients, it is becoming more important to screen such patients in the general population, but few existing studies clarify the features of this subgroup. In ultrasonographic elastography, the success rate in patients with BMI ≥ 30 kg/m² decreased as BMI increased, and these patients showed different characteristics: they were relatively young (43.3 ± 4.0 years), more often female, had a lower percentage of ≥ F3, and had lower ALT levels than non-obese NAFLD patients [6, 44]. This means that obese patients need long-term follow-up even when their ALT level is normal, and unreliable elastographic assessment might affect the timing of intervention. The recent metabolic dysfunction-associated steatotic liver disease diagnostic criteria include BMI ≥ 25 kg/m² (≥ 23 kg/m² in Asians) and suspected or diagnosed DM [45], and we expect that further studies might clarify in more detail the relationship of NAFLD with such factors and with the diagnostic performance of elastography, as well as which characteristics of NAFLD patients require careful follow-up and treatment in primary and tertiary hospitals. Second, in the QUADAS assessment, some studies lacked information on, and/or criteria for, patient selection; failure rates of elastography; the refusal rate of LB and the number of patients pathologically diagnosed as non-NAFLD; the quality assessment of LB; diagnostic reliability [46–48] and diagnostic concordance among pathologists [48, 49]; and pre-specified thresholds of fibrosis stages for elastographic diagnosis. Such high-risk-of-bias study settings and unclear information might contribute to the heterogeneous estimates and potentially to a violation of the homogeneity assumption, although the leave-one-out analysis showed no particularly influential studies. Third, the heterogeneity might also partly reflect our diffuse prior distributions [50], although the sensitivity analysis yielded almost identical results; the results might also differ if each diagnostic method had a different degree of heterogeneity, contrary to our assumption of common heterogeneity. Finally, we did not know the percentage of patients with cirrhosis who had been excluded from LB in the included studies. This may have caused selection bias, and the results may differ when elastography is conducted in the general population. However, according to the AASLD guidance [8], screening for ‘at risk’ NASH is conducted only in hepatology care, not at the primary care level. Therefore, our results may be similar to clinical practice in hospitals. Further studies are needed to clarify whether the diagnostic accuracy of elastography differs between general population screening and hospital settings.
In conclusion, we conducted an NMA to compare the diagnostic accuracy of 4 elastographic methods, and the results showed that 2D-SWE could be recommended as an alternative to TE for assessing liver fibrosis before LB. However, caution is needed when using pSWE because of its low sensitivity. Further research is needed to reduce heterogeneity in diagnostic accuracy and to further evaluate the diagnostic accuracy of elastography in NAFLD.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.