Published online Jan 03, 2022.
https://doi.org/10.3348/kjr.2021.0421
Quality of Radiomics Research on Brain Metastasis: A Roadmap to Promote Clinical Translation
Abstract
Objective
Our study aimed to evaluate the quality of radiomics studies on brain metastases based on the radiomics quality score (RQS), Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist, and the Image Biomarker Standardization Initiative (IBSI) guidelines.
Materials and Methods
PubMed MEDLINE and EMBASE were searched for articles on radiomics for evaluating brain metastases, published until February 2021. Of the 572 articles, 29 relevant original research articles were included and evaluated according to the RQS, TRIPOD checklist, and IBSI guidelines.
Results
External validation was performed in only three studies (10.3%). The median RQS was 3.0 (range, -6 to 12), with a low basic adherence rate of 50.0%. The adherence rate was low for comparison with the “gold standard” (10.3%), stating the potential clinical utility (10.3%), performing the cut-off analysis (3.4%), reporting calibration statistics (6.9%), and providing open science and data (3.4%). None of the studies involved test-retest or phantom studies, prospective studies, or cost-effectiveness analyses. The overall rate of adherence to the TRIPOD checklist was 60.3% and low for reporting title (3.4%), blind assessment of outcome (0%), description of the handling of missing data (0%), and presentation of the full prediction model (0%). The majority of studies lacked pre-processing steps, with bias-field correction, isovoxel resampling, skull stripping, and gray-level discretization performed in only six (20.7%), nine (31.0%), four (13.8%), and four (13.8%) studies, respectively.
Conclusion
The overall scientific and reporting quality of radiomics studies on brain metastases published during the study period was insufficient. Radiomics studies should adhere to the RQS, TRIPOD, and IBSI guidelines to facilitate the translation of radiomics into the clinical field.
INTRODUCTION
Brain metastases are the most common type of intracranial tumors in adults [1], and these substantially affect the overall prognosis of patients with underlying cancers [2]. Although the clinical outcomes of patients with brain metastases have largely improved due to advances in neuro-oncology treatment, there is still room for improvement, because the management of brain metastases is often complex and controversial [3]. As MRI is the imaging modality of choice for brain metastasis [4], many studies have attempted to use MRI to address many questions in the field of brain metastasis. Radiomics is a recently introduced method that enables data mining from MRI; therefore, many studies have used MRI with radiomics in patients with brain metastases. As radiomics is more likely to be used with MRI in the field of brain metastases in the near future, it is highly desirable to assess the quality of published studies to evaluate the current status.
Radiomics involves the exploitation of MRI data to extract high-dimensional quantitative imaging features, which can be used to support clinical decision-making [5, 6]. Although previous radiomics studies in the field of neuro-oncology have mainly focused on gliomas [7, 8, 9, 10, 11, 12], there has been an increase in the number of studies on brain metastases. Indeed, radiomics studies have demonstrated promising results in the discrimination of brain metastasis from other tumors [13, 14, 15, 16, 17, 18, 19, 20], identification of primary tumor types in patients with brain metastases [21, 22, 23, 24], prediction of specific genetic mutations [25, 26, 27, 28, 29], prediction of survival [30, 31], differentiation between radiation necrosis and brain metastasis [32, 33, 34, 35], and prediction of outcome after radiosurgery [36, 37, 38, 39, 40, 41]. However, such studies on brain metastases are confined within the limits of experimental settings, without translation into real-world clinical settings [42]. To lessen this translational gap, scientific and reporting quality must be high, which will enable a standardized evaluation of performance and increase the reproducibility and clinical utility of radiomics.
The radiomics quality score (RQS) is determined using a scoring system that incorporates the crucial aspects of radiomics studies and assesses their quality [43, 44]. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) checklist is also a well-known tool for assessing the quality of prediction models [45]. Additionally, the Image Biomarker Standardization Initiative (IBSI) provides a comprehensive and detailed review of each mandatory step for radiomics analyses, including nomenclature of radiomics features, general schemes, and datasets for calibration [46]. To the best of our knowledge, no study has evaluated radiomics studies on brain metastases according to the criteria of these three quality assessment tools. Assessing the quality of current radiomics studies on brain metastases may further promote the use of radiomics as a clinical tool. Therefore, in light of the RQS, TRIPOD checklist, and IBSI guidelines, this study aimed to evaluate the quality of radiomics studies that focused on brain metastases.
MATERIALS AND METHODS
Systematic Search Strategy and Study Selection
PubMed MEDLINE (n = 516) and EMBASE (n = 56) databases were searched to collect all original research papers on radiomics analysis published until February 2021, and the following terms were used for the search: (“brain metastasis” OR “brain metastases”) AND (“radiomic” OR “radiogenomic” OR “texture”). From a total of 572 papers retrieved using the specified search terms, 39 duplicate articles were removed. Of the remaining articles, 504 were excluded because they were non-radiomics studies (n = 100), conference abstracts (n = 25), not in the field of interest (n = 255), non-human (n = 6), non-brain (n = 20), review articles (n = 60), technical notes (n = 18), editorial articles (n = 1), erratum (n = 1), comments (n = 5), or case reports (n = 11), or included < 50 brain metastases (n = 2). Finally, 29 articles were included in the analysis (Fig. 1).
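The selection counts reported above can be cross-checked with a short tally. The following Python sketch only re-adds the numbers given in this section; the variable names are illustrative and are not part of the study's methods.

```python
# Re-add the counts reported in the study selection paragraph.
retrieved = {"PubMed MEDLINE": 516, "EMBASE": 56}
duplicates = 39

# Exclusion reasons and counts as listed in the text.
excluded = {
    "non-radiomics studies": 100,
    "conference abstracts": 25,
    "not in the field of interest": 255,
    "non-human": 6,
    "non-brain": 20,
    "review articles": 60,
    "technical notes": 18,
    "editorial articles": 1,
    "erratum": 1,
    "comments": 5,
    "case reports": 11,
    "fewer than 50 brain metastases": 2,
}

total_retrieved = sum(retrieved.values())
after_deduplication = total_retrieved - duplicates
total_excluded = sum(excluded.values())
included = after_deduplication - total_excluded

print(total_retrieved, after_deduplication, total_excluded, included)
# prints: 572 533 504 29
```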
Fig. 1
Flow chart of the study selection process.
Analysis of Method Quality Based on RQS
The RQS was determined using 16 items classified into six domains, as reported in previous studies (Supplementary Table 1) [47, 48, 49]. Two reviewers (with 8 and 10 years of experience in neuroradiology) independently evaluated the domain scores of all included articles (Supplement, Radiomics Quality Scoring) after first achieving consensus on the evaluation criteria through discussion. If disagreements occurred between the two reviewers, a final decision was made after a consensus was reached.
Additional discussions were required to reach consensus on the following topics: issues of “validation” (domain 2), “comparison with the gold standard” (domain 3), and “detection and discussion of biologic correlation” (domain 3) (Supplement, Consensus Reached for RQS Scoring) due to the distinct nature of studies on brain metastasis.
Analysis of Reporting Completeness Based on the TRIPOD Checklist
The TRIPOD checklist, consisting of 37 items in 22 criteria, was applied to each article to determine the completeness of the report [45]. The analysis type of the prediction model was determined as follows: development only, type 1a; development and validation using resampling, type 1b; random split-sample development and validation, type 2a; nonrandom split-sample development and validation, type 2b; development and validation using separate data, type 3; or validation only, type 4. Details of the TRIPOD checklist and data extraction are shown in Supplement (Reporting Completeness Based on TRIPOD Statement).
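The mapping from study design to TRIPOD analysis type described above can be written as a small lookup. This is a minimal sketch: the function name `tripod_type` and the string labels for the validation strategies are hypothetical conveniences, and only the six type definitions come from the text.

```python
from typing import Optional

# Validation strategies named in the text, mapped to TRIPOD analysis types
# for studies that develop a model. The string keys are illustrative labels.
_DEVELOPMENT_TYPES = {
    None: "1a",               # development only
    "resampling": "1b",       # development and validation using resampling
    "random_split": "2a",     # random split-sample development and validation
    "nonrandom_split": "2b",  # nonrandom split-sample development and validation
    "separate_data": "3",     # development and validation using separate data
}


def tripod_type(develops_model: bool, validation: Optional[str]) -> str:
    """Return the TRIPOD analysis type for a study design."""
    if not develops_model:
        if validation is None:
            raise ValueError("a study must develop and/or validate a model")
        return "4"  # validation only
    if validation not in _DEVELOPMENT_TYPES:
        raise ValueError(f"unknown validation strategy: {validation!r}")
    return _DEVELOPMENT_TYPES[validation]


print(tripod_type(True, None))              # development only
print(tripod_type(True, "random_split"))    # random split-sample
print(tripod_type(False, "separate_data"))  # validation only
```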
Reporting of Image Processing and Radiomics Feature Extraction according to the IBSI Guidelines
The IBSI guidelines provide a total of 76 items for complete reports on image processing and image biomarker extraction (https://ibsi.readthedocs.io/en/latest/04_
Statistical Analysis
All statistical analyses were performed using R software (version 4.0.2; R Foundation for Statistical Computing).
In total, 29 articles were reviewed. In the cases where a score of one point per item was obtained, the study was considered to have basic adherence to each item of the RQS, TRIPOD checklist, or IBSI guideline. The basic adherence rate (%) was calculated as the proportion of the number of articles with basic adherence to the total number of articles. Sixteen items of the RQS were scored. The RQS for each item was reported as the median and range values. The percentage of the ideal score (%) was calculated as the ratio of the mean score to the ideal score for each item, and the total RQS (-8 to 36) was calculated for all articles. A total of 35 items on the TRIPOD checklist were scored. When calculating the overall adherence rate, validation items (10c, 10e, 12, 13, 17, and 19a) and “if done” items (5c and 11) were excluded from both the denominator and numerator. Seven items from the IBSI guidelines were scored.
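The two summary measures defined above, the basic adherence rate and the percentage of the ideal score, can be illustrated as follows. This is a minimal sketch rather than the analysis code used in the study (which was run in R): the function names are hypothetical, the example scores are invented, and "basic adherence" is assumed here to mean a score of at least one point on the item.

```python
from statistics import mean


def basic_adherence_rate(item_scores):
    """Percentage of articles with basic adherence to one item.

    Assumption: an article adheres when it scores at least one point.
    """
    adherent = sum(1 for score in item_scores if score >= 1)
    return 100.0 * adherent / len(item_scores)


def percent_of_ideal(item_scores, ideal_score):
    """Mean item score expressed as a percentage of the item's ideal score."""
    return 100.0 * mean(item_scores) / ideal_score


# Invented scores of five articles on a single item with an ideal score of 2.
example_scores = [0, 1, 2, 0, 1]
adherence = basic_adherence_rate(example_scores)
ideal_pct = percent_of_ideal(example_scores, ideal_score=2)
print(round(adherence, 1), round(ideal_pct, 1))
```

With these invented scores, three of the five articles adhere and the mean score is 0.8 out of 2.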
RESULTS
Characteristics of Radiomics Studies on Brain Metastasis
The characteristics of the 29 included radiomics studies [13, 15, 16, 17, 18, 19, 22, 24, 25, 26, 28, 29, 30, 31, 32, 33, 35, 37, 38, 39, 40, 51] are documented in Table 1. In the included studies, the median number of patients included was 77 (range, 24–439). The types of journals were clinical journals (7, 24.1%), imaging journals (18, 62.1%), and computer/neuroscience journals (4, 13.8%). Radiomics studies were either diagnostic (21, 72.4%) or prognostic (8, 27.6%); the eight prognostic studies covered prediction of survival and prediction of outcome after radiosurgery. Detailed characteristics of the included studies are presented in Supplementary Table 2.
Table 1
Characteristics of the 29 Included Radiomics Studies
Except for one study using positron emission tomography, 28 studies used MRI. Twenty-three (79.3%) studies used conventional images for feature extraction, and five (17.2%) used additional advanced images. Most studies performed manual segmentation (62.1%). Only three studies performed an external validation. Multiple brain metastases (> 1 lesion per patient) were included in the dataset of 20 studies (69.0%). Among them, seven studies demonstrated the statistical analysis that was relevant to the handling of cluster-correlated data. In three studies, patients with multiple metastases were assigned to the validation set or the training set to prevent bias from cluster correlation [18, 22, 31]. Correction for false discovery rate [30], correlation tests between the features [24, 27], and marginal proportional-hazards models were used for multiple lesions within the same patient [40]. In the remaining nine studies, only one lesion per patient constituted the dataset.
Rate of Basic Adherence to the Reporting Quality according to the Six Key Domains
The rate of basic adherence to reporting quality for the 16 items (six domains) of the RQS is documented in Table 2. In terms of domain 1, 20 studies (69.0%) included well-documented image protocols. Nine studies (31.0%) involved multiple segmentations performed by different readers to evaluate segmentation reliability [14, 15, 16, 17, 23, 26, 29, 35, 40]. No study used a test-retest approach or a phantom study.
Table 2
Radiomics Quality Score according to the Six Key Domains
In terms of domain 2, 24 studies (82.8%) performed feature reduction or adjustment for multiple tests. Ten studies (34.5%) performed validation using a dataset obtained from the same institution, gaining two points [13, 15, 17, 22, 27, 29, 30, 31, 32, 38]. One study (3.4%) performed validation using a dataset from another institute, earning three points [14]. In the remaining 18 studies (62.1%), validation was missing.
For domain 3, 13 studies (44.8%) performed multivariate analyses with non-radiomics features. Four studies (13.8%) earned 1 point each for using biological correlation components [23, 29, 33, 41]. Only three studies (10.3%) compared radiomics-based methods with the “gold standard” method (e.g., neuroradiologists’ decision to distinguish brain metastasis from other tumors) [14, 17, 20]. Three studies (10.3%) provided potential applications of radiomics for the prediction of mutation status or progression-free survival through decision curve analysis or nomograms [26, 31, 41].
In terms of domain 4, only one study (3.4%) performed a cut-off analysis to stratify patients with significantly different progression-free survival [31]. Among the 26 studies (89.7%) that used discrimination statistics or a resampling method, 21 studies (80.8%) used both discrimination statistics and a resampling method [13, 14, 15, 16, 17, 22, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 35, 37, 38, 39, 40]. Only two studies (6.9%) used calibration statistics with statistical significance [31, 41], with only one study also applying a resampling method [41].
In terms of domain 5, none of the studies were prospective or executed a cost-effectiveness analysis. For domain 6, only one study (3.4%) made the code publicly available, which defined image pre-processing and feature extraction.
Assessment of the RQS
The median RQS for the 29 radiomics studies was 3.0 (range, -6 to 12). The mean score was 3.4, which was 9.6% of the ideal score (Table 2, Fig. 2). When considering each domain, the mean score and percentage of the ideal score were the lowest in domain 2 (feature selection and validation) and highest in domain 4 (model performance). Among the six domains, domains 5 (high level of evidence) and 6 (open science and data) had mean scores of 0 and 0.03, respectively. In five studies (17.2%), neither feature selection nor validation was performed [19, 20, 21, 34, 41].
Fig. 2
RQS assessment results according to the six key domains.
RQS = radiomics quality score
Completeness of Reporting a Radiomics-Based Multivariable Prediction Model Using the TRIPOD Checklist
Of the 35 items in the TRIPOD checklist, the mean number of items reported ± standard deviation was 17.2 ± 3.8 (range, 6–26). When we excluded “if relevant” and “if done” (item 5c) items from both the numerator and denominator, the adherence rate to this checklist was 60.3%. The adherence rates to each TRIPOD checklist item are shown in Table 3.
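The exclusion of non-applicable items from both the numerator and the denominator can be sketched as below. The data layout (one dictionary per study, with `None` marking an item that does not apply) and the pooling of items across studies are assumptions made for illustration; the text does not specify how the overall rate was aggregated, and the example values are invented.

```python
def overall_adherence(studies):
    """Pooled adherence (%) over all applicable items of all studies.

    Each study maps a checklist item label to True (reported),
    False (not reported), or None (not applicable, skipped entirely).
    """
    reported = 0
    applicable = 0
    for items in studies:
        for value in items.values():
            if value is None:
                continue  # excluded from numerator and denominator
            applicable += 1
            reported += int(value)
    if applicable == 0:
        raise ValueError("no applicable items to score")
    return 100.0 * reported / applicable


# Invented example: study A has no validation and no "if done" step.
study_a = {"1": True, "2": True, "5c": None, "12": None}
study_b = {"1": False, "2": True, "5c": True, "12": False}
rate = overall_adherence([study_a, study_b])
print(round(rate, 1))
```

In this invented example, four of the six applicable items are reported, so the two non-applicable items of study A do not lower its rate.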
Table 3
Adherence of Radiomics Studies to Individual Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis Checklist Items
Reporting of Image Pre-Processing and Radiomics Feature Extraction according to the IBSI Guidelines
Among the 29 studies, 20 (69.0%) performed signal intensity normalization. Only six studies (20.7%) performed bias-field correction (five performed N4 bias correction while one performed N3 bias correction), nine (31.0%) performed isovoxel resampling, four (13.8%) performed skull stripping, and four (13.8%) performed gray-level discretization (Table 4, Fig. 3). The software used for the extraction of radiomics features included Pyradiomics (27.6%), Matlab (27.6%), LIFEx (13.8%), IBEX (6.9%), and others (13.8%). Two studies (6.9%) did not mention the software used. Among these, Pyradiomics and LIFEx adhered to the IBSI guidelines (Table 4, Fig. 3).
Fig. 3
Reporting of image pre-processing and radiomics feature extraction according to Image Biomarker Standardization Initiative.
N/A = not available
Table 4
Quality of Image Processing and Radiomics Feature Extraction according to the Image Biomarker Standardization Initiative Guidelines
DISCUSSION
In this study, radiomics studies on brain metastasis were evaluated with respect to scientific and reporting quality. Overall, the scientific and reporting quality of radiomics studies on brain metastases was suboptimal, with a low overall adherence rate for the RQS (50.0%) and TRIPOD checklist (60.3%) items. Only three studies performed external validation, indicating that most of the studies lacked generalizability. According to the RQS, none of the studies involved test-retest or phantom studies, prospective studies, or cost-effectiveness analyses. The majority of the studies did not address biologic correlations, compare radiomics-based methods to the gold standard method, state the potential clinical utility, perform cut-off analysis or calibration statistics, or provide open science and data. According to the TRIPOD checklist, the majority of studies did not report the title adequately. No study described the blind assessment of outcomes and handling of missing data or presented a full prediction model. The detailed pre-processing steps were missing in most of the studies.
The basic adherence rate to the RQS items was low (50.0%). Specifically, the evaluation of feature robustness was insufficient, with only a few studies performing multiple segmentations, while no studies performed phantom studies or collected images at multiple time points. A considerable proportion of the studies executed feature reduction, which is essential to avoid overfitting [52]. However, only three studies performed external validation, whereas 18 studies did not perform validation at all. As validation is the key process to allow radiomics to be integrated as a reliable tool for clinical practice, future radiomics studies should include a validation step. Most studies did not describe biological correlations or comparisons with the gold standard. However, there are no established gold standards for the prediction of outcomes following radiosurgery or prediction of primary tumor types, except for pathologic confirmation, which inevitably decreases the adherence rate. Few studies provided the potential clinical utility of radiomics or made the code publicly available. None of the studies were prospective or performed a cost-effectiveness analysis, and very few performed cut-off analysis or reported calibration statistics, resulting in low adherence rates. Thus, radiomics studies on brain metastases have room for improvement in several categories of the RQS.
In the reporting of radiomics studies according to the TRIPOD checklist, there were several highly problematic items. Only one study included the terms “development” and “validation” with the target population and outcome mentioned in the title. Similarly, most of the studies lacked the description of “development” and “validation” in the abstract and did not describe the specific study design. None of the studies reported handling of missing data. Furthermore, only a few studies reported their actions for blind assessment of predictors. These results of suboptimal reporting are in line with those of previous studies that investigated multivariable prediction model studies and oncologic studies according to TRIPOD [47, 53]. Meanwhile, since the current TRIPOD checklist focuses on regression-based prediction model approaches, several items could not be applied to studies that did not involve regression analyses. Specifically, the checklist items associated with model development and specification are based on regression analyses. The use of TRIPOD in artificial intelligence and machine learning (ML) studies is limited, as these studies frequently do not entail regression analyses [54]. Therefore, the development of a new version of the TRIPOD statement specific to ML (TRIPOD-ML) is currently in progress, focusing on ML prediction algorithms and building an established methodology for prediction research [54].
Adherence to the nomenclature of standard radiomics features and calculations described by the IBSI is important for improving the reproducibility of studies [52, 55]. However, only a few reports on the quality of radiomics studies have assessed adherence to the IBSI [44, 56]. In our study, we focused on the pre-processing steps in IBSI and reported the number of published radiomics studies on brain metastases following the guidelines. Bias-field correction aims to reduce low-frequency intensity nonuniformity [57], while isotropic resampling reduces directional biases [58] and gray-level discretization clusters pixels according to intensity values to enhance feature reproducibility [59]. Image intensity normalization is also a necessary step in increasing repeatability [60]. However, the majority of the studies performed only a few of these pre-processing steps rather than all of them. Radiomics studies on brain metastases should focus on pre-processing, which will increase the reproducibility of radiomics features.
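Two of the pre-processing steps named above, intensity normalization and gray-level discretization, can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure of any reviewed study: z-score normalization is used as one common normalization choice, discretization follows the fixed-bin-number scheme described by the IBSI, the function names are hypothetical, and the intensity values are invented. Bias-field correction and isovoxel resampling require image-processing libraries and are not shown.

```python
import math
from statistics import mean, pstdev


def zscore_normalize(intensities):
    """Rescale intensities to zero mean and unit (population) standard deviation."""
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        raise ValueError("cannot normalize a constant-intensity region")
    return [(x - mu) / sigma for x in intensities]


def discretize_fixed_bin_number(intensities, n_bins):
    """Map intensities to gray levels 1..n_bins using a fixed number of bins.

    The maximum intensity is assigned to the last bin so that it does not
    fall into a bin numbered n_bins + 1.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return [1 for _ in intensities]
    levels = []
    for x in intensities:
        if x == hi:
            levels.append(n_bins)
        else:
            levels.append(math.floor(n_bins * (x - lo) / (hi - lo)) + 1)
    return levels


# Invented intensities from a hypothetical region of interest.
roi = [10.0, 20.0, 30.0, 40.0, 50.0]
normalized = zscore_normalize(roi)
gray_levels = discretize_fixed_bin_number(roi, n_bins=4)
print(gray_levels)
# prints: [1, 2, 3, 4, 4]
```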
Among the 20 studies that enrolled multiple brain metastases of more than one lesion per patient to constitute the dataset, only seven studies demonstrated the statistical analysis relevant to the handling of cluster-correlated data. Assignment of the respective metastases either to the validation set or to the training set for patients with multiple metastases was most frequently used to avoid bias from cluster correlation. As brain metastases tend to be multiple and multiple lesions from one patient may correlate with each other, specific attention should be given to the statistical analysis in future studies.
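Assigning all lesions of a patient to the same set, as described above, amounts to splitting at the patient level rather than the lesion level. The following Python sketch shows one way this could be done; the function name, the 30% validation fraction, the fixed seed, and the lesion and patient identifiers are all hypothetical and are not taken from the reviewed studies.

```python
import random


def split_by_patient(lesion_to_patient, validation_fraction=0.3, seed=0):
    """Split lesions so that no patient contributes to both sets."""
    patients = sorted(set(lesion_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_validation = max(1, round(len(patients) * validation_fraction))
    validation_patients = set(patients[:n_validation])

    training, validation = [], []
    for lesion, patient in lesion_to_patient.items():
        if patient in validation_patients:
            validation.append(lesion)
        else:
            training.append(lesion)
    return training, validation


# Invented dataset: patient P1 has three lesions, P2 has two, the others one.
lesions = {
    "L1": "P1", "L2": "P1", "L3": "P1",
    "L4": "P2", "L5": "P2",
    "L6": "P3", "L7": "P4", "L8": "P5",
}
train_lesions, validation_lesions = split_by_patient(lesions)
train_patients = {lesions[lesion] for lesion in train_lesions}
validation_patients = {lesions[lesion] for lesion in validation_lesions}
print(train_patients.isdisjoint(validation_patients))
# prints: True
```

Splitting by lesion instead would allow correlated lesions from one patient to appear in both sets, which could inflate validation performance.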
Our study had several limitations. First, the sample size was relatively small. Second, radiomics is still a developing imaging biomarker, and the suggested RQS and TRIPOD criteria may be too idealistic and strict. Phantom studies and multiple imaging acquisitions have rarely been applied to real-world practice. Additionally, it may be too extensive to provide the whole unadjusted association between each candidate predictor and outcome and the full prediction model. However, considering these details is necessary to improve the scientific and reporting quality of radiomics to enable the clinical translation of radiomics in the future.
In conclusion, the overall scientific and reporting quality of radiomics studies on brain metastases published during the study period was insufficient, and this low quality may hamper the use of radiomics in the clinical field. The RQS, TRIPOD checklist, and IBSI guidelines should be adhered to in order to make radiomics a more robust decision-making tool in clinical practice.
Supplement
The Supplement is available with this article at https://doi.org/10.3348/kjr.2021.0421.
Conflicts of Interest: The authors have no potential conflicts of interest to disclose.
Author Contributions:
Conceptualization: Yae Won Park, Seung-Koo Lee.
Data curation: Chae Jung Park, Sung Soo Ahn.
Formal analysis: Chae Jung Park, Dain Kim.
Funding acquisition: Yae Won Park, Eui Hyun Kim, Seok-Gu Kang.
Investigation: Yae Won Park, Jong Hee Chang, Se Hoon Kim.
Methodology: Jong Hee Chang, Seung-Koo Lee.
Project administration: Se Hoon Kim, Seung-Koo Lee.
Resources: Jong Hee Chang, Se Hoon Kim.
Software: Chae Jung Park, Dain Kim.
Supervision: Yae Won Park, Sung Soo Ahn, Seung-Koo Lee.
Validation: Dain Kim, Eui Hyun Kim, Seok-Gu Kang.
Visualization: Sung Soo Ahn, Seung-Koo Lee.
Writing—original draft: Chae Jung Park, Yae Won Park.
Writing—review & editing: Chae Jung Park, Yae Won Park, Sung Soo Ahn.
Funding Statement: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1I1A1A01071648) and the “Team Science Award” of Yonsei University College of Medicine (6-2021-0009).
Availability of Data and Material
The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
References
Ortiz-Ramón R, Ruiz-España S, Mollá-Olmos E, Moratal D. Glioblastomas and brain metastases differentiation following an MRI texture analysis-based radiomics approach. Phys Med 2020;76:44–54.
Béresová M, Larroza A, Arana E, Varga J, Balkay L, Moratal D. 2D and 3D texture analysis to differentiate brain metastases on MR images: proceed with caution. MAGMA 2018;31:285–294.
Ortiz-Ramón R, Larroza A, Ruiz-España S, Arana E, Moratal D. Classifying brain metastases by their primary site of origin using a radiomics approach based on texture analysis: a feasibility study. Eur Radiol 2018;28:4514–4523.
Kawahara D, Tang X, Lee CK, Nagata Y, Watanabe Y. Predicting the local response of metastatic brain tumor to gamma knife radiosurgery by radiomics with a machine learning method. Front Oncol 2021 Jan [Epub]. doi: 10.3389/fonc.2020.569461.
Zwanenburg A, Leger S, Vallières M, Löck S. Image biomarker standardisation initiative. arXiv [Preprint]. 2016 [cited 2021 May 1]. Available at: https://doi.org/10.1148/radiol.2020191145.
Crombé A, Fadli D, Italiano A, Saut O, Buy X, Kind M. Systematic review of sarcomas radiomics studies: bridging the gap between concepts and clinical applications? Eur J Radiol 2020;132:109283.