Background
Bone and soft-tissue primary malignant tumors, or sarcomas, are rare entities with several histological subtypes, each with an incidence < 1/100,000/year [1, 2]. Among them, osteosarcoma is the most common sarcoma of bone. Along with Ewing sarcoma, it peaks in incidence during the second decade of life, whereas chondrosarcoma is the most prevalent bone sarcoma in adulthood [1]. The most frequent soft-tissue sarcomas are liposarcoma and leiomyosarcoma [2]. Owing to the rarity of these diseases, bone and soft-tissue sarcomas are managed in tertiary sarcoma centers according to current guidelines [1, 2]. Before any treatment begins, both biopsy and imaging complement clinical data, with biopsy representing the reference standard for preoperative diagnosis [1, 2]. However, biopsy may be inaccurate in large, heterogeneous tumors because of sampling errors; an inaccurate diagnosis may, in turn, lead to inadequate treatment and the need for further interventions, with increased morbidity. Additionally, the risk of biopsy tract contamination remains a concern. Imaging already plays a pivotal role in the assessment of bone and soft-tissue sarcomas: magnetic resonance imaging (MRI) and computed tomography (CT) are employed for local and general staging, respectively [1, 2]. These modalities may benefit from new image-based tools, such as those based on radiomics, which could noninvasively provide additional diagnostic and prognostic information [3].
The term “radiomics” combines “radio,” referring to medical images, and “omics,” which denotes the analysis of large amounts of data representing an entire set of some kind, such as the genome (genomics) or the proteome (proteomics) [3]. Accordingly, radiomics involves the extraction and analysis of large numbers of quantitative parameters, known as radiomic features, from medical images [4]. This technique has recently gained much attention in oncologic imaging because it can potentially quantify tumor heterogeneity, which is difficult to capture with qualitative image assessment or biopsy sampling. In particular, radiomic studies to date have focused on discriminating tumor grades and types before treatment, monitoring response to therapy, and predicting outcome [5].
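As a minimal illustration of what a radiomic feature is, the sketch below computes three first-order (histogram-based) features from the intensities inside a segmented region of interest. The function name, the fixed bin width of 25, and the toy intensity values are hypothetical assumptions; dedicated radiomics software computes many more features (e.g., shape and texture) following standardized definitions. The example shows how entropy, one of the simplest heterogeneity measures, increases when intensities are spread across more histogram bins.

```python
import math
from collections import Counter

def first_order_features(roi_values, bin_width=25.0):
    """Compute a few first-order radiomic features (hypothetical helper).

    roi_values: intensities of the voxels inside the segmented region of interest.
    bin_width: fixed bin width used to discretize intensities (assumed value).
    """
    n = len(roi_values)
    if n == 0:
        raise ValueError("ROI contains no voxels")
    mean = sum(roi_values) / n
    # Population variance of the intensities within the ROI
    variance = sum((v - mean) ** 2 for v in roi_values) / n
    # Discretize intensities with a fixed bin width and estimate bin probabilities
    bins = Counter(math.floor(v / bin_width) for v in roi_values)
    probs = [count / n for count in bins.values()]
    # Shannon entropy (base 2): higher values indicate a more heterogeneous histogram
    entropy = sum(-p * math.log2(p) for p in probs)
    return {"mean": mean, "variance": variance, "entropy": entropy}

# Toy ROIs: a perfectly homogeneous one and one spread over eight bins
homogeneous = [100.0] * 8
heterogeneous = [10.0, 40.0, 70.0, 100.0, 130.0, 160.0, 190.0, 220.0]
print(first_order_features(homogeneous))
print(first_order_features(heterogeneous))
```

With the assumed bin width, the homogeneous ROI falls into a single bin (entropy 0), whereas the heterogeneous one occupies eight equally populated bins (entropy 3 bits). Note that the result depends on the discretization choice, which is one of the post-processing factors affecting feature reproducibility.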
Despite its great potential as a noninvasive tumor biomarker, radiomics still faces challenges that prevent its clinical implementation. Two main initiatives have addressed methodological issues of radiomic studies to bridge the gap between academic research and real-life application. In 2017, Lambin et al. proposed the Radiomics Quality Score, which details the sequential steps of a radiomic pipeline and offers a tool to assess the methodological rigor of its implementation [6]. In 2020, the Image Biomarker Standardization Initiative produced and validated reference values for radiomic features, enabling verification and calibration of different software for radiomic feature extraction [7]. However, numerous challenges remain before radiomics can be transferred to clinical practice. Radiomics is essentially a two-step approach consisting of data extraction and data analysis. In the first step (data extraction), the main challenge is the reproducibility of radiomic features, which can be influenced by image acquisition parameters, the region of interest segmentation technique, and image post-processing [8, 9]. In the second step (data analysis), models can be built with either conventional statistical methods or machine learning algorithms to predict the diagnosis or outcome of interest. In either case, the main challenge is model validation [9].
The challenges of reproducibility and validation in radiomics have recently been addressed in a review focusing on renal masses [10]. The aim of our study is to systematically review radiomic feature reproducibility and predictive model validation strategies in studies of CT and MRI radiomics of bone and soft-tissue sarcomas. The ultimate goal is to promote consensus on these aspects of radiomic workflows.
Methods
Reviewers
No local ethics committee approval was needed for this systematic review. Literature search, study selection, and data extraction were performed independently by two recently board-certified radiologists with experience in musculoskeletal tumors and radiomics (S.G. and F.M.). Disagreements were resolved by consensus between these two readers and a third reviewer, a radiologist with a doctorate in artificial intelligence and radiomics (R.C.). The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [11] were followed.
Literature search
An electronic literature search was conducted on the EMBASE (Elsevier) and PubMed (MEDLINE, U.S. National Library of Medicine and National Institutes of Health) databases for articles on CT and MRI radiomics of bone and soft-tissue sarcomas published up to December 31, 2020. A controlled vocabulary was adopted, using medical subject headings in PubMed and the thesaurus in EMBASE. The search syntax combined terms related to two main domains, namely “musculoskeletal sarcomas” and “radiomics.” The exact search query was: (“sarcoma”/exp OR “sarcoma”) AND (“radiomics”/exp OR “radiomics” OR “texture”/exp OR “texture”). Studies were first screened by title and abstract, and the full text of eligible studies was then retrieved for further review. The reference lists of selected publications were checked for additional eligible publications.
Inclusion and exclusion criteria
Inclusion criteria were: (1) original research papers published in peer-reviewed journals; (2) focus on CT or MRI radiomics-based characterization of sarcomas located in bone and soft tissues for either diagnosis- or prognosis-related tasks; (3) statement that local ethics committee approval was obtained, or ethical standards of the institutional or national research committee were followed.
Exclusion criteria were: (1) papers not dealing with mass characterization, such as those focused on computer-assisted diagnosis and detection systems; (2) papers dealing with head and neck, retroperitoneal, or visceral sarcomas; (3) animal, cadaveric, or laboratory studies; (4) papers not written in English.
Data were extracted into a spreadsheet with a drop-down list for each item, as defined by the first author. Items were grouped into three main categories, namely baseline study characteristics, radiomic feature reproducibility strategies, and predictive model validation strategies. Baseline study characteristics included first author’s last name, year of publication, study aim, tumor type, study design, reference standard, imaging modality, database size, use of public data, segmentation process, and segmentation style. Radiomic feature reproducibility strategies included reproducibility assessment based on repeated segmentations, reproducibility assessment related to acquisition or post-processing techniques, the statistical method used for reproducibility analysis, and the cutoff or threshold used for reproducibility analysis. Finally, predictive model validation strategies included the use of machine learning validation techniques, clinical validation on a separate internal dataset, and clinical validation on an external or independent dataset.
Discussion
This systematic review focused on the radiomics literature on MRI and CT of bone and soft-tissue sarcomas, with particular emphasis on reproducibility and validation strategies. Relatively few papers reported an assessment of radiomic feature reproducibility or the use of independent or external clinical validation. This finding is in line with recent literature reviews showing that the quality of sarcoma radiomics studies is low [53, 54], which may hamper the generalizability of radiomic model performance to independent cohorts and, consequently, their practical application [53]. These issues therefore need to be addressed in the radiomic workflow of future studies to facilitate clinical transferability.
Baseline study characteristics
MRI and CT radiomics of bone and soft-tissue sarcomas has progressively gained attention in musculoskeletal and oncologic imaging. The number of papers has increased rapidly in recent years, and almost half of those included in our review (47%) were published in 2020. Radiomics was used in an attempt to answer clinical questions related to both the diagnosis and prognosis of musculoskeletal sarcomas. Most studies (88%) were retrospective, as this design allowed the inclusion of a relatively large number of patients with bone or soft-tissue sarcomas, which are rare diseases, and with imaging data already available. A prospective design, while not strictly necessary in radiomic studies [5], may nevertheless help control data gathering for reproducibility assessment and match certain patient or imaging characteristics in independent datasets. No study on bone sarcomas used public data, and only a small proportion (6%) of those on soft-tissue sarcomas did; all of these used the same public database [55], available on The Cancer Imaging Archive (https://www.cancerimagingarchive.net). Public databases offer opportunities to researchers who lack sufficient data at their own institution and allow research groups worldwide to test and compare new radiomic methods on common data. Research employing radiomics in this field would therefore benefit if further imaging databases were made publicly available.
Segmentation was performed manually in most of the studies (92%) and semiautomatically in the remaining ones; both approaches require human intervention to some extent. Even though the influence of inter- and/or intra-observer variability on the reproducibility of radiomic features can be assessed as part of the radiomic workflow, fully automated segmentation algorithms would ideally achieve higher reliability and deserve future investigation. Annotations included the entire lesion volume (3D segmentation) in most of the studies (71%) and a single slice (2D segmentation), without multiple sampling, in 23%. However, to date no study has compared the results of 2D and 3D segmentation in musculoskeletal sarcomas. As 2D annotations are time saving and have recently shown higher performance than 3D segmentation in oropharyngeal cancers [56], this comparison represents another area for future research. Of note, a limited number of studies (6%) used a 2D segmentation style with multiple sampling, a data augmentation technique that increases the number of labeled slices [26, 48, 57]. This practice can be useful for uncommon entities such as musculoskeletal sarcomas but should be employed with care to avoid introducing bias into the final model: including slices from the same case in both the training and test sets could lead to overly optimistic results, as the model would be tested on images highly correlated with those it was trained on.
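One way to avoid this bias is to split the data at the patient level rather than the slice level. The minimal sketch below assumes each augmented 2D sample is tagged with a hypothetical patient identifier; all slices of a given patient are assigned to either the training or the test set, never both. The function name, test fraction, and toy data are illustrative assumptions, not a procedure reported by the reviewed studies.

```python
import random

def split_by_patient(samples, test_fraction=0.25, seed=0):
    """Split (patient_id, sample) pairs so that each patient lands in one set only."""
    patients = sorted({pid for pid, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Select whole patients, not individual slices, for the test set
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test

# Hypothetical data: 4 patients, 3 augmented 2D slices each
samples = [(f"P{p}", f"slice{s}") for p in range(1, 5) for s in range(3)]
train, test = split_by_patient(samples)
print(len(train), len(test))
```

A slice-level random split of the same 12 samples would very likely place slices of the same patient on both sides, which is the overly optimistic scenario described above.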
Reproducibility strategies
Large variability in radiomic features has emerged as a major issue across studies and has been attributed to differences in segmentation, image acquisition, and post-processing [4]. Therefore, methodological analyses are advisable before conducting radiomic studies, in order to assess feature robustness and avoid biases due to non-reproducible, noisy features. This concept is in line with recent literature emphasizing the importance of reproducibility in artificial intelligence and radiology [58]. In our review, about one third of the included papers described a reproducibility analysis in their workflow. In this subgroup, inter- and/or intra-reader segmentation variability was the main focus of the reproducibility analysis. Analyses of segmentation variability outnumbered those addressing reproducibility issues due to differences in image acquisition or post-processing, each of which was reported in only one paper [30, 31]. This finding underlines the need for further research on the dependence of radiomic features on image acquisition and post-processing. While such analyses can already be performed in retrospective series when patients underwent more than one examination within a short interval, prospective studies could facilitate the identification of reliable radiomic features in this domain. Finally, the intraclass correlation coefficient (ICC) was the statistical method used in most of the papers evaluating radiomic feature reproducibility. Of note, guidelines for performing and interpreting ICC analyses are available and could be followed to reach consensus on cutoff and threshold values [59].
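An ICC-based reproducibility filter of the kind described above can be sketched as follows. This minimal illustration assumes that two readers segmented the same patients and that one feature value was extracted from each segmentation. It implements ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater); the ICC form, the 0.75 cutoff, the function names, and the toy values are illustrative assumptions rather than choices reported by the reviewed studies.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per patient; each row holds the value of one radiomic
    feature extracted from each reader's segmentation.
    """
    n = len(ratings)      # number of patients (subjects)
    k = len(ratings[0])   # number of readers (segmentations)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, readers, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def reproducible_features(feature_ratings, threshold=0.75):
    """Keep the names of features whose ICC reaches the (assumed) threshold."""
    return [name for name, r in feature_ratings.items() if icc_2_1(r) >= threshold]

# Hypothetical values of two features for 3 patients segmented by 2 readers
features = {
    "A": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # readers agree perfectly
    "B": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],  # readers disagree
}
print(reproducible_features(features))  # ['A']
```

Other ICC forms (e.g., consistency instead of absolute agreement) can yield different values from the same data, which is one reason why reporting both the ICC form and the cutoff matters for reaching consensus.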
Validation strategies
Proper validation of radiomic models is highly desirable to bridge the gap between concept and clinical application [53]. Machine learning validation techniques are employed to avoid any information leakage from the test set to the training set during model development [60]. Resampling strategies can be extremely useful, especially with relatively small data samples that may not be truly representative of the population of interest, to reduce overfitting and better estimate the performance of the radiomics-based predictive model on new data (i.e., the test set) [61, 62]. K-fold cross-validation was the most commonly used technique for this task in the studies included in this review.
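K-fold cross-validation can be sketched as follows. The nearest-centroid classifier, the single hypothetical feature, the toy labels, and k = 5 are illustrative assumptions, not the models used in the reviewed studies. The point of the sketch is that the model is refit on each training split and scored only on the held-out fold, so every sample is used for testing exactly once and no test information reaches model fitting.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for k non-overlapping folds over shuffled indices."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # Distribute any remainder over the first folds so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = order[start:start + size]
        train_idx = order[:start] + order[start + size:]
        yield train_idx, test_idx
        start += size

def fit_nearest_centroid(x, y):
    """Toy classifier: mean feature value per class, learned from training data only."""
    centroids = {}
    for label in set(y):
        values = [xi for xi, yi in zip(x, y) if yi == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict_nearest_centroid(centroids, xi):
    # Assign the class whose centroid is closest to the feature value
    return min(centroids, key=lambda label: abs(xi - centroids[label]))

def cross_validated_accuracy(x, y, k=5, seed=0):
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(x), k, seed):
        # Refit on each training split; the held-out fold is never seen during fitting
        model = fit_nearest_centroid([x[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict_nearest_centroid(model, x[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

# Hypothetical single radiomic feature for 10 patients, two classes (0 and 1)
x = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_validated_accuracy(x, y, k=5))
```

Cross-validation of this kind estimates performance within the development data; it does not replace clinical validation on a separate or external test set.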
Ideally, in both prospective and retrospective studies, the model undergoes clinical validation on completely independent data, i.e., an external or independent test set [4]. We found that only a small number of the studies included in this systematic review (10%) performed clinical validation on an independent dataset, either from the primary institution (acquired with different scanners) or from a different institution. More studies (29%) validated the model on a separate set of data from the primary institution, i.e., an internal test set. Therefore, future studies should be carried out in more than one institution and include external testing of the model on large, independent datasets.
Limitations and conclusions
This study is limited to a systematic review of the literature; no meta-analysis was performed because the studies were heterogeneous in terms of objectives and sarcoma subgroups, with a rather limited number of papers per objective and subgroup. Different performance metrics were also used, preventing us from estimating model performance for each objective. Furthermore, a formal quality assessment of each included study was outside the scope of this review, as our focus was on reporting methodological data that can in themselves serve as quality indicators. Limitations notwithstanding, we reviewed the radiomics literature on bone and soft-tissue sarcomas with emphasis on the methodological issues of feature reproducibility and predictive model validation. Both varied widely among the included studies; in particular, more than half of the papers provided no reproducibility analysis. Additionally, fewer than half of the studies included a clinical validation, and only 10% used an independent dataset for this purpose. Thus, to bring radiomics from a preclinical research area to the clinical stage, both issues should be addressed in future studies of musculoskeletal sarcomas.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.