Introduction
Radiomics is an innovative and emerging field that focuses on the high-throughput extraction of mineable and, possibly, reproducible quantitative imaging features from routinely acquired medical images. Radiomics is based on the hypothesis that quantitative analysis of medical images with dedicated software can provide physicians with information that could not otherwise be inferred. Radiomic analysis starts with the acquisition of high-quality, standardized images, on which a region of interest is delineated manually or automatically. Quantitative imaging features, which depend on the delineated region and its surroundings, are then extracted from the defined area and analyzed. After feature selection, the most informative features are identified, partly on the basis of their independence from other data sources, which may include clinical, genomic, and proteomic data.
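The feature-extraction step described above can be illustrated with a toy sketch. It assumes a 2D image and a binary region-of-interest (ROI) mask given as plain Python lists, and computes a few first-order (histogram-based) features inside the ROI. The function name `first_order_features`, the fixed number of histogram bins, and the choice of features are illustrative assumptions, not the method of any specific study or software, and the sketch is not a standardized implementation.

```python
import math
from collections import Counter


def first_order_features(image, mask, n_bins=8):
    """Compute a few first-order features inside a binary ROI (toy sketch).

    image, mask: equally sized 2D lists; mask values of 1 mark the ROI.
    n_bins: assumed fixed number of bins used to discretize intensities.
    """
    # Collect the intensities of the pixels inside the ROI
    values = [v for row_img, row_mask in zip(image, mask)
              for v, m in zip(row_img, row_mask) if m]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    # Population skewness; set to 0 for a constant ROI
    skewness = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std else 0.0
    # Discretize into n_bins equal-width bins, then take the Shannon entropy (bits)
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant ROI
    bins = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance,
            "skewness": skewness, "entropy": entropy}


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]  # cross-shaped ROI
features = first_order_features(image, mask)
# ROI intensities are 2, 4, 5, 6, 8: mean 5, variance 4, skewness 0
```

Real pipelines add many more feature families (shape, texture) and image preprocessing; this sketch only shows the principle of turning ROI intensities into numbers.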
The groundbreaking publication by Lambin et al. in 2012 [1] laid the foundation for the main rationale of Radiomics studies, namely the identification of image-based biomarkers for diagnostic, prognostic, or predictive purposes. To date, several studies have been published on the correlation between radiomic features and specific disease phenotypes (e.g., benign vs malignant lesions [2, 3]), genotypes (e.g., lung [4] and gynecological cancers [5]), and treatment response (e.g., head and neck cancers [6]). Given these premises, together with the cost-effectiveness of Radiomics studies and the broad availability of digital imaging, it is easy to understand why Radiomics has rapidly become a trending topic in Oncology.
While several narrative and systematic reviews and some meta-analyses have been published, these works have necessarily been specialized, focusing on specific diseases, imaging modalities, and/or methodological aspects [7-10].
Although a large body of research has been published, the clinical application of these technologies outside academic literature remains limited, and they have not been easily translated into commercial products [11]. In addition, the continuous development of new data tools may dangerously delay the clinical implementation originally envisioned, creating a mismatch between the need for a more consolidated literature in clinical practice and the daily availability of new, sophisticated, and promising tools that have not yet been tested as bedside allies. In this fragmented scenario, recent studies have proposed new scores to assess the scientific and reporting quality of radiomics in oncologic studies, highlighting the need for further consolidation and a deeper analysis of the current literature.
Bibliometrics is a rigorous methodology for analyzing large quantities of literature data and metadata from high-quality, publicly available scientific databases. This big-data methodology is easily accessible, reproducible, and objective, since, unlike other forms of literature analysis, it does not involve a human step of qualitative evaluation of the analyzed manuscripts. Furthermore, it helps to highlight the evolutionary steps undertaken by a specific field, while revealing promising areas and future developments [12, 13].
The aim of this work is to perform an unbiased bibliometric analysis of Radiomics 10 years after the first work on this topic became available to the scientific community, and to analyze how scientific interest and harmonization in this field are growing. Collaboration networks, trending keywords, citation analysis, and thematic maps have been built and analyzed to provide a comprehensive overview of Radiomics research, to underline its strengths and weaknesses, and to critically orient future publications in the field.
Discussion
Our study shows the state of the art of the current literature in Radiomics using a standardized and easily reproducible ML methodology. In the 10 years since the publication of the first article on Radiomics, improvements in imaging data technology have supported the increasing interest in Radiomics, driving demand for a quantitative approach to image analysis. Further developments, such as commercial and open-source software implementing artificial intelligence, have led to a surge in scientific production over the last 5 years, consolidating computational analysis in medical imaging. The rapid growth in the number of documents is well reflected by an annual growth rate of almost 120% over the 10-year period.
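The text does not spell out how the annual growth rate is computed. A common definition in bibliometric software is the compound annual growth rate between the first and last year of the time span. The sketch below assumes that definition; the function name and the yearly counts in the usage example are hypothetical and are not the data of this study.

```python
def annual_growth_rate(first_year_count, last_year_count, n_years):
    """Compound annual growth rate (%) over a span of n_years calendar years.

    A span of n_years calendar years contains n_years - 1 year-to-year steps.
    """
    steps = n_years - 1
    return ((last_year_count / first_year_count) ** (1 / steps) - 1) * 100


# Hypothetical example: 1 document in the first year and 1024 in the eleventh,
# i.e. a doubling every year, which corresponds to a 100% annual growth rate.
rate = annual_growth_rate(1, 1024, n_years=11)
```

Under this definition, a rate close to 120% means that, on average, yearly output slightly more than doubled from one year to the next (a growth factor of about 2.2 per year).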
The rapid growth of interest in the topic has prevented Radiomics from achieving a standardized methodology of analysis, which may vary among authors and articles. This is reflected by the relative lack of consolidation in the current literature, which consists mostly of articles (n = 3829) and conference papers (n = 604), with only a scarce representation of books (n = 2) and book chapters (n = 45). A trend toward consolidation may be indicated by the increasing number of reviews (n = 758) and by the ratio between original articles and reviews (R = 5.05). Admittedly, the ML approach implemented here did not allow further analysis of the nature (narrative or systematic) of the included reviews, which would have shed further light on this issue. Additionally, Scopus does not identify meta-analyses as a distinct document type, as they are classified among reviews.
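The article-to-review ratio reported above can be reproduced directly from the stated counts. The short sketch below only re-derives R and adds nothing beyond the figures given in the text.

```python
# Document counts reported in the text
original_articles = 3829
reviews = 758

# Ratio between original articles and reviews
ratio = original_articles / reviews
print(f"R = {ratio:.2f}")  # prints "R = 5.05"
```

In other words, roughly one review has been published for every five original articles.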
Considering international collaborations, our data show a considerable prevalence of single-country publications (SCPs) among the top five most productive countries, with percentages ranging from 66.3% (Germany) to 91.3% (Korea). Of note, multiple-country publications (MCPs) have been more common in Canada, the Netherlands, and the UK, with a prevalence approaching that of SCPs.
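How publications can be split into SCPs and MCPs is sketched below under a simplified, assumed data model. Each paper is attributed to the country of its corresponding author and counts as an SCP when all authors share that country, and as an MCP otherwise. The function names, the dictionary layout, and the placeholder countries "A" to "C" are hypothetical, and the exact attribution rule used by the software behind this analysis may differ.

```python
from collections import defaultdict


def scp_mcp_by_country(papers):
    """Count single-country (SCP) and multiple-country (MCP) publications.

    Each paper is a dict holding the corresponding author's country and the
    collection of all author countries (a simplified, hypothetical data model).
    Papers are attributed to the corresponding author's country.
    """
    table = defaultdict(lambda: {"SCP": 0, "MCP": 0})
    for paper in papers:
        country = paper["corresponding_country"]
        is_scp = set(paper["author_countries"]) == {country}
        table[country]["SCP" if is_scp else "MCP"] += 1
    return dict(table)


def scp_percentage(row):
    """Share (%) of a country's attributed publications that are SCPs."""
    return 100 * row["SCP"] / (row["SCP"] + row["MCP"])


# Hypothetical papers with placeholder country codes
papers = [
    {"corresponding_country": "A", "author_countries": ["A"]},
    {"corresponding_country": "A", "author_countries": ["A", "B"]},
    {"corresponding_country": "A", "author_countries": ["A"]},
    {"corresponding_country": "B", "author_countries": ["B", "C"]},
]
table = scp_mcp_by_country(papers)
# table == {"A": {"SCP": 2, "MCP": 1}, "B": {"SCP": 0, "MCP": 1}}
```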
From a qualitative standpoint, a broader cluster of European countries can also be noticed, albeit with varying numbers of edges between individual nodes. In this regard, it is worth noting that, to date, few European-level initiatives exist to foster international collaborations on Radiomics. Of these, EuCanImage, funded by the European Union’s Horizon 2020 research and innovation program, is a promising example of an academic-industrial-clinical partnership of 20 institutions, with the aim of realizing integrative decision support systems for precision oncology through optimized data sharing, Radiomics, and artificial intelligence applications [17]. A comparable effort to promote cross-border collaboration, named Euradiomics, is being sponsored by European funds [18]. Collectively, these initiatives may represent a concrete step forward in the field, as they make it possible to collect large amounts of imaging and clinical metadata, encourage collaboration across participating institutions, and promote the development of advanced algorithms under the principles of federated learning, while fostering the real-life clinical application of this emerging discipline [19, 20].
The collaboration network analysis of the top 30 most productive countries provides further insight into these data, indicating that the most relevant collaboration cluster involves China and the USA, followed by two smaller clusters, formed by Italy and Germany and by Canada and the Netherlands, which represent the second and third most productive clusters, respectively.
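A country collaboration network of the kind described above can be built by linking countries that co-author the same papers. The sketch below assumes each paper is reduced to its set of author countries and weights each edge by the number of papers shared by a country pair. The function name and the placeholder countries are hypothetical, and the subsequent clustering step that groups countries (for example, a community-detection algorithm) is not shown.

```python
from collections import Counter
from itertools import combinations


def country_collaboration_edges(papers):
    """Weighted edges of a country collaboration network.

    papers: iterable of collections of author countries (one per paper).
    Every unordered pair of distinct countries on a paper adds 1 to the
    weight of that edge; single-country papers add no edges.
    """
    edges = Counter()
    for countries in papers:
        for pair in combinations(sorted(set(countries)), 2):
            edges[pair] += 1
    return edges


# Hypothetical papers with placeholder country codes
edges = country_collaboration_edges([
    {"A", "B"},
    {"A", "B", "C"},
    {"C", "D"},
    {"B"},
])
# edges[("A", "B")] == 2; the pairs (A, C), (B, C) and (C, D) each have weight 1
```

Sorting each paper's countries before forming pairs ensures that ("A", "B") and ("B", "A") are counted as the same undirected edge.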
It is widely known that Radiomics requires specific skills to ensure an adequate and reliable pipeline. Furthermore, multidisciplinary competences can support critical reasoning in drawing correct conclusions from data analysis. Therefore, dedicated teams should be considered crucial for effective scientific production. This scenario is well represented in Table 1, which shows the unclustered data analysis of the main institutions contributing to document production. To date, most of the literature comes from a small number of institutions, arguably highly specialized in the field and probably relying on well-established cooperative networks. The three most relevant affiliations by number of articles produced are “Fudan University” (n = 292), “Memorial Sloan Kettering Cancer Center” (n = 230), and “Sichuan University” (n = 204). No European institution appears among the top 10 most relevant affiliations.
An analysis of the most cited documents has been performed to ascertain the interest and nature of the articles that have had a major impact on the scientific literature of the field (Table 2). The three most cited articles are milestones in Radiomics research and account for 7698 citations in total, that is, 7.97% of all citations received by the papers included in our analysis. The most cited article [21], published in 2016 by Gillies et al., introduces the new paradigm of radiological images as valuable data that can be mined and classified by semi-automatic methods and integrated with patient features beyond imaging, representing a promising tool for the diagnosis and prognosis of malignancies. The article also discusses the challenges of reproducibility, big data analysis, and data sharing. The second most cited article [22], published in 2014 by Aerts et al., focuses on the use of a quantitative Radiomics approach to define the prognostic value of a Radiomic signature in non-small cell lung cancer. This study closely links Radiomic data with clinical data, showing the feasibility of a pipeline that includes unsupervised clustering of features. The third most cited article [1], published in 2012 by Lambin et al., is not only the first article written on Radiomics, but also introduces the concept of quantitative analysis of medical imaging data through automatic or semi-automatic software that can provide more and better information than can be inferred by physicians. Among the 10 most cited documents, it is worth noting the presence of the article “The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping” [23], published in 2020 by Zwanenburg et al., which reports one of the first initiatives in the Radiomics field to standardize feature extraction and analysis, mirroring the growing attention to reproducibility in Radiomics studies. More broadly, this study can be seen as a bastion of the need for effective validation of radiomic signatures in clinical practice, which should preferably focus on specific pathologies and clinical outcomes for which radiomics has demonstrated a beneficial clinical impact.
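The share of citations held by the most cited documents, as reported above for the top three articles, can be computed as in the following sketch. The function name is hypothetical, and the citation counts in the usage example are invented for illustration and do not reproduce the data of this study.

```python
def top_k_citation_share(citations, k=3):
    """Citations of the k most cited documents and their share (%) of all citations."""
    ranked = sorted(citations, reverse=True)
    top_total = sum(ranked[:k])
    return top_total, 100 * top_total / sum(citations)


# Hypothetical citation counts for a small set of documents
top_total, share = top_k_citation_share([5, 50, 10, 30, 5], k=3)
# top_total == 90 and share == 90.0
```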
Considering keywords, we analyzed the top 50 author keywords, dividing them into functional semantic groups (Fig. 5). The first group, which is the most represented in the document population, covers the methodological aspects of the field, with keywords such as “machine learning”, “magnetic resonance imaging”, “deep learning”, “computed tomography”, “artificial intelligence”, and “quantitative imaging”. The second group includes words related to the prognostic value of Radiomics, such as “nomogram”, “prognosis”, “radiogenomics”, “classification”, “biomarkers”, and “diagnosis”. The third group includes the pathologies to which Radiomics has been applied, such as “breast cancer”, “lung cancer”, “prostate cancer”, “glioma”, and “glioblastoma”. In the current analysis, many terms recur in different variants, as acronyms or as noun/adjective forms (e.g., “Radiomics” and “Radiomic”, “magnetic resonance imaging” and “mri”, etc.), showing a deep fragmentation of the keyword landscape. This keyword heterogeneity is probably due to the emerging nature of the field, but further consolidation may be needed to optimize search strategies for systematic reviews.
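The fragmentation caused by acronyms and noun/adjective variants can be reduced before counting keywords by mapping each variant onto a canonical form. The sketch below assumes a small, hand-written synonym table (hypothetical and far from exhaustive) and counts each canonical keyword at most once per document; the function names are likewise illustrative.

```python
from collections import Counter

# Hypothetical synonym table: each variant maps to one canonical keyword
SYNONYMS = {
    "radiomic": "radiomics",
    "mri": "magnetic resonance imaging",
    "ct": "computed tomography",
}


def normalize_keyword(keyword):
    """Lower-case, trim, and map a keyword to its canonical form."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)


def keyword_frequencies(documents):
    """Count each canonical keyword at most once per document."""
    counts = Counter()
    for keywords in documents:
        counts.update({normalize_keyword(k) for k in keywords})
    return counts


docs = [
    ["Radiomics", "MRI"],
    ["Radiomic", "magnetic resonance imaging", "Nomogram"],
    ["radiomics", "CT"],
]
freq = keyword_frequencies(docs)
# "radiomics" is counted 3 times, "magnetic resonance imaging" 2 times
```

Lower-casing handles case-only differences, while the synonym table handles acronyms and grammatical variants that lower-casing alone cannot merge.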
The adopted ML approach revealed five thematic co-occurrence clusters, mirroring the most common associations in this interdisciplinary common ground. The first cluster involves the term “Radiomics”, tightly bound to “machine learning”, together with keywords regarding some cancer pathologies and their prognosis (hepatocellular carcinoma, lung adenocarcinoma, breast cancer, nomogram, prediction) and others regarding technical aspects (feature selection, classification, reproducibility). The presence of “nomogram” in 2% of all author keywords probably reflects the intent of providing clinicians with user-friendly tools for a more immediate integration of Radiomics-derived information into routine pathways of care. In the thematic map analysis, this cluster is both a basic and a motor theme, in which most scientific articles have so far been concentrated; we strongly hope that, through further consolidated efforts, more studies will confirm the impact of radiomics on clinical practice in the near future. The second cluster involves the terms “radiogenomics” and “precision medicine”, interconnected with “immunotherapy”, “biomarker”, and brain tumors, showing multi-omics applications as an emerging field in the current literature. The third cluster is rooted in imaging technology, involving terms such as “PET”, “CT”, and “Quantitative imaging”, and plays a central role as a basic and motor theme. The fourth cluster deals with “artificial intelligence” and “deep learning”, including “convolutional neural network” and “COVID-19”, and occupies a central position as an emerging theme, although it is still limited to a small number of documents compared with other fields of scientific production. Interestingly, as of 31 December 2021, 63 publications had “COVID-19” among their author keywords, placing it among the top 50 keywords in Radiomics.
Considering the relatively short timeframe during which COVID-19 has been of interest within the longer lifespan of Radiomics, it is evident that researchers worldwide promptly tried to exploit the potential of Radiomics from the earliest phases of the pandemic outbreak. As the last cluster, we found the co-occurrence of “X-ray computer” and “tomography”, which has many inbound and outbound connections with the Radiomics cluster and plays a supportive role for the keywords in the methodology area.
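The classification of clusters as basic, motor, niche, or emerging themes follows the logic of a strategic diagram built on two measures, in the spirit of Callon's centrality (how strongly a cluster is linked to the rest of the network) and density (how strongly its keywords are linked to each other). The sketch below is a simplified version under stated assumptions: centrality is the total co-occurrence weight of links leaving the cluster, density is the total weight of internal links divided by the number of keywords, and quadrants are split at the median of each measure. The exact normalizations used by dedicated bibliometric software may differ, and the function names, keywords, and clusters in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import median


def cooccurrence(documents):
    """Co-occurrence weight of every unordered keyword pair across documents."""
    pairs = Counter()
    for keywords in documents:
        for pair in combinations(sorted(set(keywords)), 2):
            pairs[pair] += 1
    return pairs


def centrality_density(pairs, clusters):
    """Simplified (centrality, density) for each keyword cluster."""
    measures = {}
    for name, members in clusters.items():
        internal = external = 0
        for (a, b), weight in pairs.items():
            n_inside = (a in members) + (b in members)
            if n_inside == 2:      # link between two keywords of the cluster
                internal += weight
            elif n_inside == 1:    # link leaving the cluster
                external += weight
        measures[name] = (external, internal / len(members))
    return measures


def classify_themes(measures):
    """Assign each cluster to a quadrant using median splits (ties count as high)."""
    c_mid = median(c for c, _ in measures.values())
    d_mid = median(d for _, d in measures.values())
    labels = {}
    for name, (c, d) in measures.items():
        if c >= c_mid:
            labels[name] = "motor" if d >= d_mid else "basic"
        else:
            labels[name] = "niche" if d >= d_mid else "emerging or declining"
    return labels


# Hypothetical documents and keyword clusters
docs = [
    ["radiomics", "machine learning", "nomogram"],
    ["radiomics", "machine learning"],
    ["radiomics", "deep learning"],
    ["machine learning", "deep learning"],
    ["deep learning", "covid-19"],
    ["pet", "ct", "radiomics"],
    ["pet", "ct"],
]
clusters = {
    "radiomics": {"radiomics", "machine learning", "nomogram"},
    "ai": {"deep learning", "covid-19"},
    "imaging": {"pet", "ct"},
}
measures = centrality_density(cooccurrence(docs), clusters)
themes = classify_themes(measures)
```

With only three toy clusters, the median split places at least two clusters on the "high" side of each axis; on a real map with many clusters the quadrants are populated more evenly.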
We performed a complete set of ML-based bibliometric analyses, thus providing an unbiased overview of the state of the art of Radiomics in its first 10 years. Several aspects of the publication landscape that might otherwise go unnoticed on human inspection could be highlighted and discussed, including, but not limited to, publication types, SCP/MCP patterns, institutional collaborations, and author keyword networks. In this sense, our work can be considered a complement to a previous effort in the field by Ding et al. [24]. In their work, the authors performed the first bibliometric analysis of Radiomics publications using CiteSpace, a “generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature” [25]. However, Ding et al. restricted their search to the field of oncology and excluded all publications other than full-text articles and reviews, which limits the scope of the analysis and excludes potentially relevant topics, such as COVID-19. Moreover, we believe that including all publication types (e.g., books, chapters, and conference abstracts) can be informative in delivering a complete overview of how knowledge on a specific topic spreads outside the more common track of indexed journals, which may be affected by long publication times, subscription requirements, and other factors.
Admittedly, our work still presents some limitations. Firstly, the dataset was retrieved from a single electronic database (Scopus), which may have at least partially affected our findings. Secondly, it was not possible to perform a reliable analysis of either citation trends or co-citation networks, as we could not account for citations from works indexed in other sources and/or not included in our search. Finally, citation counts change over time, so this part of the analysis should be considered provisional and subject to change in the upcoming months and years.