Background
Traditional pathological examination is usually performed by microscopy. By observing the histomorphological characteristics of cells or tissues that have been fixed, paraffin-embedded and mounted on glass slides, well-trained pathologists can diagnose and classify diseases [1, 2]. To date, the assessment of histopathological slides by pathologists remains the gold standard for tumor diagnosis [3, 4]. However, even when pathologists follow the same diagnostic principles, their interpretations are subjective, which makes the decision-making process non-standardized and poorly reproducible. As a result, significant interobserver variation often occurs even among highly trained pathologists, which seriously affects the accuracy of tumor diagnosis [5]. There is therefore an urgent need for an objective and reproducible method that supports tumor diagnosis and improves diagnostic accuracy.
With the rise of digital pathology (DP), the practice of traditional pathology has changed, including its applications in medical education and clinical practice [6‐8]. As whole-slide scanners have become more widespread, most glass slides can be digitized into whole slide images (WSIs) that are stored and analyzed with computer-aided methods [9]. DP plays a crucial role in modern clinical practice and offers a way to overcome challenges that traditional pathology faces, such as heavy workloads and limited diagnostic accuracy [10]. Moreover, WSIs make it easier for pathologists to adopt a digital workflow that supports telepathology and routine clinical practice, which may change the way tumors are diagnosed [11, 12].
Artificial intelligence (AI) was proposed by McCarthy et al. in the 1950s [13]. Since then, AI has evolved rapidly and has been used extensively in fields ranging from science and technology to finance and medicine. Medical image analysis has become an important area of AI-based research [14]. Through AI-based predictive analytics of CT, MRI and other medical images, physicians can make better diagnostic and therapeutic decisions [15]. In terms of DP, the introduction of WSI enables AI-based predictive analytics in histopathology, and WSI serves as a major platform for applying AI in DP. With progress in algorithms and network technology, especially the emergence of machine learning and deep learning, AI has been widely applied in DP, particularly in oncology and precision medicine [16]. Compared with traditional pathology, WSI allows images of entire glass slides to be obtained within a short time, after which deep learning can perform quantitative and qualitative analyses of the images and identify new histopathological features faster and more accurately. This helps pathologists and physicians understand and predict disease progression and prognosis, intervene in time, optimize individualized treatment and achieve precision medicine. Moreover, AI algorithms make the pathological diagnosis process faster, more automated and more standardized [8].
In view of the aspects described above, AI-based DP research has attracted increasing attention, particularly in tumor pathology, which is the largest branch of DP research [17‐20]. However, the explosive growth in the number of publications in this field has made it increasingly difficult for researchers to keep up with the latest findings. To date, only a few reviews or meta-analyses have summarized particular aspects of AI-based tumor pathology research, and they overlook important information such as the contributions of authors and institutions and the emerging research frontiers and foci. Bibliometric analysis, a method that can quantitatively and qualitatively analyze and visualize all the documents published in a research field, has been widely used in medicine [21‐24].
Therefore, to gain deeper insight into AI-based tumor pathology research, this study aimed to identify the most productive countries, institutions and authors and to map the overall knowledge structure of scientific publications on AI-based tumor pathology from 1999 to 2021 by bibliometric analysis, so as to reveal current research foci and hotspots and to help scholars who are working, or plan to work, in this field.
Discussion
In an era of explosive growth of information, it is very difficult to stay aware of research hotspots, master the latest research results and maintain a leading position in a research field. Literature retrieval and knowledge management are therefore routine tasks for every researcher. Unlike systematic reviews or meta-analyses, bibliometric analysis can summarize the development of a specific research field as well as analyze its research hotspots. To our knowledge, this is the first study to summarize the application and development of AI-based tumor pathology through bibliometric methods, showing the development trend of the field over the past 23 years and predicting its future research hotspots.
To a certain extent, the number of scientific articles reflects the development of research in a particular field. Our results showed that publications on AI-based tumor pathology increased during 1999–2021; papers published in the past 6 years alone accounted for 81% of all publications, which benefited from the rapid development of deep learning. In particular, the number of papers increased rapidly after 2016, mainly owing to the proposal and application of a variety of new deep learning frameworks, such as deep residual networks and the spatially constrained convolutional neural network (SC-CNN). AI-based tumor pathology has become an important research field in clinical practice and has bright prospects.
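The share quoted above is simply the number of papers in an inclusive window of years (here 2016–2021, the 6 years ending in 2021) divided by the total. A minimal sketch follows; the yearly counts are invented toy values for illustration only, not the data of this study, and `window_share` is a hypothetical helper name.

```python
def window_share(counts_by_year: dict[int, int], first: int, last: int) -> float:
    """Fraction of all publications that appeared in the inclusive year window [first, last]."""
    total = sum(counts_by_year.values())
    in_window = sum(n for year, n in counts_by_year.items() if first <= year <= last)
    return in_window / total

# TOY yearly publication counts (invented, not taken from this study).
counts = {2014: 20, 2015: 30, 2016: 50, 2017: 60, 2018: 80, 2019: 110, 2020: 150, 2021: 200}

print(round(window_share(counts, 2016, 2021), 3))  # 650 of 700 papers
```

Because the window is inclusive at both ends, "the past 6 years" up to 2021 corresponds to `first=2016, last=2021`.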
According to the country/region distribution, among the 86 countries/regions involved in this study, the United States published the largest number of articles (1138, 41.34%), followed by China (541, 19.65%); together they accounted for 60.99% of all papers, demonstrating their leadership in AI-based tumor pathology research. However, China's total citations were relatively low, and its average citations per paper was the lowest among the 10 most productive countries/regions (Table 1). China showed the fastest growth in the number of publications in this study, but it still lacked highly cited, high-quality research, which limited its international influence. Fig. 3D shows that China, India and many other countries entered the field of AI-based tumor pathology later than the United States, Canada and Germany, indicating that they are newly active in this field and may play a more important role in the future.
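The percentages above can be checked with simple arithmetic, and the average citations per paper used for the Table 1 comparison is total citations divided by the number of papers. In the sketch below, the total of 2753 papers is an assumption: it is not reported in this section but is the only integer total consistent with both 1138 papers = 41.34% and 541 papers = 19.65%. The citation total passed to `citations_per_paper` is a hypothetical placeholder, not a value from Table 1.

```python
def share_percent(count: int, total: int) -> float:
    """Percentage of all papers contributed by one country, rounded to 2 decimal places."""
    return round(100 * count / total, 2)

def citations_per_paper(total_citations: int, papers: int) -> float:
    """Average citations per paper: total citations divided by number of papers."""
    return total_citations / papers

# ASSUMPTION: total inferred from the reported percentages, not stated in the text.
TOTAL_PAPERS = 2753
counts = {"United States": 1138, "China": 541}

for country, n in counts.items():
    print(country, share_percent(n, TOTAL_PAPERS))      # 41.34 and 19.65
print("Combined", share_percent(sum(counts.values()), TOTAL_PAPERS))  # 60.99

# HYPOTHETICAL citation total, used only to illustrate the metric.
print(citations_per_paper(total_citations=10_820, papers=541))  # 20.0
```

Ranking countries by this ratio rather than by paper count is what separates productivity from per-paper influence.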
As for country/region cooperation, the United States was the center of research and cooperated closely with China, Germany and the United Kingdom. However, most cooperation and research communication was limited to North America, Europe and a few Asian countries. International cross-border cooperation, especially with developing countries/regions, will therefore be essential in the future. Economic support also plays an important role in scientific research output, and many countries may need to increase encouragement and funding for scientific research so that they can become important participants in this field.
The top 10 productive institutions were all from North America: 8 from the United States and 2 from Canada. Harvard Medical School was the most productive and influential institution, and it maintained close cooperative relationships with institutions in multiple countries/regions, including China. However, although some Chinese institutions, such as Shanghai Jiao Tong University and Southern Medical University, had published many papers and achieved some academic influence, they had little close cooperation or exchange with academic institutions in other countries. In addition, the betweenness centrality (BC) value of every institution was lower than 0.1, which suggests that research institutions in this field were scattered. Academic institutions in different countries therefore need to strengthen cooperation with each other to further improve their academic standing.
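BC measures how often a node lies on shortest paths between other nodes, so low values everywhere mean that no institution acts as a bridge between groups. The sketch below computes normalized betweenness centrality for an unweighted, undirected network with Brandes' algorithm. It is an illustration under that assumption, not the exact implementation used by the bibliometric software in this study (which may, for example, account for link weights); the network and node names are hypothetical.

```python
from collections import deque

def betweenness_centrality(adj: dict[str, set[str]]) -> dict[str, float]:
    """Normalized betweenness centrality of an unweighted, undirected graph (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors on them.
        stack: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes in order of non-increasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each undirected pair was counted from both ends, so divide by 2 * (n-1)(n-2)/2.
    denom = (n - 1) * (n - 2)
    return {v: (c / denom if denom else 0.0) for v, c in bc.items()}

# HYPOTHETICAL network: two tight groups joined only through node "X".
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("D", "F"), ("E", "F"), ("C", "X"), ("X", "D")]
adj: dict[str, set[str]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness_centrality(adj)
print(round(bc["X"], 3), round(bc["C"], 3), bc["A"])  # 0.6 0.533 0.0
```

In this toy network the bridging node "X" scores 0.6, whereas nodes inside a tightly knit group score 0; a field in which every node stays below 0.1 has no comparable bridge.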
Identifying important journals and analyzing journal co-citation can provide researchers with a wealth of reliable reference information and help them choose the most suitable journals when searching the literature or submitting their research [37]. In addition to total citations, the impact factor (IF) and the Journal Citation Reports (JCR) category [38, 39] are two important indicators for evaluating the academic standing of journals. Most of the journals listed in Table 2 were comprehensive journals, mainly covering oncology, medical imaging and AI. All of the top 10 journals were in Q1/Q2, with IFs ranging from 3.367 to 20.096, indicating that articles related to AI-based tumor pathology can be published in high-impact journals.
Scientific Reports published the largest number of articles, suggesting that it is a common venue for articles in this field. Furthermore, it is worth noting that both BJU International and European Urology are important urology journals, indicating that urogenital neoplasms are one of the hotspots in AI-based tumor pathology research.
Journal co-citation analysis provides insight into the connections between different research findings [40]. Scientific Reports, Lecture Notes in Computer Science, IEEE Transactions on Medical Imaging and Medical Image Analysis had a total link strength (TLS) of over 100,000, indicating that research papers related to AI-based tumor pathology in these journals were more likely to be cited. Fig. 5B showed that Computerized Medical Imaging and Graphics and WMJ had the largest BC value (0.19). Researchers in this field may therefore benefit from following the findings published in these journals to keep up with the latest research progress.
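Two journals are co-cited when the same citing paper lists both in its references, and a journal's TLS, as defined in tools such as VOSviewer, is the sum of the strengths of all its links to other journals. The sketch below assumes one simple counting convention (each citing paper contributes at most 1 to a pair, however many times it cites each journal); the reference lists and journal labels are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists: list[list[str]]) -> Counter:
    """Co-citation strength of each journal pair: citing papers whose references include both."""
    links: Counter = Counter()
    for refs in reference_lists:
        # Deduplicate and sort so each pair is counted once per citing paper, in canonical order.
        for a, b in combinations(sorted(set(refs)), 2):
            links[(a, b)] += 1
    return links

def total_link_strength(links: Counter) -> Counter:
    """TLS of each journal: the sum of the strengths of all links attached to it."""
    tls: Counter = Counter()
    for (a, b), w in links.items():
        tls[a] += w
        tls[b] += w
    return tls

# HYPOTHETICAL cited-journal lists, one per citing paper.
refs = [
    ["Sci Rep", "Med Image Anal", "IEEE TMI"],
    ["Sci Rep", "Med Image Anal"],
    ["IEEE TMI", "Med Image Anal", "Med Image Anal"],
]
links = cocitation_counts(refs)
tls = total_link_strength(links)
print(sorted(tls.items()))  # [('IEEE TMI', 3), ('Med Image Anal', 4), ('Sci Rep', 3)]
```

A journal reaches a high TLS by being cited alongside many other journals across many papers, which is why TLS tracks how central a journal is to the field's citation network.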
In the author co-authorship analysis, five of the top 10 most active authors were from the United States, and together they published 100 papers. Madabhushi, Anant from the United States contributed the most papers, followed by Rajpoot, Nasir M. from the UK and Yang Lin from China, with 25 and 20 papers, respectively. Notably, although Van Der Laak Jeroen A. W. M. and Litjens Geert published relatively few papers, their total citations exceeded 5000, indicating their important position in this field. Fig. 6A also shows that Van Der Laak, Jeroen A. W. M. and Litjens Geert were key authors connecting multiple research clusters, which may explain their high citation counts. However, the BC value of every author in the co-authorship analysis was lower than 0.1, reflecting limited cooperation between research teams. International cross-border cooperation should therefore be strengthened.
In the author co-citation analysis, the BC values of Jemal Ahmedin, Madabhushi Anant and Ficarra Vincenzo reached 0.25, 0.21 and 0.21, respectively. Jemal Ahmedin is a well-known expert in oncology and has published several cancer statistics reports in CA-A Cancer Journal for Clinicians [41, 42, 43]. Madabhushi Anant, of the Department of Biomedical Engineering at Case Western Reserve University, and his colleagues published a key paper using a deep learning strategy, the Stacked Sparse Autoencoder (SSAE), which paved the way for efficient nuclei detection on high-resolution histopathological images of breast cancer [44]. Ficarra Vincenzo is an expert in urology who focuses on the surgical treatment of urogenital cancer, and many of his articles have been cited more than 200 times [45‐48]. We therefore believe that more important articles on AI-based tumor pathology may come from these teams, and that strengthening cooperation with them is a good choice for researchers.
Citation analysis and co-citation analysis of references are important tools in bibliometric studies; they are used to identify important literature, evaluate the evolution of research and predict its frontiers. Highly cited articles are usually high-quality, innovative studies with significant impact in a field. Table 4 lists the top 10 most cited studies, all of which had more than 500 citations and have significant influence in this field. Specifically, the review by Litjens Geert, “A survey on deep learning in medical image analysis”, published in Medical Image Analysis, had been cited 3777 times, making it the most cited article in this field [49]. The article summarized the main deep learning concepts related to medical image analysis and the many contributions to this field, and it discussed state-of-the-art techniques and future research foci of deep learning. Another article, with more than 2400 citations, was published in 2004 by Rhodes DR. His team presented “ONCOMINE”, a cancer microarray database and web-based data-mining platform that facilitates genome-wide expression analysis [31].
Burst detection is an algorithm for capturing sharp increases in the popularity of references or keywords within a certain period, and it can serve as an efficient method to identify hotspots or emerging topics. Our findings showed that the first reference citation burst in the field started in 2011 and lasted until 2021. It was caused by the paper on random forests published by Breiman L in 2001 [35], which introduced a machine learning algorithm that is robust to noise and laid the foundation for a series of subsequent studies. Figure 7C shows that most reference citation bursts were still ongoing and that the latest began in 2019, triggered by multiple studies. Among them, the burst with the greatest strength was for the paper on deep residual networks published by Kaiming He et al. in 2016 [36]. His research team introduced a new deep learning model that makes much deeper networks trainable, achieved strong results and has influenced subsequent work on visual recognition.
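Citation bursts in tools such as CiteSpace are commonly based on Kleinberg's state-machine model of bursts. The sketch below implements a two-state batched version of that model under stated assumptions: `r[t]` is the number of documents in year `t` that cite a given reference, `d[t]` is the number of all documents that year, the burst rate is `s` times the baseline rate, and entering the burst state costs `gamma * ln(n)`. The parameter values `s=2.0` and `gamma=1.0` and the toy counts are assumptions, not settings reported in this study, and the sketch returns only burst intervals, not the burst strength reported in Figure 7C.

```python
import math

def burst_states(r: list[int], d: list[int], s: float = 2.0, gamma: float = 1.0) -> list[int]:
    """Two-state Kleinberg burst detection for batched counts; returns 0 (baseline) or 1 (burst) per batch."""
    if not r or len(r) != len(d):
        raise ValueError("r and d must be non-empty and of equal length")
    n = len(r)
    p0 = sum(r) / sum(d)                 # baseline rate over the whole period
    if not 0 < p0 < 1:
        raise ValueError("baseline rate must lie strictly between 0 and 1")
    rates = (p0, min(s * p0, 0.9999))    # elevated rate in the burst state, kept below 1
    up_cost = gamma * math.log(n)        # entering the burst state is penalized; leaving it is free

    def fit_cost(state: int, t: int) -> float:
        # Negative binomial log-likelihood of r[t] out of d[t]; the binomial coefficient
        # is the same for both states, so it is omitted.
        p = rates[state]
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    # Viterbi: cost[i] is the cheapest cost of a state sequence ending in state i.
    cost = [fit_cost(0, 0), up_cost + fit_cost(1, 0)]
    back: list[tuple[int, int]] = []
    for t in range(1, n):
        prev0 = 0 if cost[0] <= cost[1] else 1                 # best predecessor of state 0
        prev1 = 0 if cost[0] + up_cost < cost[1] else 1        # best predecessor of state 1
        cost = [min(cost[0], cost[1]) + fit_cost(0, t),
                min(cost[0] + up_cost, cost[1]) + fit_cost(1, t)]
        back.append((prev0, prev1))
    # Trace the optimal state sequence backwards from the cheaper final state.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for choices in reversed(back):
        state = choices[state]
        states.append(state)
    return states[::-1]

# TOY data: 10 years, 100 citing documents per year, with a spike in years 5-7.
print(burst_states([5, 5, 5, 5, 20, 25, 20, 5, 5, 5], [100] * 10))
```

Raising `gamma` makes bursts harder to start, so only stronger or longer spikes survive; raising `s` demands a larger jump above the baseline rate.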
Keyword co-occurrence analysis is a common bibliometric method for identifying popular research topics; it reflects how research topics change across a field and helps capture research hotspots [50]. As shown in Fig. 8A, “deep learning”, “machine learning” and “artificial intelligence” were the most frequent keywords, consistent with the topic of this study. “Breast cancer” and “prostate cancer” were the most frequent keywords among all tumor-related keywords. Breast cancer is currently the cancer with the highest incidence among women, and prostate cancer is the second most common cancer in men; both are among the leading causes of cancer-related death [51, 52]. How to achieve quick and accurate tumor staging or grading through pathology for precise treatment is the current research focus in this field. In addition, combining digital pathology with multiomics analysis [53‐55], such as radiomics [56, 57], is a promising direction for future breakthroughs in digital tumor pathology. Achieving this will require more powerful algorithms and funding support.
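Keyword frequencies like those above depend on first merging spelling variants and synonyms into one canonical term, which bibliometric tools typically support through a user-supplied thesaurus. The sketch below shows that normalization step followed by counting each keyword at most once per paper. The thesaurus entries and keyword lists are hypothetical, chosen only for illustration.

```python
from collections import Counter

# HYPOTHETICAL thesaurus mapping variant keywords to one canonical term.
THESAURUS = {
    "prostate": "prostate cancer",
    "prostatic carcinoma": "prostate cancer",
    "breast carcinoma": "breast cancer",
    "deep-learning": "deep learning",
}

def normalize(keyword: str) -> str:
    """Lower-case, collapse whitespace and apply the thesaurus mapping."""
    k = " ".join(keyword.lower().split())
    return THESAURUS.get(k, k)

def keyword_frequencies(papers: list[list[str]]) -> Counter:
    """Number of papers in which each normalized keyword occurs (counted once per paper)."""
    freq: Counter = Counter()
    for keywords in papers:
        freq.update({normalize(k) for k in keywords})
    return freq

# HYPOTHETICAL author keywords, one list per paper.
papers = [
    ["Deep learning", "Breast cancer"],
    ["deep-learning", "Prostate", "Prostatic carcinoma"],
    ["Machine learning", "breast carcinoma"],
]
freq = keyword_frequencies(papers)
print(sorted(freq.items()))
# [('breast cancer', 2), ('deep learning', 2), ('machine learning', 1), ('prostate cancer', 1)]
```

Using a set per paper means the second paper contributes only one "prostate cancer" even though two of its keywords map to that term.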
Keyword burst detection in Fig. 8C showed that the first burst keyword was “artificial neural network” in 2002. From 2007 to 2019, keywords related to tumor treatment, such as “radiotherapy”, “robotic surgery” and “chemotherapy”, became popular research topics. The latest bursts began in 2019 and include “convolutional neural network”, “magnetic resonance image” and “histopathological image”. With the popularization of AI and the advance of deep learning algorithms, convolutional neural networks have become the most important algorithms for processing medical images, especially in radiology and histopathology [58‐60]. However, both clinicians and pathologists have questioned deep learning-based AI for its lack of interpretability, which hinders the clinical application of AI models [61‐63]. The development of interpretable deep learning algorithms is therefore key to applying deep learning-based AI more effectively in clinical practice. In addition, the days of diagnosing or classifying diseases from a single pathological tissue section or radiological image are passing. Many studies have shown that multimodal fusion methods integrating proteomics, radiomics, genomics and other data are more accurate for tumor diagnosis, staging and prognosis prediction [64, 65]. Multimodal fusion models may therefore also be an important topic for the future development of tumor pathology.