Introduction
Artificial Intelligence (AI) is one of the fastest-growing areas of informatics and computing, with great relevance to radiology. A recent PubMed search for the term “Artificial Intelligence” returned 82,066 publications; when combined with “Radiology,” 5,405 articles were found, most of them published since 2005. Practicing radiologists, trainees, and potential future radiologists need to understand the implications of AI for the specialty: what it means, how it can contribute to the radiological profession, and how it may change the profession in the future. The European Society of Radiology (ESR) is aware of the impact that AI is having on the field of radiology, from technical-scientific, ethical-professional, and economic perspectives. Much fear has been generated among radiologists by statements in the public media from researchers engaged in AI development, predicting the imminent extinction of our specialty. For example, Andrew Ng (Stanford) stated that “[a] highly-trained and specialised radiologist may now be in greater danger of being replaced by a machine than his own executive assistant” [1], whereas Geoffrey Hinton (Toronto) said “[i]f you work as a radiologist, you’re like the coyote that’s already over the edge of the cliff, but hasn’t yet looked down so doesn’t realise there’s no ground underneath him. People should stop training radiologists now. It’s just completely obvious that within 5 years, deep learning is going to do better than radiologists […] We’ve got plenty of radiologists already [.]” [2].
As part of the responsibilities of the ESR eHealth and Informatics Subcommittee, this paper aims to review the basis for the application of AI in radiology, to discuss the immediate ethical and professional impact of AI in radiology, and to consider the possible future evolution of such technology within diagnostic imaging.
Definitions
Artificial Intelligence (AI) represents the capacity of machines to mimic the cognitive functions of humans (in this context, learning and problem solving). AI can be subdivided into artificial narrow intelligence, where a computer performs a very specific task as well as or better than humans (e.g., IBM’s Watson computer, which beat two Jeopardy champions in 2011), and artificial general intelligence, where a computer goes beyond specific tasks to perform higher-order syntheses, emulating human thought processes [3]. In 1950, the British computer scientist Alan Turing enunciated the basis of the Turing test: a computer passes the test if a human interrogator, after posing a number of written questions, cannot tell whether the written responses come from a person or a computer [4, 5]. A refinement of this is the so-called Smith test: data is provided to a computer to analyse in any way it wants; the computer then reports the statistical relationships it thinks may be useful for making predictions. The computer passes the Smith test if a human panel concurs that the relationships selected by the computer make sense [6].
AI can be understood as a set of tools and programs that make software “smarter” to the extent that an outside observer thinks the output is generated by a human. It operates similarly to the way a normal human brain functions during everyday tasks such as common-sense reasoning, forming an opinion, or social behaviour [7].
The term “artificial intelligence” was first used in 1956 at the summer workshop at Dartmouth College in Hanover, New Hampshire, organised by John McCarthy, an American computer scientist, pioneer, and inventor [8].
The term machine learning (and its subcategories) refers to the situation in which an agent (anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators) learns, i.e., improves its performance on future tasks after making observations about the world [9]. The term was introduced by Arthur Samuel in 1959 to define a field of AI in which computers learn automatically from data accumulation; it has been extensively applied to big data analysis. Machine learning algorithms evolve with increasing exposure to data; they are not based exclusively on rules, but improve with experience, learning to give specific answers by evaluating large amounts of data [10].
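To make the idea of improving with experience concrete, the following minimal Python sketch (scikit-learn on synthetic data; the dataset sizes and model choice are illustrative assumptions, not drawn from the literature cited here) shows a classifier whose accuracy on held-out data rises as more training examples accumulate.

```python
# A minimal sketch of "improving with experience": a classifier's accuracy
# on held-out data tends to rise as it is exposed to more training examples.
# Synthetic data; all sizes and the model choice are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 200, 800, len(X_train)):          # growing "experience"
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(f"trained on {n:4d} examples -> test accuracy "
          f"{model.score(X_test, y_test):.3f}")
```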
The learning can be unsupervised, reinforced, supervised, or semi-supervised. In unsupervised learning, the agent learns patterns in the input even though no explicit feedback is supplied. In reinforcement learning, the agent learns from a series of reinforcements: rewards or punishments. In supervised learning, the agent is provided with a teacher’s output (a label for each example), which it then learns to reproduce. In semi-supervised learning, fewer teacher’s outputs are given to the agent. In this context, the concept of “ground truth,” which means checking the results of machine learning for accuracy against the real world, is fundamental for validating AI performance. In a radiology context, this might mean confirming diagnoses suggested by AI by comparison with pathological or surgical diagnoses; ground truth is the data assumed to be true [11]. Machine learning has been likened to training a dog: “reinforcing good behaviour, ignoring bad, and giving her enough practice to work out what to do for herself” [12].
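The contrast between supervised and unsupervised learning can be illustrated with a minimal Python sketch (scikit-learn on synthetic data; the models and dataset are illustrative assumptions): in the supervised case the teacher’s labels, the ground truth, are supplied, while in the unsupervised case the agent must find patterns on its own.

```python
# A minimal sketch contrasting supervised and unsupervised learning with
# scikit-learn on synthetic data; the dataset and models are illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=300, centers=2, random_state=0)

# Supervised: the "teacher's outputs" (labels y, the ground truth) are given.
clf = LogisticRegression().fit(X, y)
print("supervised accuracy:", clf.score(X, y))

# Unsupervised: no labels are supplied; the agent looks for patterns itself.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [int((clusters == k).sum()) for k in (0, 1)])
```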
Deep learning is a subset of machine learning and is the basis of most AI tools for image interpretation. Deep learning means that the computer has multiple layers of algorithms, interconnected and stratified into hierarchies of importance (more or less meaningful data). These layers accumulate data from inputs and provide an output that can change step by step as the AI system learns new features from the data. Such multi-layered algorithms form large artificial neural networks [9].
Artificial neural networks are composed of nodes or units (thousands to millions) connected by links. A link propagates activation from one unit to another, and each link is weighted by a numeric value that determines the strength of the connection. The activation function can be based on an activation threshold (the strength), and the unit that receives the activation is called a perceptron. Perceptrons are connected by links and create a network; the network can be feed-forward (where connections run in one direction only) or recurrent, feeding its outputs back into its own inputs (a loop). Feed-forward networks are usually arranged in layers. The goal is that the answer for each examination should match the examination’s labels. Mathematically, the algorithm is designed to maximise the number of right answers as the inputs are processed through its layers [9].
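The following minimal Python sketch (NumPy; the layer sizes, random weights, and sigmoid activation are illustrative assumptions) shows how activation propagates through the weighted links of a small feed-forward network.

```python
# A minimal sketch of a feed-forward network in NumPy: two layers of units,
# weighted links, and a non-linear activation. Weights and sizes are
# illustrative; real networks are trained rather than fixed by hand.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # links: 4 input units -> 3 hidden units
W2 = rng.normal(size=(3, 1))   # links: 3 hidden units -> 1 output unit

def activation(z):
    # Smooth stand-in for a hard activation threshold: outputs in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    hidden = activation(x @ W1)     # activation propagates along weighted links
    return activation(hidden @ W2)  # output of the final layer

x = np.array([0.2, -1.0, 0.5, 0.3])   # one input example with 4 features
print("network output:", forward(x))
```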
Artificial neural networks must be “trained” using training data sets from which the network “learns.” In radiology, these usually consist (at least initially) of hand-labelled image data sets used by the algorithm to improve its fit to the ground truth. Once a network has been trained using a training data set, it is then tested using a different set of data (a validation data set), designed to evaluate the fit of the model to new data. In this step, it is common to observe “overfitting” of the model. Yamashita et al. describe overfitting as the situation “where a model learns statistical regularities specific to the training set, i.e., ends up memorising the irrelevant noise instead of learning the signal, and, therefore, performs less well on a subsequent new dataset.” [13]. The consequence of overfitting is that the network will not generalise to never-seen-before data and will produce more interpretation errors than it did on the training data set.
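Overfitting can be demonstrated with a minimal Python sketch (scikit-learn on synthetic, deliberately noisy data; the over-flexible model is an illustrative assumption): performance on the training set is near-perfect, while performance on held-out validation data is clearly lower.

```python
# A minimal sketch of detecting overfitting: compare performance on the
# training set with performance on held-out validation data. The deliberately
# over-flexible model and synthetic data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2,
                           random_state=0)  # flip_y injects label noise
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorise the noise in the training set...
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:  ", model.score(X_train, y_train))  # ~1.0
print("validation accuracy:", model.score(X_val, y_val))      # clearly lower
```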
The best solution for reducing overfitting is to obtain more training data. Multiple rounds of training and testing on different datasets may be performed, gradually improving network performance and permitting assessment of the accuracy and generalisability of the algorithm before it is released for general use. Another solution is so-called “data augmentation,” which means modifying the training data by adding some variability, so that the model does not see the exact same inputs from the training data set during the training iterations. A simple example: if the network is being trained to recognise a cystic lesion on ultrasound, the lesion in the training data set could be perfectly cystic, but in a further training iteration some internal hyperechoic spots caused by artifacts could be added, in order to train or fine-tune the network to recognise a “non-perfect cyst” in the validation data sets [13].
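The ultrasound example above can be sketched in a few lines of Python (NumPy; the synthetic image, spot count, and intensities are illustrative assumptions): random bright spots are added to a “perfect” cyst image so that the network sees a slightly different input at each training iteration.

```python
# A minimal sketch of data augmentation on a synthetic "ultrasound" image:
# random bright spots are added so the network never sees exactly the same
# input twice. Image size, spot count, and intensities are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def make_cyst_image(size=64):
    # Dark circular "cyst" on a brighter background (a perfect cyst).
    yy, xx = np.mgrid[:size, :size]
    image = np.full((size, size), 0.6)
    image[(yy - size // 2) ** 2 + (xx - size // 2) ** 2 < (size // 4) ** 2] = 0.1
    return image

def augment(image, n_spots=5):
    # Add a few hyperechoic (bright) spots, mimicking artifacts.
    augmented = image.copy()
    for _ in range(n_spots):
        r, c = rng.integers(0, image.shape[0], size=2)
        augmented[r, c] = 1.0
    return augmented

perfect = make_cyst_image()
variants = [augment(perfect) for _ in range(10)]  # 10 "non-perfect cysts"
print(len(variants), "augmented training images, shape", variants[0].shape)
```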
As a general rule, the “deeper” the network (more layers) and the more rounds of training, the better the performance of the network.
Use cases
The term “use case” describes a specific clinical application of AI in radiology. Use cases can be considered precise scenarios within the radiology service chain where automation could add significant value and establish standards.
Computer-aided detection (CAD) represents the earliest clinical application of basic AI in radiology. CAD systems have been progressively implemented in radiological practice over the last two decades for the detection of lung, colon, breast, and prostate cancer, but the beginning of research in CAD, according to Kunio Doi [14], a scientist and pioneer in CAD research, can be attributed to articles published between 1963 and 1973 [13, 15–19]. Multiple CAD applications have been reported since then, and CAD has become common in clinical practice, with the main applications being the detection of lung, colon, and breast cancers [20–22].
The main difference between CAD and “true” AI is that CAD only makes diagnoses for which it has been specifically trained, basing its performance on a training dataset and a rigid recognition scheme that can be improved only if more datasets are given to the CAD algorithm. True AI is characterised by a process of autonomous learning, without explicit programming of each step, based on a network of algorithms and connections, similar to the way humans learn.
In the last decade, there has been an explosion in studies employing artificial intelligence for image interpretation, embracing disease detection and classification, organ and lesion segmentation (determining the boundaries of an organ or lesion), and assessment of response to treatment. However, it is difficult to discriminate between papers related to the use of CAD and those reporting the pure application of machine or deep learning, since both terms are included in the wider term “artificial intelligence.” Some of the many recent applications of AI include the RSNA paediatric bone age machine learning challenge on plain radiographs [23], breast cancer detection in mammography and MRI [24–29], chest radiograph interpretation [30–33], liver lesion characterisation on ultrasound and CT [34–36], brain tumour detection [37, 38], and prostate cancer detection [39, 40].
A step beyond disease detection is disease classification into low or high risk, with good or poor prognosis. Much of the work in this field has been in brain imaging, in both benign and malignant disease. There has been considerable effort to develop AI classifiers in paediatrics, where brain mapping and functional connectivity can be linked to neurodevelopmental outcome. In a study evaluating resting-state functional MRI data from 50 preterm-born infants, binary support vector machines distinguished them from term infants with 84% accuracy (p < 0.0001), based primarily on inter- and intra-hemispheric connections throughout the brain [41]. In multiple sclerosis, AI has been used to evaluate the performance of combinations of MRI sequences to optimise brain lesion detection [42]. Classification of glioma grade based on MR images has been attempted with some success [43].
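As an illustration of this classification approach, the following minimal Python sketch (scikit-learn; the synthetic “connectivity features” and cross-validation setup are illustrative assumptions, not those of the cited study) shows binary support vector machine classification of two groups.

```python
# A minimal sketch of binary SVM classification, the technique named in the
# preterm-infant study cited above. The synthetic "connectivity features"
# and cross-validation setup are illustrative assumptions, not the study's.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in for connectivity features from 100 infants (two groups of 50).
X, y = make_classification(n_samples=100, n_features=30, n_informative=10,
                           random_state=0)

scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```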
Automated segmentation is a crucial AI application for reducing the burden that manual segmentation places on radiology workflow. It also provides vital information on the functional performance of tissues and organs, and on disease extent and burden. Avendi et al. developed a combined deep learning and deformable model for left ventricular (LV) segmentation from cardiac MRI datasets, to obtain an automated calculation of clinical indices such as ventricular volume and ejection fraction [44]. Multiple studies have been published on abdominal (liver, pancreas, vessels) and pelvic (prostate) organ segmentation using a deep learning approach [45–51].
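To illustrate how a segmentation result feeds such a clinical index, the following minimal Python sketch (the voxel counts and voxel size are illustrative assumptions, not values from the cited work) computes ejection fraction from segmented end-diastolic and end-systolic LV volumes.

```python
# A minimal sketch of how segmented LV volumes yield a clinical index:
# ejection fraction EF = (EDV - ESV) / EDV x 100. The voxel counts and
# voxel size below are illustrative stand-ins for real segmentation output.
def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Ejection fraction (%) from end-diastolic and end-systolic volumes."""
    return (edv_ml - esv_ml) / edv_ml * 100.0

voxel_volume_ml = 0.002            # e.g. 1 x 1 x 2 mm voxels = 0.002 ml
edv = 60000 * voxel_volume_ml      # voxels labelled "LV" at end-diastole
esv = 25000 * voxel_volume_ml      # voxels labelled "LV" at end-systole
print(f"EDV {edv:.0f} ml, ESV {esv:.0f} ml, "
      f"EF {ejection_fraction(edv, esv):.1f}%")
```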
A similar approach has been applied to segmenting brain metastases on contrast-enhanced T1-weighted MRI in planning for stereotactic radiosurgery [52].
AI, radiology training and future directions
Radiologists’ skills are based on many years of training, during which the trainee is taught to interpret large numbers of examinations through a combined process of reading coupled with knowledge of the clinical information. Interpretation skills depend strongly on the number of examinations interpreted and the accuracy of the visual image analysis. AI can perform image reading by exploiting deep learning tools and is able to extract not only visual information but also quantitative information, such as radiomic signatures or other imaging biomarkers, which would not be identified by the human brain. AI is going to become part of our image viewing and analysis toolset. When software becomes part of the process of interpretation, trainees may not make enough direct (“unaided”) interpretations during their training years and therefore may not acquire adequate interpretation skills. The other side of this coin is that trainees will be helped by AI to perform better interpretations; nonetheless, a strong dependence of future radiologists on aid from AI software is a risk, with potentially deleterious consequences.
The implementation of AI in radiology requires that trainees learn how to best integrate AI in radiological practice, and therefore a specific AI and informatics module should be included in the future radiology training curricula.
AI involvement in our professional lives is inevitable. We need to work with software developers and computer engineers to assist the process of integrating AI tools into our workflows (PACS/RIS systems, task automation, etc.), always putting the interests of patients first.
A vast amount of AI research is ongoing; image interpretation is an attractive target for researchers, given that the tasks involved (at least in part) entail analysis of large amounts of data to produce an output. Radiologists cannot, and should not, wish this research away, but should rather embrace it and integrate it as much as possible into their daily work, guiding AI research directions to ensure the maximum clinical benefit to patients from new developments.
Another task that must be taken on is leadership in educating policymakers and payers about radiology, AI, their integration, and the associated pitfalls. In any rapidly developing industry, there is initial excitement, often followed by disappointment when early promises are unfulfilled [69].
The hype around new technology, often commercially driven, may promise more than it can deliver and tends to underplay difficulties. It is the responsibility of clinical radiologists to educate themselves about AI in radiology, and in turn to educate those who manage and fund our hospitals and healthcare systems, to maintain an appropriate and safe balance: protecting patients while implementing the best of the new developments.
Will radiologists be replaced by AI?
The simple answer is: NO. However, radiologists’ working lives will undoubtedly change in this era of artificial intelligence. Many of the single routine tasks in the radiology workflow will be performed faster and better by AI algorithms, but the role of the radiologist is a complex one, focused on solving complex clinical problems [68]. The real challenge is not to oppose the incorporation of AI into our professional lives (a futile effort) but to embrace the inevitable change in radiological practice, incorporating AI into the radiological workflow [12]. The most likely danger is that “[w]e’ll do what computers tell us to do, because we’re awestruck by them and trust them to make important […] decisions” [6]. Radiologists can avoid this by educating themselves and future colleagues about AI, collaborating with researchers to ensure it is deployed in a useful, safe, and meaningful way, and ensuring that its use is always directed primarily towards patient benefit. In this way, AI can enhance radiology and allow radiologists to continually improve their relevance and value [70, 71].
Acknowledgments
This paper was prepared by Prof. Emanuele Neri (Chair of the ESR eHealth and Informatics Subcommittee), Prof. Nandita deSouza (Chair of the ESR European Imaging Biomarkers Alliance - EIBALL Subcommittee), and Dr. Adrian Brady (Chair of the ESR Quality, Safety and Standards Committee), on behalf of and supported by the eHealth and Informatics Subcommittee of the European Society of Radiology (ESR).
The authors gratefully acknowledge the valuable contribution to the paper of Dr. Angel Alberich Bayarri, Prof. Christoph D. Becker, Dr. Francesca Coppola, and Dr. Jacob Visser, as members of the ESR eHealth and Informatics Subcommittee.
The paper was approved by the ESR Executive Council in February 2019.