Key points
- Most artificial intelligence tools for fracture detection in children have focussed on plain radiographic assessment.
- Almost all eligible articles used training, validation and test datasets derived from a single institution.
- Strict inclusion and exclusion criteria for algorithm development may limit the generalisability of AI tools in children.
- AI performance was marginally higher than that of human readers, but the difference was not statistically significant.
- Opportunities exist for developing AI tools for very young children (< 2 years old), those with inherited bone disorders and in certain clinical scenarios (e.g. suspected physical abuse).
Background
Materials and methods
Literature review
Eligibility criteria
Methodological quality
Data extraction and quantitative data synthesis
Results
Eligible studies
Methodological quality assessment
Patient demographics and study setting
Author, year | Country | Body part | Type of injury | Patient inclusion criteria | Patient exclusion criteria | Study aim |
---|---|---|---|---|---|---|
Zhou [36] | USA | Forearm | Plastic bowing deformities | Forearm radiographs of children aged 1–18 years with history of trauma | None stated | Development of a computer-aided detection application for plastic bowing deformity fractures in paediatric forearms |
Malek [32] | Malaysia | Lower limb (femur, tibia, fibula) | Any fracture | Radiographs of fractured femur, tibia or fibula in children < 12 years of age | None stated | Development of an artificial neural network to analyse normal (< 12 weeks) versus delayed healing time for paediatric lower limb fractures |
England [31] | USA | Elbow | Traumatic elbow joint effusions | Elbow radiographs of children aged 1–19 years attending the emergency department with history of blunt trauma. Lateral view of radiograph technically adequate | Images with cast applied, elbow dislocation/displacement, comminuted fracture, metallic surgical hardware | Detection of traumatic paediatric elbow joint effusions using a deep convolutional neural network |
Rayan [33] | USA | Elbow | Any elbow fracture | Elbow radiographs in children | None stated | Binomial classification of elbow fractures using a deep learning approach |
Choi [17] | South Korea | Elbow | Supracondylar fractures | Elbow radiographs (two views) in children with suspected supracondylar fracture | Follow-up imaging (only initial radiographs included); non-supracondylar fractures; elbow dislocation; underlying bone dysplasia | Development of a dual input convolutional neural network for detection of supracondylar fractures |
Starosolski [34] | USA | Distal tibia | Most fracture types | Radiographs of the foot, ankle, tibia or fibula in children | Plastic bowing fractures or any fracture without discrete fracture line. Images with surgical fixation, cast or pathology other than fracture | Development of a convolutional neural network for detection of tibial fractures |
Dupuis [30] | France | Appendicular skeleton | Any appendicular fracture type | Radiographs of any body part from consecutive patients < 18 years old with suspected trauma attending emergency department | Radiographs of the axial skeleton (skull, spine, chest) | External validation of a commercially available deep learning algorithm for appendicular fracture detection in children |
Zhang [35] | Canada | Distal radius | Any fracture type | Children aged < 17 years with unilateral distal radial tenderness following trauma with asymptomatic contralateral wrist as normal comparator | Existing cast over forearm, laceration of the forearm, open fractures, inability to tolerate ultrasound study, lack of time for scanning | Diagnostic accuracy of 3D ultrasound and use of artificial intelligence for detection of paediatric wrist injuries |
Tsai [58] | USA | Distal tibia | Corner metaphyseal fractures | Children aged < 1 year referred for suspected abuse | None stated; only AP projections of normal and abnormal distal tibial radiographs were included | Develop and evaluate a machine learning based binary classification algorithm to detect distal tibial corner metaphyseal fractures on radiographic skeletal surveys performed for suspected infant abuse |
Author, year | Dataset study period | Patient ages (years, unless otherwise stated) | % Male | No. centres | Type of centre(s) | Index test | Ground truth / reference | Ground truth blinded to clinical detail? |
---|---|---|---|---|---|---|---|---|
Zhou [36] | Not stated | Range: 1–18 | Not stated | Single | Tertiary Paediatric | Plain radiography | Two radiologists, over 10-year experience each | Yes |
Malek [32] | 4 years (2009–11, 2014) | Median: 8.5; SD: 3.9; Range: 0–12 | Not stated | Single | Tertiary Paediatric | Plain radiography | Time to fracture healing where no fracture line can be identified on radiography, as determined by a single orthopaedic surgeon | No, but all cases were fractured |
England [31] | 3.6 years (Jan 2014–Sept 2017) | Mean: 11.4; SD: 5.1; Range: 1–19. Percentages of children in age groups (1–5, 6–10, 11–15, 16–19) per dataset are also provided in manuscript. | 64.6% | Single | Tertiary Paediatric | Plain radiography | Radiology reports by consultant radiologist. A subselection of 262 images re-reviewed by three musculoskeletal radiologists | Musculoskeletal radiologists assessing a subselection of the radiographs were blinded. Original radiologist report unblinded |
Rayan [33] | 4 years (Jan 2014–Dec 2017) | Mean: 7.2; Range: 0–18 | 57% | Single | Tertiary Paediatric | Plain radiography | Radiological reports by a single radiologist (experience unspecified) | No |
Choi [17] | 6 years (Jan 2013 to Dec 2018) | Percentage of children in age groups (0–4, 5–9, 10–14, 15–19) per dataset are provided in manuscript. No mention of mean, median ages overall. Range: 0–19 | Not stated | Two centres, same city | Tertiary Paediatric | Plain radiography | All radiographs re-reviewed by two paediatric radiologists | Yes |
Starosolski [34] | 8 years (2009–2017) | Mean: 6.4; SD: 4.4 | 33% | Single | Tertiary Paediatric | Plain radiography | Radiology reports by a single radiologist | Unclear |
Dupuis [30] | 1 year (March 2019–2020) | Median: 9.2; Mean: 8.5; Range: 0–17; SD: 4.5 | 57.3% | Single | Tertiary Paediatric | Plain radiography | Radiology report by one of a possible eleven radiologists with 2.5–35 years’ experience | No, but this reference was not used for training |
Zhang [35] | Not stated | Mean: 9.9; Range: 3.8–14.8 | 70% | Single | Tertiary Paediatric | 3D ultrasound | Plain radiography of the affected wrist acquired within 30 days of ultrasound and reported by a consultant radiologist. The contralateral limb was also imaged with ultrasound but without radiographic confirmation of injury; in these cases normality was presumed where asymptomatic | Not for the 3D ultrasound, unclear regarding radiography reporting |
Tsai [58] | 13.4 years (1 Jan 2009 to 31 May 2021) | ‘Normal’ cohort: Mean: 5 months; Range: 0.2–11.6 months; SD: 3.3 months. ‘Abnormal’ cohort: Mean: 3.3 months; Range: 0.4–12 months; SD: 2.9 months | ‘Normal’ cohort = 68.5%; ‘Abnormal’ cohort = 73% | Single | Tertiary Paediatric | Plain radiography | Radiology report issued by consultant radiologist with subsequent confirmation by primary study author (experienced paediatric radiologist) | Unclear, likely not blinded |
Imaging dataset sizes
Author, year | Body part | Total dataset (patients) | Total dataset (exams and images) | Training set | Validation set | Test set |
---|---|---|---|---|---|---|
Zhou [36] | Forearm | 226 | 226 radiographs (59 bowing fractures) | 226 radiographs (59 bowing fractures) | N/A | N/A |
Malek [32] | Lower limb (femur, tibia, fibula) | 57 | Unclear, presumed 57 exams. No mention of projections or total images. (25, 50% normal healing time; 25, 50% delayed healing time) | 39 exams (18, 50% normal; 18, 50% abnormal) | 9 exams (4, 44.4% normal; 5, 55.6% abnormal) | 17 exams (11, 64.7% normal; 6, 35.3% abnormal) |
England [31] | Elbow | 882 | 901 lateral radiographs (images) | 657 images (500, 76.2% normal; 157, 23.8% abnormal) | 115 images (82, 71.3% normal; 33, 28.7% abnormal) | 129 images (96, 74.4% normal; 33, 25.6% abnormal) |
Rayan [33] | Elbow | Not stated | 21,456 exams; 58,817 images | 20,350 exams; 55,721 images (4966, 24% normal, 15,384, 76% abnormal) | 1106 exams; 3096 images (516, 47% normal, 590, 53% abnormal) | N/A |
Choi [17] | Elbow | 810 | 1619 elbow exams; 3238 images | 1012 exams (780, 77.1% normal; 232, 22.9% abnormal) | 254 examinations (196, 77.2% normal; 58, 22.8% abnormal) | Temporal set: 258 exams (192, 74.4% normal; 66, 25.6% abnormal) Geographic set: 96 exams (72, 75.8% normal, 23, 24.2% abnormal) |
Starosolski [34] | Distal tibia | 490 | 490 exams (245, 50% abnormal; 245, 50% normal) | Not stated | Not stated | 98 images (49, 50% normal; 49, 50% abnormal) |
Dupuis [30] | Appendicular skeleton | 2549 | 2634 exams; 5865 images | N/A | N/A | 1825, 69.2% normal; 809, 30.8% abnormal exams |
Zhang [35] | Distal radius | 30 | 55 × 3D ultrasound ‘sweeps’ of both wrists (injured and contralateral), each ‘sweep’ having ~ 382 image slices. Overall 19 cases of distal wrist fracture | 21 sweeps (~ 6000 images). Abnormal: normal split not stated | 1640 image slices selected from 72 sweeps of 36 patients. 23, 64% normal; 13, 36% abnormal cases. 990, 60% normal; 650, 40% abnormal images. Unclear how this validation dataset was acquired | N/A |
Tsai [58] | Distal tibia | 124 patients (35 abnormal, 89 normal) | 250 radiographs (177 normal, 73 abnormal) | 187 radiographs | 13 radiographs | 50 radiographs |
Imaging algorithm methodology
Algorithm diagnostic accuracy rates
Author, year | Dataset | Body part | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | TP | FP | FN | TN |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Upper limb—elbow | ||||||||||||
England [31] | Validation | Elbow effusions | 0.985 (0.966–1.00) | NS | NS | NS | NS | NS | NS | NS | NS | NS |
England [31] | Test | Elbow effusions | 0.943 (0.884–1.00) | 0.907 (0.843–0.951) | 0.909 (0.788–1.00) | 0.906 (0.844–0.958) | NS | NS | 87 | 9 | 3 | 30 |
Rayan [33] | Validation | Elbow fractures | 0.947 (0.930–0.960) | 0.877 (0.856–0.895) | 0.908 (0.882–0.929) | 0.841 (0.807–0.870) | 0.867 (0.838–0.892) | 0.889 (0.858–0.914) | 536 | 82 | 54 | 434 |
Choi [17] | Validation | Supracondylar fractures | 0.976 (0.949–0.991) | 0.945 (0.910–0.967) | 0.948 (0.859–0.982) | 0.944 (0.902–0.968) | 0.833 (0.726–0.904) | 0.984 (0.954–0.995) | 55 | 11 | 3 | 185 |
Choi [17] | Temporal test set | Supracondylar fractures | 0.985 (0.962–0.996) | 0.904 (0.855–0.938) | 0.939 (0.852–0.983) | 0.922 (0.874–0.956) | 0.805 (0.717–0.871) | 0.978 (0.945–0.991) | 62 | 15 | 4 | 117 |
Choi [17] | Geographical test set | Supracondylar fractures | 0.992 (0.947–1.000) | 0.895 (0.817–0.942) | 1.000 (0.852–1.000) | 0.861 (0.759–0.931) | 0.697 (0.564–0.803) | 1.000 | 23 | 10 | 0 | 62 |
Dupuis [30] | Test | Elbow fractures (subgroup) | NS | 0.888 (0.847–0.919) | 0.918 (0.846–0.958) | 0.873 (0.819–0.913) | 0.781 (0.696–0.847) | 0.956 (0.915–0.977) | 89 | 25 | 8 | 172 |
Upper limb—other | ||||||||||||
Zhou [36] | Test set (best performing for AP ulnar view, using optimal central angle measurement of bone) | Forearm (Bowing fracture) | 0.992 (NS) | NS | 1.000 (NS) | 0.940 (NS) | NS | NS | NS | NS | NS | NS |
Zhang [35] | Test set—analysed per patient | Distal radius (ultrasound) | NS | 0.92 | 1.0 | 0.87 | NS | NS | NS | NS | NS | NS |
Lower limb | ||||||||||||
Malek [32] | Training | Lower limb fracture healing | 0.8 (NS) | 0.821 (0.673–0.910) | 0.792 (0.595–0.908) | 0.867 (0.621–0.963) | 0.905 (0.711–0.973) | 0.722 (0.491–0.875) | 19 | 2 | 5 | 13 |
Malek [32] | Validation | Lower limb fracture healing | NS | 0.556 (0.267–0.811) | 0.600 (0.231–0.882) | 0.500 (0.150–0.850) | 0.600 (0.231–0.882) | 0.500 (0.150–0.850) | 3 | 2 | 2 | 2 |
Malek [32] | Test | Lower limb fracture healing | NS | 0.889 (0.565–0.980) | 1.000 (0.566–1.000) | 0.750 (0.301–0.954) | 0.833 (0.436–0.970) | 1.000 (0.439–1.000) | 5 | 1 | 0 | 3 |
Starosolski [34] | Test | Distal tibia | 0.995 (NS) | 0.979 (0.929–0.994) | 0.959 (0.863–0.989) | 1.000 (0.927–1.000) | 1.000 (0.924–1.000) | 0.961 (0.868–0.989) | 47 | 0 | 2 | 49 |
Tsai [58] | Test (mean and SD for accuracy across models in fivefold cross-validation) | Distal tibia (corner metaphyseal fracture) | NS | 0.93 ± 0.018 | 0.88 ± 0.05 | 0.96 ± 0.015 | 0.89 ± 0.036 | 0.95 ± 0.023 | 13 | 2 | 2 | 33 |
Tsai [58] | Test (best performing model) | Distal tibia (corner metaphyseal fracture) | NS | 0.960 (0.865–0.989) | 0.929 (0.685–0.987) | 0.972 (0.858–0.995) | 0.929 (0.685–0.987) | 0.972 (0.858–0.995) | 13 | 1 | 1 | 35 |
All appendicular skeleton | ||||||||||||
Dupuis [30] | Test | Appendicular skeleton | NS | 0.926 (0.915–0.936) | 0.957 (0.940–0.969) | 0.912 (0.898–0.925) | 0.829 (0.803–0.852) | 0.979 (0.971–0.985) | NS | NS | NS | NS |
Author, year | Human/AI | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | TP | FP | FN | TN |
---|---|---|---|---|---|---|---|---|
England [31] | AI | 0.907 (0.843–0.951) | 0.909 (0.788–1.000) | 0.906 (0.844–0.958) | 87 | 9 | 3 | 30 |
England [31] | PGY5 emergency medicine trainee (non-radiologist) | 0.915 (0.852–0.957) | 0.848 (0.681–0.949) | 0.938 (0.869–0.977) | 90 | 6 | 5 | 28 |
Choi [17] | AI (Geographical test set) | 0.895 (0.817–0.942) | 1.000 (0.852–1.000) | 0.861 (0.759–0.931) | 23 | 10 | 0 | 62 |
Choi [17] | Summated score of three radiologists (2–7 years' experience) from a different institution to the test dataset | 0.975 (0.950–0.988) | 0.957 (0.880–0.985) | 0.981 (0.953–0.993) | 66 | 4 | 3 | 212 |
Choi [17] | Lowest performing radiologist alone | NS (AUC 0.977 (0.924–0.997)) | 0.957 (0.781–0.999) | 0.972 (0.903–0.997) | NS | NS | NS | NS |
Choi [17] | Lowest performing radiologist with AI assistance | NS (AUC 0.993 (0.949–1.000)) | 1.000 (0.852–1.000) | 0.972 (0.903–0.997) | NS | NS | NS | NS |
Zhang [35] | AI (Test set—data undefined) | 0.920 | 1.000 | 0.870 | NS | NS | NS | NS |
Zhang [35] | Human: paediatric musculoskeletal radiologist | 0.89 (0.782–0.949) | 1.000 (0.833–1.000) | 0.833 (0.681–0.921) | 19 | 6 | 0 | 30 |
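
The accuracy, sensitivity, specificity, PPV and NPV columns in the tables above follow directly from the TP, FP, FN and TN counts. The sketch below is illustrative only: the function names `diagnostic_metrics` and `wilson_interval` are hypothetical, and the Wilson score method for the 95% confidence interval is an assumption, since the method used to derive the intervals is not stated here. The counts are those reported for the Rayan [33] validation set.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates from the four cells of a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # fractured cases correctly flagged
        "specificity": tn / (tn + fp),   # normal cases correctly cleared
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Counts as reported for the Rayan [33] validation set.
metrics = diagnostic_metrics(tp=536, fp=82, fn=54, tn=434)
low, high = wilson_interval(536, 536 + 54)  # interval for sensitivity

print({name: round(value, 3) for name, value in metrics.items()})
print(f"sensitivity 95% CI (Wilson, assumed): {low:.3f} to {high:.3f}")
```

With these counts the point estimates reproduce the tabulated values for that row (accuracy 0.877, sensitivity 0.908, specificity 0.841, PPV 0.867, NPV 0.889), and the assumed Wilson interval for sensitivity lies close to the reported 0.882–0.929. Rows reporting NS for the counts cannot be checked this way.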