Key points
- MRI radiomics for predicting neoadjuvant chemotherapy response in osteosarcoma is supported by weak evidence.
- The quality of osteosarcoma radiomics studies has improved over the past two years.
- CLAIM can accommodate the growing use of deep learning in radiomics.
Introduction
Methods
Protocol and registration
Literature search and selection
Data extraction and quality assessment
Data synthesis and analysis
Results
Literature search
Study characteristics
Study Characteristics | Data |
---|---|
Sample size, mean ± standard deviation, median (range) | 86.6 ± 45.8, 81 (17–191) |
Journal type, n (%) | N = 29 |
Imaging | 13 (44.8) |
Non-imaging | 16 (55.2) |
First authorship, n (%) | N = 29 |
Radiologist | 19 (65.5) |
Non-radiologist | 10 (34.5) |
Imaging modality, n (%) | N = 29 |
CT | 9 (31.0) |
MRI | 14 (48.3) |
PET | 6 (20.7) |
Biomarker, n (%) | N = 33 |
Diagnostic | 3 (9.1) |
Predictive | 18 (54.5) |
Prognostic | 12 (36.4) |
Model type, n (%) | N = 33 |
Type 1a: Developed model validated with exactly the same data | 8 (24.2) |
Type 1b: Developed model validated with resampling data | 8 (24.2) |
Type 2a: Developed model validated with randomly splitting data | 12 (36.4) |
Type 2b: Developed model validated with non-randomly splitting data | 1 (3.0) |
Type 3: Developed model validated with separate data | 4 (12.1) |
Type 4: Validation only | 0 (0.0) |
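The model types above differ in how the validation data relate to the development data. A minimal pure-Python sketch, using hypothetical patient indices, contrasting Type 2a (random split) with Type 1b (bootstrap resampling with out-of-bag validation):

```python
import random

random.seed(0)
cases = list(range(100))  # hypothetical patient indices

# Type 2a: developed model validated with randomly splitting data
random.shuffle(cases)
train, test = cases[:70], cases[70:]  # e.g., a 70/30 split

# Type 1b: developed model validated with resampling (bootstrap) data;
# training cases are drawn with replacement, out-of-bag cases validate
boot_train = [random.choice(cases) for _ in cases]
oob_test = [c for c in cases if c not in boot_train]
```

Type 3 would instead draw `test` from a separate institution or time period, and Type 4 applies a previously published model to new data without any development step.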
Study quality
16 items according to 6 key domains | Range | Median (range) | Percentage of ideal score, n (%) | Adherence rate, n (%) |
---|---|---|---|---|
Total 16 items | −8 to 36 | 10 (3–18) | 305/1044 (29.2) | 207/464 (44.6) |
Domain 1: protocol quality and stability in image and segmentation | 0–5 | 2 (0–3) | 50/145 (34.5) | 50/116 (43.1) |
Protocol quality | 0–2 | 1 (0–1) | 22/58 (37.9) | 22/29 (75.9) |
Multiple segmentations | 0–1 | 1 (0–1) | 20/29 (69.0) | 20/29 (69.0) |
Test–retest | 0–1 | 0 (0–1) | 8/29 (27.6) | 8/29 (27.6) |
Phantom study | 0–1 | 0 (0–0) | 0/29 (0.0) | 0/29 (0.0) |
Domain 2: feature selection and validation | −8 to 8 | 5 (−8 to 8) | 94/232 (40.5) | 49/58 (84.5) |
Feature reduction or adjustment of multiple testing | −3 to 3 | 3 (3–3) | 69/87 (79.3) | 26/29 (89.7) |
Validation | −5 to 5 | 2 (−5 to 5) | 25/145 (17.2) | 23/29 (79.3) |
Domain 3: biologic/clinical validation and utility | 0–6 | 2 (0–5) | 69/174 (39.7) | 61/116 (52.6) |
Non-radiomics features | 0–1 | 1 (0–1) | 18/29 (62.1) | 18/29 (62.1) |
Biologic correlations | 0–1 | 1 (0–1) | 27/29 (93.1) | 27/29 (93.1) |
Comparison to “gold standard” | 0–2 | 0 (0–2) | 16/58 (27.6) | 8/29 (27.6) |
Potential clinical utility | 0–2 | 0 (0–1) | 8/58 (13.8) | 8/29 (27.6) |
Domain 4: model performance index | 0–5 | 2 (1–4) | 61/145 (42.1) | 35/87 (40.2) |
Cut-off analysis | 0–1 | 0 (0–0) | 0/29 (0.0) | 0/29 (0.0) |
Discrimination statistics | 0–2 | 2 (1–2) | 49/58 (84.5) | 29/29 (100.0) |
Calibration statistics | 0–2 | 0 (0–2) | 12/58 (20.7) | 6/29 (20.7) |
Domain 5: high level of evidence | 0–8 | 0 (0–7) | 21/232 (9.1) | 3/58 (5.2) |
Prospective study | 0–7 | 0 (0–7) | 21/203 (10.3) | 3/29 (10.3) |
Cost-effectiveness analysis | 0–1 | 0 (0–0) | 0/29 (0.0) | 0/29 (0.0) |
Domain 6: open science and data | 0–4 | 0 (0–2) | 10/116 (8.6) | 9/29 (31.0) |
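In the table above, "percentage of ideal score" is the points achieved divided by the maximum attainable (36 RQS points per study, so 29 × 36 = 1044 points overall). A quick check of the totals row, using only figures from the table:

```python
def pct_of_ideal(points_achieved: int, n_studies: int, max_per_study: int = 36) -> float:
    """Percentage of the ideal (maximum) RQS score across all studies."""
    return 100 * points_achieved / (n_studies * max_per_study)

# Totals row: 305 points achieved by 29 studies -> 305/1044
print(round(pct_of_ideal(305, 29), 1))  # 29.2
```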
37 Selected items in 22 criteria according to 7 sections (N = 29) | Study, n (%) |
---|---|
Overall (excluding items 5c, 11, 14b, 10c, 10e, 12, 13, 17, and 19a) | 481/812 (59.2) |
Section 1: Title and Abstract | 18/58 (31.0) |
1. Title—identify developing/validating a model, target population, and the outcome | 2/29 (6.9) |
2. Abstract—provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions | 16/29 (55.2) |
Section 2: Introduction | 36/58 (62.1) |
3a. Background—Explain the medical context and rationale for developing/validating the model | 29/29 (100.0) |
3b. Objective—Specify the objectives, including whether the study describes the development/validation of the model or both | 7/29 (24.1) |
Section 3: Methods | 218/277 (57.8) |
4a. Source of data—describe the study design or source of data (randomized trial, cohort, or registry data) | 29/29 (100.0) |
4b. Source of data—specify the key dates | 29/29 (100.0) |
5a. Participants—specify key elements of the study setting including number and location of centers | 29/29 (100.0) |
5b. Participants—describe eligibility criteria for participants (inclusion and exclusion criteria) | 22/29 (75.9) |
5c. Participants—give details of treatment received, if relevant (N = 25) | 16/25 (64.0) |
6a. Outcome—clearly define the outcome, including how and when assessed | 27/29 (93.1) |
6b. Outcome—report any actions to blind assessment of the outcome | 3/29 (10.3) |
7a. Predictors—clearly define all predictors, including how and when assessed | 10/29 (34.5) |
7b. Predictors—report any actions to blind assessment of predictors for the outcome and other predictors | 4/29 (13.8) |
8. Sample size—explain how the study size was arrived at | 3/29 (10.3) |
9. Missing data—describe how missing data were handled with details of any imputation method | 6/29 (20.7) |
10a. Statistical analysis methods—describe how predictors were handled | 29/29 (100.0) |
10b. Statistical analysis methods—specify type of model, all model-building procedures (any predictor selection), and method for internal validation | 21/29 (72.4) |
10d. Statistical analysis methods—specify all measures used to assess model performance and if relevant, to compare multiple models (discrimination and calibration) | 6/29 (20.7) |
11. Risk groups—provide details on how risk groups were created, if done (N = 0) | n/a |
Section 4: Results | 117/174 (67.2) |
13a. Participants—describe the flow of participants, including the number of participants with and without the outcome. A diagram may be helpful | 16/29 (55.2) |
13b. Participants—describe the characteristics of the participants, including the number of participants with missing data for predictors and outcome | 26/29 (89.7) |
14a. Model development—specify the number of participants and outcome events in each analysis | 23/29 (79.3) |
14b. Model development—report the unadjusted association between each candidate predictor and outcome, if done (N = 5) | 4/5 (80.0) |
15a. Model specification—present the full prediction model to allow predictions for individuals (regression coefficients, intercept) | 21/29 (72.4) |
15b. Model specification—explain how to use the prediction model (nomogram, calculator, etc.) | 11/29 (37.9) |
16. Model performance—report performance measures (with confidence intervals) for the prediction model | 20/29 (69.0) |
Section 5: Discussion | 86/87 (98.9) |
18. Limitations—Discuss any limitations of the study | 28/29 (96.6) |
19b. Interpretation—Give an overall interpretation of the results | 29/29 (100.0) |
20. Implications—Discuss the potential clinical use of the model and implications for future research | 29/29 (100.0) |
Section 6: Other information | 6/58 (10.3) |
21. Supplementary information—provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets | 0/29 (0.0) |
22. Funding—give the source of funding and the role of the funders for the present study | 6/29 (20.7) |
Section 7: Validation for Model type 2a, 2b, 3, and 4 (N = 16) | 32/64 (50.0) |
10c. Statistical analysis methods—describe how the predictions were calculated | 15/16 (93.8) |
10e. Statistical analysis methods—describe any model updating (recalibration), if done (N = 0) | n/a |
12. Development versus validation—Identify any differences from the development data in setting, eligibility criteria, outcome, and predictors | 10/16 (62.5) |
13c. Participants (for validation)—show a comparison with the development data of the distribution of important variables | 2/16 (12.5) |
17. Model updating—report the results from any model updating, if done (N = 0) | n/a |
19a. Interpretation (for validation)—discuss the results with reference to performance in the development data and any other validation data | 5/16 (31.3) |
CLAIM items (N = 29) | Study, n (%) |
---|---|
Overall (excluding item 27) | 961/1508 (63.7) |
Section 1: Title and Abstract | 53/58 (91.4) |
1. Title or abstract—Identification as a study of AI methodology | 29/29 (100.0) |
2. Abstract—Structured summary of study design, methods, results, and conclusions | 24/29 (82.8) |
Section 2: Introduction | 55/87 (63.2) |
3. Background—scientific and clinical background, including the intended use and clinical role of the AI approach | 29/29 (100.0) |
4a. Study objective | 22/29 (75.9) |
4b. Study hypothesis | 4/29 (13.8) |
Section 3: Methods | 700/1044 (67.0) |
5. Study design—Prospective or retrospective study | 29/29 (100.0) |
6. Study design—Study goal, such as model creation, exploratory study, feasibility study, non-inferiority trial | 29/29 (100.0) |
7a. Data—Data source | 29/29 (100.0) |
7b. Data—Data collection institutions | 29/29 (100.0) |
7c. Data—Imaging equipment vendors | 25/29 (86.2) |
7d. Data—Image acquisition parameters | 22/29 (75.9) |
7e. Data—Institutional review board approval | 28/29 (96.6) |
7f. Data—Participant consent | 24/29 (82.8) |
8. Data—Eligibility criteria | 22/29 (75.9) |
9. Data—Data pre-processing steps | 20/29 (69.0) |
10. Data—Selection of data subsets (segmentation of ROI in radiomics studies) | 26/29 (89.7) |
11. Data—Definitions of data elements, with references to Common Data Elements | 29/29 (100.0) |
12. Data—De-identification methods | 3/29 (10.3) |
13. Data—How missing data were handled | 6/29 (20.7) |
14. Ground truth—Definition of ground truth reference standard, in sufficient detail to allow replication | 27/29 (93.1) |
15a. Ground truth—Rationale for choosing the reference standard (if alternatives exist) | 0/29 (0.0) |
15b. Ground truth—Definitive ground truth | 29/29 (100.0) |
16. Ground truth—Manual image annotation | 17/29 (58.6) |
17. Ground truth—Image annotation tools and software | 10/29 (34.5) |
18. Ground truth—Measurement of inter- and intra-rater variability; methods to mitigate variability and/or resolve discrepancies | 9/29 (31.0) |
19a. Data Partitions—Intended sample size and how it was determined | 29/29 (100.0) |
19b. Data Partitions—Provided power calculation | 4/29 (13.8) |
19c. Data Partitions—Distinct study participants | 23/29 (79.3) |
20. Data Partitions—How data were assigned to partitions; specify proportions | 22/29 (75.9) |
21. Data Partitions—Level at which partitions are disjoint (e.g., image, study, patient, institution) | 22/29 (75.9) |
22a. Model—Provided reproducible model description | 21/29 (72.4) |
22b. Model—Provided source code | 0/29 (0.0) |
23. Model—Software libraries, frameworks, and packages | 20/29 (69.0) |
24. Model—Initialization of model parameters (e.g., randomization, transfer learning) | 23/29 (79.3) |
25. Training—Details of training approach, including data augmentation, hyperparameters, number of models trained | 16/29 (55.2) |
26. Training—Method of selecting the final model | 21/29 (72.4) |
27. Training—Ensembling techniques, if applicable (N = 14) | 8/14 (57.1) |
28. Evaluation—Metrics of model performance | 29/29 (100.0) |
29. Evaluation—Statistical measures of significance and uncertainty (e.g., confidence intervals) | 20/29 (69.0) |
30. Evaluation—Robustness or sensitivity analysis | 10/29 (34.5) |
31. Evaluation—Methods for explainability or interpretability (e.g., saliency maps), and how they were validated | 11/29 (37.9) |
32. Evaluation—Validation or testing on external data | 16/29 (55.2) |
Section 4: Results | 90/174 (51.7) |
33. Data—Flow of participants or cases, using a diagram to indicate inclusion and exclusion | 16/29 (55.2) |
34. Data—Demographic and clinical characteristics of cases in each partition | 25/29 (86.2) |
35a. Model performance—Test performance | 16/29 (55.2) |
35b. Model performance—Benchmark of performance | 8/29 (27.6) |
36. Model performance—Estimates of diagnostic accuracy and their precision (such as 95% confidence intervals) | 20/29 (69.0) |
37. Model performance—Failure analysis of incorrectly classified cases | 5/29 (17.2) |
Section 5: Discussion | 57/58 (98.3) |
38. Study limitations, including potential bias, statistical uncertainty, and generalizability | 28/29 (96.6) |
39. Implications for practice, including the intended use and/or clinical role | 29/29 (100.0) |
Section 6: Other information | 6/87 (6.9) |
40. Registration number and name of registry | 0/29 (0.0) |
41. Where the full study protocol can be accessed | 0/29 (0.0) |
42. Sources of funding and other support; role of funders | 6/29 (20.7) |
Meta-analysis
Clinical question | MRI-driven radiomics prediction model for NAC response in osteosarcoma patients |
---|---|
Number of studies | 4 |
Good responder/sample size | 44/115 |
Pooled analysis | |
DOR (95% CI) | 28.83 (10.27–80.95) |
p value for DOR | p < 0.001 |
Sensitivity (95% CI) | 0.84 (0.70–0.92) |
Specificity (95% CI) | 0.85 (0.74–0.91) |
PLR (95% CI) | 5.43 (3.11–9.49) |
NLR (95% CI) | 0.19 (0.09–0.37) |
AUC (95% CI) | 0.91 (0.88–0.93) |
Heterogeneity | |
Higgins I² test | I² = 42.04% |
Cochran’s Q test | Q = 5.18, p = 0.160 |
Publication bias | |
Egger’s test | p = 0.035 |
Begg’s test | p = 0.089 |
Deeks’ test | p = 0.069 |
Trim and fill method | |
Number of missing studies | 2 |
Adjusted DOR (95% CI) | 20.53 (7.80–54.06) |
p value for adjusted DOR | p < 0.001 |
Level of Evidence | Weak |
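The pooled indices above follow the standard definitions for diagnostic test accuracy. A sketch computing sensitivity, specificity, PLR, NLR, and DOR from a single hypothetical 2×2 table (counts chosen for illustration, not the review's pooled data), plus Higgins I² recovered from the reported Cochran's Q (Q = 5.18, df = k − 1 = 3 for 4 studies):

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)   # positive likelihood ratio
    nlr = (1 - sens) / spec   # negative likelihood ratio
    dor = plr / nlr           # diagnostic odds ratio = (tp*tn)/(fp*fn)
    return sens, spec, plr, nlr, dor

def higgins_i2(q, df):
    """I^2: share of between-study variability beyond chance, floored at 0."""
    return max(0.0, (q - df) / q) * 100

# Hypothetical 2x2 table for a single study
print(diagnostic_indices(tp=37, fp=11, fn=7, tn=60))

# Reported Q = 5.18 with df = 3 gives I^2 near the reported 42.04%
print(round(higgins_i2(5.18, 3), 1))  # 42.1
```

The small gap between 42.1% and the reported 42.04% reflects rounding of Q; note also that the pooled sensitivity and specificity come from a bivariate model fitted across studies, not from a single collapsed 2×2 table as sketched here.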