Introduction
The work of the radiologist includes interpretation of images and communicating relevant findings through the radiology report. Traditionally, radiology reports have been free narratives with variations in structure and quality [1, 2]. The form of the radiology report has been debated and structured reporting has been suggested as a method to improve quality; however, consensus regarding form and style has not been reached [3–7].
Recently, contextual reporting has been suggested as an intermediate between structured reporting and free narrative reporting [8]. Contextual reports are structured in a disease-specific way, with findings reported in a checklist manner, but they are less strict than structured reports. Another method to potentially improve quality is the use of established visual rating scales (VRS). An example from the field of neuroradiology is the VRS developed for the investigation of cognitive impairment, which are endorsed in clinical practice [9–11].
Previous studies have shown that structural findings are underreported in the diagnostic work up of cognitive impairment [12–14]. A recent European survey showed that VRS were used in 75% of responding centers but structured reporting was used in only 28% [15]. To improve the accuracy and clarity of radiology reports, our department introduced contextual reports as an endorsed routine. The purpose of this study is to investigate the effect on reporting of structural radiological findings after introducing a template with VRS in the primary care diagnostic work up of cognitive impairment.
Results
We identified 251 eligible subjects. Ten subjects had cancelled examinations, four had failed to perform the exam, and the referral was unclear for one. These 15 subjects were excluded, leaving 236 subjects, of whom 111 were examined “before” and 125 “after” the introduction of contextual reporting. There were no significant differences between the groups with respect to prevalence of abnormal findings or gender. Subjective cognitive impairment was the reported symptom in 97% of all subjects (see Table 2 for demographic data). Evans’ index was reported for only one subject and was included in WLV.
Table 2
Data on subjects and prevalence of evaluated parameters for the two groups “before” and “after”
Parameter | Before | After | p value |
Subjects |
Number of subjects (n) | 111 | 125 | - |
Age in years (median, interquartile range) | 72 (7.8) | 73 (7.3) | 0.02* |
Females | 47% | 49% | 0.76 |
Clinical symptom (as reported in referrals) |
Subjective cognitive impairment | 95% | 98% | 0.19 |
Personality changes | 7% | 2% | 0.03 |
Confusion | 1% | 2% | 0.63 |
Prevalence (abnormal in the second reading) |
Abnormal MTA | 18% | 18% | 0.93 |
Abnormal WMC | 25% | 22% | 0.51 |
Abnormal GCA | 17% | 22% | 0.39 |
Abnormal WLV | 17% | 15% | 0.69 |
Intra-rater agreement was excellent for MTA, κ = 0.82, 95% CI (0.72 to 0.91), p < 0.001, and for WLV, κ = 0.87, 95% CI (0.78 to 0.96), p < 0.001; substantial for WMC, κ = 0.79, 95% CI (0.70 to 0.88), p < 0.001; and moderate for GCA, κ = 0.57, 95% CI (0.44 to 0.71), p < 0.001. The highest inter-rater agreement for 100 randomly selected subjects showed substantial agreement for MTA, κ = 0.73, 95% CI (0.55 to 0.92), p < 0.001; excellent agreement for WMC, κ = 0.81, 95% CI (0.69 to 0.94), p < 0.001; and moderate agreement for GCA, κ = 0.44, 95% CI (0.19 to 0.70), p < 0.001.
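As an illustration of how agreement statistics of this kind can be derived, the following minimal Python sketch computes an unweighted Cohen's κ for two sets of ratings and maps the value to a verbal label. The function names, the example ratings, and the cut-offs (the commonly used Landis and Koch bands, with the top band labelled "excellent" as in the text above) are assumptions made for illustration; this section does not state whether a weighted κ was used or which thresholds were applied.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    n = len(ratings_a)
    # Observed agreement: share of subjects given the same grade in both readings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal frequencies, summed over grades.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    grades = set(counts_a) | set(counts_b)
    expected = sum(counts_a[g] * counts_b[g] for g in grades) / n**2
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Verbal label for kappa; the cut-offs are an assumption (Landis and Koch bands)."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical grades (0-2) from a first and a second reading of eight subjects.
first_reading = [0, 0, 1, 1, 2, 2, 0, 1]
second_reading = [0, 0, 1, 1, 2, 1, 0, 0]
kappa = cohen_kappa(first_reading, second_reading)
print(round(kappa, 2), agreement_label(kappa))
```

For ordinal scales, a weighted κ that penalizes large disagreements more than small ones is a common alternative; the unweighted form is used here only to keep the sketch short.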
Data on grading of clinical reports and concordance with our second reading are summarized in Table 3. In total, MTA was reported in 54% of the reports. The corresponding number for WMC, GCA, and WLV was 78%, 69%, and 59% respectively. Where MTA was reported as moderate to severe (i.e., Torisson’s scale grades 2–3), 88% was correctly reported as abnormal compared with the second reading. Medial temporal lobe atrophy was reported as mild (i.e., grade 1), in 18% of the reports. The corresponding number in the second reading was 45%; when age correction was applied, 36% was normal and 9% abnormal of which 0% (n = 0) was correctly reported as abnormal. Where WMC was reported as moderate to severe, 83% was correctly reported as abnormal compared with the second reading. White matter changes were reported as mild in 31% of the reports; the corresponding number in the second reading was 18%; when age correction was applied, 17% was normal and 1% abnormal of which 0% (n = 0) was correctly reported as abnormal. Where GCA was reported as moderate to severe, 47% was correctly reported as abnormal, and for WLV, the figure was 38% compared with the second reading.
Table 3
Grading of clinical reports and concordance with second reading
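Grade | Clinical reports | Second reading | Second reading, abnormal | Second reading, normal | Correctly reported as abnormal | Prevalence of abnormal findings |
MTA | | | | | | 18% |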
“0” | 29% | 47% | -- | 47% | -- | |
“1” | 18% | 45%* | 9%1 | 36%1 | 0% | |
“2” | 5% | 6% | 6% | -- | 4% | |
“3” | 2% | 2% | 2% | -- | 2% | |
“NA” | 46% | -- | -- | -- | -- | |
WMC | | | | | | 23% |
“0” | 27% | 60% | -- | 60% | -- | |
“1” | 31% | 18% | 1%2 | 17%2 | 0% | |
“2” | 10% | 9% | 9% | -- | 7% | |
“3” | 10% | 13% | 13% | -- | 10% | |
“NA” | 22% | -- | -- | -- | -- | |
GCA | | | | | | 19% |
“0” | 31% | 27% | -- | 27% | -- | |
“1” | 30% | 54% | -- | 54% | -- | |
“2” | 7% | 18% | 18% | -- | 3% | |
“3” | < 1% | 1% | 1% | -- | < 1% | |
“NA” | 31% | -- | -- | -- | -- | |
WLV | | | | | | 16% |
“0” | 39% | 84% | -- | 84% | -- | |
“1”+“2”+“3”5 | 20% | 16% | 16% | -- | 8% | |
“NA” | 41% | -- | -- | -- | -- | |
Data on frequencies and compliance, including differences between the groups, are summarized in Table 4. Reporting of MTA, WMC, and GCA increased significantly. There was no significant change in the reporting of WLV. Altogether, the percentage of reports with all parameters mentioned increased from 6% (n = 7) to 29% (n = 36). Full compliance remained low; the percentage of reports in strict full compliance with the template increased from 2% (n = 2) to 8% (n = 10).
Table 4
Percentage of original reports mentioning the evaluated parameters outlined in the contextual reporting template and compliance with contextual reporting for the two groups “before” and “after”
Evaluated parameter | Percentage of original reports with parameters mentioned |
 | Before | After | p value |
MTA | 29% | 76% | < 0.001 |
WMC | 69% | 86% | < 0.01 |
GCA | 54% | 82% | < 0.001 |
WLV | 55% | 62% | 0.25 |
| Percentage of original reports with all parameters mentioned |
All parameters reported | 6% | 29% | < 0.001 |
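The comparison of proportions between the two groups can be illustrated with a short Python sketch using the counts given in the text for reports with all parameters mentioned (7 of 111 "before" and 36 of 125 "after"). The statistical test used by the authors is not stated in this section; Pearson's chi-square test without continuity correction is an assumption made here, and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns the statistic and the two-sided p value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: "before" and "after"; columns: all parameters mentioned, not all mentioned.
before_all, before_total = 7, 111
after_all, after_total = 36, 125
chi2, p_value = chi_square_2x2(
    before_all, before_total - before_all,
    after_all, after_total - after_all,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

Under these assumptions the p value is below 0.001, in line with the value listed in Table 4; a different test, such as Fisher's exact test, would give a slightly different figure.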
Results regarding TPR and TNR are summarized in Table 5. A significant increase in TPR was observed for MTA, from 10% to 55%. There were no significant changes in TPR for the other parameters, although an increase from 0% to 33% was observed for GCA and an increase from 37% to 58% for WLV. TNR was high to almost perfect, with no significant changes for any parameter between the two groups.
Table 5
True positive rate and true negative rate for the evaluated parameters, expressed as percentages, of original reports in the two groups “before” and “after”
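 | Before: MTA | Before: WMC | Before: GCA | Before: WLV | After: MTA | After: WMC | After: GCA | After: WLV |
TPR % (95% CI) | 10 (1–32) | 68 (48–84) | 0 (0–18) | 37 (16–62) | 55 (32–76) | 78 (58–91) | 33 (17–54) | 58 (34–80) |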
TNR % (95% CI) | 99 (94–100) | 99 (93–100) | 96 (89–99) | 88 (80–94) | 99 (95–100) | 93 (86–97) | 94 (87–98) | 82 (73–89) |
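To make the rates and intervals in Table 5 concrete, the sketch below computes a true positive rate with an exact (Clopper-Pearson) 95% confidence interval in plain Python. The method used for the intervals is not stated in this section, so the exact binomial interval is an assumption, as are the function names. The counts (2 true positives among 20 abnormal cases) are hypothetical, chosen only because they are compatible with the reported 10% TPR and 18% prevalence among 111 subjects; the actual counts are not given here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion k / n."""

    def solve(cdf_at, target):
        # The binomial CDF decreases as p grows, so bisection finds the crossing.
        low, high = 0.0, 1.0
        for _ in range(60):
            mid = (low + high) / 2
            if cdf_at(mid) > target:
                low = mid
            else:
                high = mid
        return (low + high) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical counts: 2 correctly reported abnormal cases out of 20 abnormal cases.
true_positives, abnormal_cases = 2, 20
tpr = true_positives / abnormal_cases
ci_low, ci_high = clopper_pearson(true_positives, abnormal_cases)
print(f"TPR = {tpr:.0%} (95% CI {ci_low:.1%} to {ci_high:.1%})")
```

The same function applies to TNR by passing the number of correctly reported normal cases and the number of normal cases. With small denominators the interval is wide, which helps explain why several of the changes in TPR did not reach significance.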
Discussion
In this retrospective, observational study, we evaluated compliance and compared TPR of radiology reports before and after the introduction of contextual reporting in the diagnostic work up of cognitive impairment. We found an increase in the reporting of MTA, GCA, and WMC and an increased TPR for MTA. Although reporting of the evaluated parameters increased, full compliance with the template remained low (8%) and the percentage of reports in which all parameters were mentioned reached only 29%. Because of the small numbers, it is difficult to draw definitive conclusions from the reports with full compliance or with all parameters mentioned, which is why we chose to evaluate each parameter separately.
We had anticipated that full compliance would reach at least 50%, so our results may seem disappointing. In a study by Powell et al, 9% compliance was observed when a structured template for assessing maxillofacial trauma was evaluated [24]. This figure is close to our result, but differences in methodology make further comparisons difficult. Another study by Larson et al showed that structured reporting could be implemented successfully if enforced by the department leadership [4]. Since visual ratings are subjective, it has also been suggested that differences in the structure of radiology reports may be explained by local traditions and, in the case of cognitive impairment, imaging has traditionally been used to exclude secondary causes of cognitive impairment [25, 26]. Taking all of this into consideration, we believe our observed low compliance is similar to what has been previously reported and could be explained by adherence to local traditions. Also, the use of our template was not enforced by the department leadership.
Previous studies have shown that abnormal findings, in particular MTA, are underreported in radiology reports even when assessment is warranted [12, 14]. Medial temporal lobe atrophy is an important structural finding in Alzheimer’s disease (AD), but it can also be found in other dementias [27, 28]. Our results showed an increase in reporting and TPR for MTA. Moderate to severe MTA was correctly reported as abnormal in 88% of cases, but mild atrophy was underreported; even when age correction was applied, abnormal mild atrophy (i.e., MTA 2 in subjects < 75 years) remained underreported. The underreporting of mild MTA probably explains the observed low TPR for MTA (10% “before” and 55% “after”), and it cannot be excluded that a study population with a higher prevalence of moderate to severe MTA would have resulted in better TPR and compliance. In line with previous studies, excellent intra-rater and substantial inter-rater agreement was observed for MTA [14, 29]. Given the clinical importance of MTA and the previously observed underreporting, we believe our results regarding MTA have an important clinical impact [12, 14].
The reporting of GCA increased significantly, although TPR remained low (33%). Where abnormal GCA was reported, it was erroneous in 53% of cases, which probably explains our observed low TPR. The GCA scale covers a larger brain region than the MTA scale. Although ratings are based on the highest grade of atrophy, the risk of underreporting cannot be eliminated, since moderate parietal cortical atrophy combined with mild frontal cortical atrophy could still be interpreted as overall mild GCA (normal) by one rater and moderate GCA (abnormal) by another. This would probably also explain our observed levels of agreement.
There was an increase in the reporting of WMC and WLV, of which the increase for WMC was significant, but changes in TPR were not significant. In many reports, the phrase “normal appearing cerebrospinal fluid spaces” (CSF spaces) was used. This made grading the reports difficult, since the phrase could mean normal width of sulci (i.e., normal GCA) and normal WLV combined. We chose to interpret it as normal WLV. White matter changes are preferably assessed on magnetic resonance imaging (MRI), but for abnormal findings, NECT is considered sufficient [9, 11]. In other words, when WMC is reported on NECT, it is most likely to be Fazekas grade 2 or 3. Our results showed that moderate to severe WMC was reported in 20% of the reports compared with 22% in the second reading, while mild changes were overreported. In most reports, the phrases “white matter changes” or “focal parenchymal changes” were used to describe WMC, which posed no difficulties in our grading, so we do not believe this explains our results. Radiologists have been shown to be keener to report WMC, which we believe is a more probable explanation for our results [14]. Our results suggest that contextual reporting had a significant effect on correct reporting (TPR) only for MTA, but the increased reporting of the other parameters suggests that discrepancies in reporting styles were reduced to some extent. However, the low compliance with our contextual template and our assumption that reports with no mention of the evaluated parameters were normal may hamper such conclusions.
There are limitations to this study: (i) The use of two different cohorts could result in bias from cohort effects. Differences in prevalence of abnormal findings between the groups were not significant, but the observed prevalence was probably lower than would be expected in a memory clinic population. Our study population was derived from a population with cognitive impairment in which no one had been diagnosed with dementia, since we believe the clinical benefit of using VRS would be greater in this group [30, 31]. It cannot be excluded that an older study population or a memory clinic population would have yielded a higher prevalence of abnormal findings and a different level of compliance with our template. (ii) Visual ratings are subjective, and quantitative data such as volume segmentation would be preferable but are difficult to obtain with NECT. We chose a gold standard based on high inter-rater agreement, but this approach does not exclude potential rater bias. (iii) The retrospective design limits the possibility of obtaining reliable data on potential effects of training, education, or individual experience of using VRS among the neuroradiologists at our department. (iv) We have not compared our template with other structured reporting templates, and we have not followed up on how the use of VRS affects the final diagnosis. (v) The use of the scale suggested by Torisson et al can be questioned. It is an attempt to grade the qualitative data in the radiology reports to make comparisons possible, but the scale has not been tested for rater reliability, although it has been used in previous studies [12, 14].
This study adds knowledge on how the reporting frequency of radiological findings can be improved in the diagnostic work up of cognitive impairment. Our results suggest that the overall reporting of structural findings can be increased, but TPR increased significantly only for MTA. In conclusion, this study suggests that contextual radiological assessment using VRS could increase the reporting frequency of radiological findings in the diagnostic work up of cognitive impairment, but compliance with templates may be difficult to achieve.