Original article
RADPEER Scoring White Paper

https://doi.org/10.1016/j.jacr.2008.06.011

The ACR's RADPEER program began in 2002; the electronic version, e-RADPEER, was offered in 2005. To date, more than 10,000 radiologists and more than 800 groups are participating in the program. Since the inception of RADPEER, there have been continuing discussions regarding a number of issues, including the scoring system, the subspecialty-specific subcategorization of data collected for each imaging modality, and the validation of interfacility scoring consistency. This white paper reviews the task force discussions, the literature review, and the new recommended scoring process and lexicon for RADPEER.

Introduction

The ACR established a task force on patient safety in 2000, in response to the 1999 Institute of Medicine report To Err Is Human [1], which estimated that as many as 98,000 people die in hospitals each year as a result of preventable medical errors. Although medical imaging was not cited as an area of practice with high error rates, the ACR's task force established several committees to address patient safety issues. One of these committees addressed model peer review and self-evaluation. That committee developed the RADPEER program, a radiology peer-review process, and conducted a pilot of the program at 14 sites in 2001 and 2002. After the pilot study, the program was offered to ACR members in 2002.

RADPEER was designed to be a simple, cost-effective process that allows peer review to be performed during the routine interpretation of current images. If prior images and reports are available when a new study is being interpreted, those prior studies and the accuracy of their interpretation are typically evaluated as the radiologist interprets the current study. At that point, the radiologist may also have additional information that is helpful in assessing the interpretation of the prior study, such as the progression or regression of findings on the current imaging study or additional history, including the findings of intervening nonimaging studies or procedures. The process requires no interpretive work beyond what radiologists are already doing; RADPEER simply creates a system that allows “old images” and “old interpretations” to be collected and structured in a reviewable format. The accuracy of each prior report is scored by the current interpreter of the new study using a standardized 4-point rating scale (Table 1).

Although this scoring system has worked well for the past 5 years, there has been continued confusion over the meaning of some categories. Scores 1 and 4 are easy to understand. Score 3, however, does not mention misinterpretation or disagreement and could be assigned when an image was interpreted correctly but the reviewer merely felt that the diagnosis was an easy one. Likely the most confusing is score 2, “difficult diagnosis, not ordinarily expected to be made”: it is unclear whether this represents an actual disagreement with the original interpretation or is being used to credit an excellent pickup. Scores of 1 and 2 require no action, but scores of 3 and 4 require internal review by the local peer-review committee to validate or, if necessary, change the original RADPEER score.
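The workflow implied by these scores can be summarized programmatically. The short Python sketch below is illustrative only; it is not part of the ACR's software, and the function and variable names are invented. It encodes the single rule stated above: scores of 1 and 2 require no action, whereas scores of 3 and 4 are routed to the local peer-review committee.

# Illustrative sketch only: encodes the review-routing rule described above.
# Names and structure are assumptions, not the ACR's implementation.
REQUIRES_COMMITTEE_REVIEW = {1: False, 2: False, 3: True, 4: True}

def route_score(score: int) -> str:
    """Return the action implied by a RADPEER score of 1 through 4."""
    if score not in REQUIRES_COMMITTEE_REVIEW:
        raise ValueError("RADPEER scores are limited to 1 through 4")
    if REQUIRES_COMMITTEE_REVIEW[score]:
        return "internal review by the local peer-review committee"
    return "no action required"

for s in (1, 2, 3, 4):
    print(s, "->", route_score(s))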

Each institution (radiology group) is assigned a unique identifier number. To maintain confidentiality, facilities assign each physician a numeric identifier (such as 101) to use when information is submitted to the ACR. The actual names of the participating radiologists are not provided to the ACR.

RADPEER scoring was originally performed using machine-readable cards; in 2005, a Web-based program, e-RADPEER, was established. Completed cards or electronic scores are submitted to the ACR, and reports are generated that provide

  • summary statistics and comparisons for each radiologist by modality,

  • summary data for each facility by modality, and

  • data summed across all participating facilities.

The reports should demonstrate trends that radiologists may use to focus their continuing medical education activities. Efforts to optimize interpretive skills should result in improvements in patient care.
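As a rough illustration of how such reports can be assembled from submitted scores, the Python sketch below tallies scores per radiologist by modality, per facility by modality, and across all facilities. The record layout and sample values are hypothetical; they do not reflect the ACR's submission schema.

# Hypothetical sketch of the three report views described above.
# The field layout and sample records are invented for illustration.
from collections import Counter

# Each submission: (facility_id, radiologist_id, modality, score)
submissions = [
    ("F001", "101", "CT", 1),
    ("F001", "101", "CT", 2),
    ("F001", "102", "MR", 1),
    ("F002", "201", "CT", 3),
]

by_radiologist = Counter((fac, rad, mod, score) for fac, rad, mod, score in submissions)
by_facility = Counter((fac, mod, score) for fac, _rad, mod, score in submissions)
all_facilities = Counter((mod, score) for _fac, _rad, mod, score in submissions)

print("Per radiologist, by modality:", dict(by_radiologist))
print("Per facility, by modality:", dict(by_facility))
print("Summed across all facilities:", dict(all_facilities))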

The original model peer review committee members reviewed several documents, including examples of scoring from the literature [2, 3] (W. Thorwarth, personal communication) and samples submitted from committee members' own practices. The members agreed that for any program to be effective and widely accepted, it must be simple and user friendly. After reviewing several scoring examples, the committee decided on the 4-point system shown in Table 1. Although there was discussion about including “clinically significant” in the scoring language, the committee steered away from this because of the difficulty of tracking a case for evidence of clinical significance and outcome. The committee also discussed the categories selected and whether modality or body system was preferable. Because the original card-based system used in RADPEER required radiologists to enter data onto cards manually, the need to keep the categories and scoring simple was emphasized.

RADPEER participation has increased every year, with substantial growth in 2007 after the ACR's mandate that all sites applying for any of the voluntary accreditation programs (computed tomography, magnetic resonance, ultrasound, positron emission tomography, nuclear medicine, and breast ultrasound) have evidence of a physician peer-review program, either RADPEER or their own internal programs. The number of participating radiologists has grown to more than 10,000.

The summarized RADPEER data collected through December 2007 are shown in Table 2. These data raise important questions. Fewer than 0.5% of the scores are 3 or 4. Does this reflect the quality of radiologists' interpretive skills or a reluctance to assign less-than-perfect scores to colleagues? The RADPEER data are, however, similar to those reported in the literature: Soffa et al [4], for example, found a disagreement rate of 3.48%, and combining RADPEER scores of 2, 3, and 4 gives a total disagreement rate of 2.91%. If the scores are not a true reflection of individual radiologists' interpretive skills, does the RADPEER process serve as a tool for improving patient safety or continuous quality improvement?
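The arithmetic behind the combined disagreement rate is straightforward, as the short sketch below shows. The score counts used here are hypothetical, chosen only so that the resulting percentages match the figures quoted above; they are not the actual Table 2 data.

# Hypothetical counts chosen to reproduce the percentages cited in the text;
# these are NOT the actual Table 2 data.
score_counts = {1: 97_090, 2: 2_460, 3: 380, 4: 70}

total = sum(score_counts.values())
disagreements = sum(n for score, n in score_counts.items() if score >= 2)
significant = sum(n for score, n in score_counts.items() if score >= 3)

print(f"Total disagreement rate (scores 2, 3, and 4): {disagreements / total:.2%}")  # 2.91%
print(f"Proportion of scores of 3 or 4: {significant / total:.2%}")                  # 0.45%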

Since the inception of RADPEER, there have been continuing discussions regarding a number of issues, including the scoring system, the subspecialty-specific subcategorization of data collected for each imaging modality, and the validation of interfacility scoring consistency. In addition, there has been controversy regarding the inclusion of clinical significance for scores of 2, 3, and 4. When RADPEER was originally developed, the committee members felt that adding clinical significance would require follow-up that either could not be done or would place an additional burden on radiology resources.


Task Force on RADPEER Scoring

Because of the issues regarding the RADPEER scoring process, a task force was formed to review the literature and various scoring methods and to determine whether a change was warranted. The task force, which met on September 15, 2007, consisted of members of the RADPEER committee, representatives from ACR leadership, and a radiology resident. The task force members reviewed the current language, the literature, and several proposed changes to the current scoring system.

There were several suggestions

Conclusion

In summary, the task force is proposing a scoring system that builds on the current one, retaining the existing numeric scores while making the categories clearer. In addition, radiologists would have the option to give their opinions regarding the clinical significance of discrepancies in interpretation, more in keeping with other peer-review methods described in the literature. The task force members all strongly agreed that better explanation of the scoring, with

References (8)
