Background
Polytrauma continues to be one of the leading causes of mortality, especially for persons under the age of 45, and has socioeconomic implications despite modern developments in acute medical care and prevention [1, 2]. Accurate identification of these patients and consistent grading of the respective injury patterns play a pivotal role in hospital quality benchmarking, allocation of resources, and data comparability between different trauma centers and countries [3‐6].
The assessment of injury severity of polytraumatized patients is mainly based on standardized anatomical coding employing the Abbreviated Injury Scale (AIS) and the Injury Severity Score (ISS), as well as the New Injury Severity Score (NISS) [7‐9]. Following the introduction of its first version in 1969 [10], the AIS has gone through validation processes with multiple updates, the latest in 2015. Since the early 1990s, the AIS has become an integral part of the anatomical definitions of polytrauma [11‐16], which were established in an attempt to create more specificity than the older descriptions by Border et al. (1975), Faist et al. (1983), and Tscherne et al. (1986) [17‐19]. Although primarily created for communication between medical and nonclinical investigators, the AIS and consequently the ISS are currently considered the ‘gold standard’ of injury severity assessment in trauma registries worldwide [20‐25]. Nevertheless, issues concerning their high interobserver variability and subjectivity were recognized early on [26‐28].
Injury assessment according to the AIS is taught today in dedicated courses aimed at providing specialized coding personnel. Discussing their results in the context of the current literature, Maduz et al. suggested a negative influence of the coder’s medical experience on the accurate assessment of injury severity through the AIS grading system [22]. This assumption contradicts the observations of the primary analysis on the subject by MacKenzie et al., whose research supported the hypothesis that medical personnel fare better than non-medical technicians [27]. The chronological gap between these statements could imply a confounding role of trauma system evolution or of the newer versions of the AIS grading system. In our experience, coding is often conducted not by specially trained coding specialists but by clinicians with varying coding experience, and it is primarily based on evaluating patient charts after discharge. This raises the question of how accurately present-day clinicians, who did not participate in respective educational programs but actively take part in the everyday medical care of injured individuals, evaluate injury severity in the context of this coding system.
Therefore, the aim of the current study is to measure the interobserver variability in the assessment of injury severity among medical clinicians interested in trauma management from around the world. The influence of the demographic backgrounds of the surveyed clinicians is also being investigated, with special focus on the different injured anatomical regions. We hypothesized that injury severity assessment is highly variable between observers, with values depending on the respective anatomical injury pattern.
Discussion
The accurate recognition and evaluation of polytraumatized patients are main prerequisites of current traumatological research. Therefore, grading systems with a high level of agreement between experts in the field are required [22, 36, 37]. The presented study confirmed our primary hypothesis and revealed the following results:
1. The assessment of the injury severity of polytraumatized patients among surgical experts varied widely; and
2. the variation depended considerably on the injured anatomical region (fair-to-good interobserver agreement: the anatomical regions of the thorax (incl. thoracic spine) and the extremities (incl. osseous pelvis/shoulder girdle); poor interobserver agreement: the anatomical regions of the head and neck, face, abdomen (incl. visceral pelvis/lumbar spine), and external (incl. skin/soft tissues)). This could also imply an influencing role of the coder’s medical field of expertise.
The highly variable assessment of injury severity among surgical experts delineates the possible influence of individual traits as well as the complexity of the current coding system. Discrepancies in injury coding between clinicians, indicating over- or underestimation of the injury severity, can result in relevant differences in therapeutic decisions over the treatment course of polytraumatized patients. Furthermore, direct comparability of research data from different institutions is restricted when it comes to developing novel polytrauma management systems. Therefore, specially trained coding specialists are still needed to ensure the reliability of hospital quality benchmarking, accurate documentation in the various polytrauma registries, and consequently, the comparability of studies in this field. Our results confirmed the variability issues of the AIS and injury severity scoring reported in previous studies conducted by MacKenzie et al. and Zoltie et al. [26, 27, 38]. The Zoltie et al. study found that, for 16 patients assessed by 15 observers, there was a 28% probability of two observers agreeing on the same score [38], closely reflecting the results of our study (α_ISS: 0.33, α_NISS: 0.23). Maduz et al. regarded the inconsistent ISS-AIS cut-off values as a pivotal influential factor in accurate polytrauma identification, despite reporting excellent interrater agreement for the AIS and ISS utilizing the intraclass correlation coefficient (ICC) with three specially trained observers [22]. In contrast, Ringdal et al. questioned the reliability of the AIS-based ISS and NISS [30]. In that study, 10 Norwegian AAAM-certified trauma registry AIS coders evaluated 50 cases of polytraumatized patients. The ICC was again used to measure interobserver reliability, resulting in fair interrater agreement for both the ISS and NISS (ICC: 0.51). The observers’ experience in coding did not seem to significantly influence the results. While the ISS anatomical regions were used for descriptive statistics, there was no assessment of the respective interobserver variability or analysis of the observers’ demographic backgrounds.
Investigating the AIS coding in the Queensland Trauma Registry, Neale et al. [31], despite recording a high variability in AIS estimates (39% probability of agreement between two observers), found excellent interrater reliability for the ISS (ICC: 0.9), which disagrees with the results of our study. For the purposes of the Neale et al. study, six specially educated coders assessed 120 injury cases. The high interobserver variability of the AIS-based definitions of a polytraumatized patient was confirmed by a recent study by Pothmann et al. [39]. In their study, two observer groups coded a total of 187 polytrauma cases. One observer group consisted of a single doctoral student, while the coding of the second observer group was conducted by four interns with at least 3 years of clinical experience. Neither the dependence of the interobserver variability on the anatomical region nor the demographic characteristics of the observers was a subject of investigation in that study. Furthermore, the focus was mainly on the different cut-off values of the various polytrauma definitions, therefore only indirectly assessing the interrater variability of the current injury severity coding systems. Discussing the results, Pothmann et al. advocated the comparatively greater interobserver agreement on polytrauma identification based on the Maximum AIS (MAIS), which partly confirms the respective results on pediatric trauma from Brown et al. [39, 40]. This could also imply an influence of the injured anatomical regions on the measured interobserver variability.
While most of the interobserver studies on this subject to date have mainly attempted to define polytrauma, there has been little evidence found concerning the direct interobserver variability of injury severity assessments depending on different anatomical regions or injury patterns. There is also scarce information regarding the influence of the different demographic characteristics of the surveyed observers. The current study attempts to focus on these issues by including more participants than similar studies and supports the argument that there is no standardized perception of trauma magnitude among surgical specialists from around the world.
The scientific literature provides limited analyses of the effect of raters’ experience or training, but a definite pattern can be recognized. Waydhas et al. observed a significant deviation of measured trauma scores based on different professions and education [41]. Clinicians fared slightly better than non-clinicians in the study of MacKenzie et al. (1985), and Josse et al. supported the role of training in improving agreement in injury coding [27, 42].
The high overall interobserver variability among coders/specialists who were not specially trained supports the belief that specific education is necessary to improve the quality of injury severity assessment in polytrauma. Moreover, we observed distinct differences based on the injured anatomical region and the main specialty field of the participants. The measured interobserver variability was lower in anatomical regions with a higher incidence of involvement in polytraumatized patients, such as the thorax and the extremities. In this context, an influencing role of familiarity with the respective injury patterns, as well as the differing complexity of assessment depending on the anatomical region, could be implied. The lower interrater reliability in the ISS regions of the head and face, despite their high incidence in severe trauma, could be explained by the lack of neurosurgeons and maxillofacial surgeons in the surveyed population. At the same time, general surgeons showed higher interobserver agreement on assessing abdominal injuries, while orthopedic traumatologists reached fair-to-good agreement on extremity injury patterns, further suggesting the influence of the respective working field. Furthermore, while injuries of the thorax or the extremities show a comparatively simple, repetitive pattern, the severity of head injuries varies in ways that are not always apparent on assessment.
Limitations and strengths
The use of a paper questionnaire that required considerable time to complete, together with the multiplicity of its requirements, restricted the number of participants, the range of represented medical specialties, and the number of presented cases (10 polytrauma patients), and this variation may have influenced the measured interobserver variability. Another study limitation was the assessment of the injuries based on written descriptions or small-sized depictions of conventional X-ray and CT examinations, rather than on modern radiography image processing. Manual or electronic tools as a reference guide for AIS coding were not provided. Studies with more simplified layouts based on online digital formats could address these limitations, enabling the inclusion of more participants and expanding their demographic or occupational backgrounds.
Nevertheless, our study also demonstrated certain strengths. We included 54 participants, thus forming an international cohort of surgical experts with various demographic characteristics. Utilizing Krippendorff’s alpha (α) reliability coefficient, we were able to analyze the interobserver variability results according to patients’ different injured anatomical regions or the demographic backgrounds of the observers in order to understand the factors influencing the injury assessment. The questionnaire was processed under defined conditions (Cooperative Course: Polytrauma Management Beyond ATLS).
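For readers unfamiliar with the statistic, Krippendorff’s α is defined as 1 − D_o/D_e, the ratio of observed to expected disagreement, and accommodates any number of observers and missing ratings. A minimal sketch of the interval-level calculation (our illustration, not the study’s analysis script):

```python
def krippendorff_alpha(data, metric=lambda c, k: (c - k) ** 2):
    """Krippendorff's alpha = 1 - Do/De.
    data: one list of ratings per coded unit (case); observers who skipped
    a unit are simply absent from its list. The default distance metric is
    interval-level; pass e.g. lambda c, k: c != k for nominal data."""
    units = [u for u in data if len(u) >= 2]   # only pairable units count
    n = sum(len(u) for u in units)             # total pairable ratings
    # Observed disagreement: mean pairwise distance within each unit
    do = 0.0
    for u in units:
        do += sum(metric(c, k) for c in u for k in u) / (len(u) - 1)
    do /= n
    # Expected disagreement: mean pairwise distance across all ratings
    values = [v for u in units for v in u]
    de = sum(metric(c, k) for c in values for k in values) / (n * (n - 1))
    return 1.0 - do / de

# Three cases rated identically by two observers, one case with disagreement
print(krippendorff_alpha([[1, 1], [2, 2], [3, 3], [1, 2]]))  # ~0.82
```

α = 1 indicates perfect agreement, α = 0 agreement no better than chance; values such as the α_ISS of 0.33 reported above fall well below common reliability thresholds.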