Timely identification of patients in need of MT due to trauma-induced haemorrhage is essential to improve survival [13, 14]. Numerous scores have been developed and validated in the past [7, 10]. Among those, scores combining clinical parameters, laboratory results, radiological findings and mathematical algorithms yielded the best results [7]. However, the more sophisticated the score, the later the potential need for MT can be estimated in patients with suspected traumatic haemorrhage. This is especially true for relevant variables such as laboratory results and radiological findings, which are not available until the early phase after hospital admission.
As more complex scores have been described as superior to more compact systems for the prediction of MT, more demanding scores, such as the Trauma-Associated Severe Haemorrhage (TASH) score and the Prince of Wales Hospital (PWH) score, have found wide distribution in clinical practice [15]. Both scores include demographic data, physical variables, laboratory results, injury patterns and sonography. Because the TASH score showed the highest area under the curve (AUC 0.889; CI 0.871–0.907) when compared with other established scores such as the PWH (AUC 0.860; CI 0.839–0.881), Vandromme (AUC 0.840; CI 0.817–0.863), Assessment of Blood Consumption (ABC) (AUC 0.763; CI 0.732–0.794) and the Schreiber and Larsen score (AUC 0.800; CI 0.773–0.828), it is widely accepted as the gold standard in the prediction of MT [7].
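An AUC can be read as the probability that a randomly chosen patient who required MT receives a higher score than a randomly chosen patient who did not. The following Python sketch illustrates this interpretation on invented data. The function name, the score values and the outcome labels are hypothetical and are not taken from the cited studies, and the source does not state how the published AUCs or confidence intervals were computed.

```python
def auc_from_scores(scores, needed_mt):
    """Rank-based AUC: share of (MT, no-MT) patient pairs in which the
    MT patient has the higher score; ties count as half a pair."""
    positives = [s for s, y in zip(scores, needed_mt) if y]
    negatives = [s for s, y in zip(scores, needed_mt) if not y]
    if not positives or not negatives:
        raise ValueError("both outcome groups must contain at least one patient")

    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Hypothetical score values and outcomes (1 = MT required), for illustration only.
example_scores = [2, 5, 7, 3, 9, 4]
example_needed_mt = [0, 0, 1, 0, 1, 1]
example_auc = auc_from_scores(example_scores, example_needed_mt)
print(f"AUC = {example_auc:.3f}")
```

An AUC of 0.5 corresponds to a score that discriminates no better than chance, and an AUC of 1.0 to perfect separation of the two groups.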
The newly developed and easily applicable mTICCS, which is based on emergency room activation, blood pressure and the presence of severe injuries in different body regions, has been shown to be comparably useful to the established algorithms [10]. For instance, the AUC of the mTICCS (0.776; CI 0.736–0.812) was not significantly different from the AUC of the complex TASH (0.782; CI 0.743–0.819) or PWH score (0.648; CI 0.603–0.691) [10]. Additionally, the present study demonstrates substantial inter-rater reliability for the mTICCS. These results were even more remarkable when the score was applied by residents. Thus, the results do not depend on work experience. Literature investigating the inter-rater reliability of the established MT scores is non-existent. However, data from other scoring systems in severely injured patients show markedly different results with regard to inter-rater reliability. Butcher et al. showed that defining polytrauma by individual subjective perception, as well as by scores like ISS or AIS > 3, led to significant disagreement among raters from the same and different institutions [16]. In contrast, the simple ASA (American Society of Anaesthesiologists) physical status classification system showed substantial agreement among anaesthesiologists (specialists and residents) when evaluating orthopaedic trauma patients [17].
With regard to the highly sophisticated TASH and PWH scores, and to civilian scenarios with severely injured patients or combat zones, there is an urgent need for simpler tools to stratify a patient’s risk for MT [18–20]. Although scores like the Larson, ABC and ET scores are less complex, they still use either laboratory (e.g. base deficit, haemoglobin) or other diagnostic (e.g. X-ray, FAST) variables. However, these variables are probably not available on the scene, or obtaining them hampers timely identification of patients at risk for MT [21]. Even though the required data might be available within a relatively short time after hospital admission, time is still lost during data collection and calculation of the scores. Thus, the applicability of these scores as “early” identification tools is questionable, which probably explains why they are still not used routinely in clinical practice. Moreover, although these scores have been validated multiple times, studies of their inter-rater reliability are not available. Obviously, some of the variables used for complex scores such as the TASH or PWH score are objective (e.g. sex, laboratory results) and therefore not susceptible to incorrect scoring. This might explain the missing inter-rater reliability tests for these scores. However, some variables, such as sonography and the assessment of fracture stability, depend on the investigator’s experience and thus introduce uncertainty into scoring. For instance, focused assessment with sonography for trauma (FAST) is used in many scores besides the complex ones (e.g. ABC, Emergency Room Transfusion Score (ETS), Traumatic Bleeding Severity Score (TBSS), Massive Transfusion Score (MTS)). However, FAST has limitations, especially in particular groups of patients such as children and those with a high injury severity [22]. In this context, Becker et al. reported that the sensitivity and false-negative rate of FAST performed in blunt abdominal trauma patients with a high injury severity score (ISS > 25) are worse than those in patients with an ISS < 25 [23]. Thus, a lower accuracy of FAST due to a higher likelihood of overlooked injuries was concluded in more severely injured patients [23]. Accordingly, FAST may not correlate well with the need for an emergent operation [24]. Furthermore, the quality of FAST diagnostics has clearly been shown to depend on the experience of the observer [22, 25]. As a consequence, despite its undisputed value as an extremely useful diagnostic tool in the treatment of trauma patients, the inclusion of FAST as a variable in a scoring system for MT prediction should prompt an assessment of the inter-rater reliability of the score.
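Agreement between two raters on a binary scoring decision is commonly quantified with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The Python sketch below shows the calculation together with the widely used Landis and Koch descriptors, in which "substantial" denotes kappa values from 0.61 to 0.80. It is an illustration under stated assumptions: the source does not specify which agreement statistic was used, and the function names and ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    n = len(rater_a)

    # Observed agreement: fraction of patients rated identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal frequencies per category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptor of agreement strength following Landis and Koch."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings (1 = patient flagged as at risk for MT), illustration only.
specialist = [1, 1, 1, 0, 0, 0, 0, 0]
resident = [1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(specialist, resident)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

With more than two raters, or with ordinal rather than binary ratings, other statistics such as Fleiss' kappa or a weighted kappa would be the more appropriate choice.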
In addition, the grading of fracture stability plays a crucial role in some scores. Besides the TASH score, the PWH, the ETS and the TBSS also rely on pelvic fractures as a bleeding source. While an open or dislocated femur fracture, which is also used to calculate the TASH score, can be diagnosed easily by either clinical or radiological examination, the diagnosis of pelvic stability is more difficult. Against this background, Shlamovitz et al. showed that the presence of either a pelvic deformity or an unstable pelvic ring on physical examination has poor sensitivity for the detection of mechanically unstable pelvic fractures in blunt trauma patients [26]. Moreover, Berger-Groch et al. investigated several pelvic scoring systems and found that all classifications reach their maximum reliability with advanced expertise in the surgery of pelvic fractures [27]. Thus, besides FAST, another relevant variable used in many established scores is susceptible to grading errors when the diagnosis is not made by experienced medical staff.