Introduction
The introduction of trauma teams has led to improved management and outcomes of severely injured patients [1–3]. A trauma team is a multidisciplinary group of health-care workers who work together on the initial assessment and treatment of severely injured patients [4]. In this context, optimal technical performance of interventions is emphasized in resuscitation guidelines [5]. However, coordinated performance of such interventions within trauma teams requires more than mastering technical skills. Non-technical skills such as task management, leadership, situational awareness, communication and decision-making can be defined as the cognitive, behavioral and social skills that contribute to safe and efficient team performance [6–10].
As a growing number of studies show the added value of non-technical skill training for patient safety, process efficiency and the prevention of medical errors [6–17], the issue of assessment becomes increasingly relevant. There is therefore a demand for a simple, validated and reliable assessment tool that lowers the threshold for trauma centers to incorporate such assessments in their quality audits.
The T-NOTECHS is a tool developed to assess the non-technical skills of the trauma team during trauma resuscitation [18]. T-NOTECHS stands for Trauma NOn-TECHnical Skills and is based on the NOTECHS, which was initially used to assess non-technical skills in aviation [19] and was later adapted and applied to assess the non-technical skill performance of surgical teams [20]. As described by Steinemann et al. [18], the T-NOTECHS was developed by a panel of trauma practitioners composed of two trauma surgeons, one trauma/medical intensivist, and two critical care nurses. The T-NOTECHS consists of five behavioral domains: leadership, cooperation and resource management, communication and interaction, assessment and decision making, and situation awareness/coping with stress [18].
The T-NOTECHS is, in our opinion, a simple and validated instrument, but the reliability found by Steinemann et al. [18] was low (ICC 0.48). An ICC of 0.48 means that 48% of the total observed variance in T-NOTECHS scores is attributable to systematic differences between the rated performances, with the remainder attributable to measurement error [21]. Such a value is especially low when the aim is to assess the impact of training on non-technical skills over time.
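To make the variance-ratio interpretation concrete, the following is a minimal Python sketch of a two-way, consistency-type ICC computed from a table of scores (one row per resuscitation, one column per assessor). The function name, the choice of the consistency definition and the example scores are assumptions made for illustration; they are not taken from any study data, and the ICC model used by Steinemann et al. [18] may differ.

```python
from statistics import mean


def icc_two_way_consistency(ratings):
    """Return (single-measure ICC, average-measure ICC) for a ratings table.

    ratings: list of rows, one row per rated resuscitation,
             each row holding one score per assessor.
    Assumes a two-way model with the consistency definition.
    """
    n = len(ratings)        # number of rated resuscitations
    k = len(ratings[0])     # number of assessors
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Share of variance due to systematic differences between resuscitations
    single = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    average = (ms_rows - ms_error) / ms_rows
    return single, average


# Hypothetical overall scores from three assessors for five resuscitations
example_scores = [
    [18, 17, 19],
    [12, 14, 13],
    [22, 21, 21],
    [15, 15, 17],
    [20, 18, 19],
]
single_icc, average_icc = icc_two_way_consistency(example_scores)
print(f"single-measure ICC: {single_icc:.2f}, average-measure ICC: {average_icc:.2f}")
```

The closer the assessors' scores track each other across resuscitations, the smaller the error term and the closer the ICC comes to 1.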
To our knowledge, the reliability of the T-NOTECHS has only been tested during actual resuscitations by real-time observers and not by video analysis [18]. In particular, video recordings provide an indisputable, unbiased and accurate documentation of complex events and could therefore improve the reliability of the T-NOTECHS. Furthermore, video allows the same resuscitation to be assessed by multiple assessors without interfering with the resuscitation process. In this study, the primary aim was to assess the reliability of the T-NOTECHS tool by assessing the non-technical skills of the trauma team with video analysis during actual trauma resuscitations. Secondarily, we investigated to what extent reliability increased when the T-NOTECHS was scored by three assessors (average ICC) instead of one (individual ICC).
Discussion
Our most important finding is that assessment of the non-technical skills of the trauma team during real trauma resuscitations with the T-NOTECHS is reliable when video analysis is used. We found excellent reliability for the overall T-NOTECHS score. Our second most important finding is that the T-NOTECHS is even more reliable when scores are expressed as the mean of three assessors, in which case all five individual domains, rather than two, reached the highest reliability level. We hope that our research will help to solve the difficulty of measuring non-technical skills during trauma resuscitation. The most important implication of the excellent reliability of the T-NOTECHS tool using video analysis is that it becomes possible to assess the development of non-technical skills over time.
We found a much higher ICC for T-NOTECHS scores than reported by Steinemann et al. [18]. We found an ICC of 0.94 and 0.84, respectively, when measured as the mean of three assessors or by a single assessor using video analysis, while in their study an ICC of 0.48 was found for assessment of actual resuscitations by live observers. A possible explanation is that video analysis, as opposed to live observation, has a positive influence on the reliability of the T-NOTECHS. This suggestion is further supported by the T-NOTECHS reliability results for simulated resuscitations in their study, in which reliability was higher with video analysis than with assessment by live observers (ICC 0.71 vs. ICC 0.44). In their study, in contrast to this study, no video analysis was used for actual trauma resuscitations because of hospital policies. Another explanation for our higher ICC compared with the study of Steinemann et al. [18] could be the training and experience in trauma resuscitation assessment that our assessors had prior to the start of the study.
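The relation between the single-assessor ICC and the mean-of-three ICC reported above can be checked with the Spearman-Brown formula, which holds for consistency-type ICCs. The short sketch below is an illustrative check under that assumption and not the computation used in this study; the function name is hypothetical.

```python
def average_measure_icc(single_icc, n_assessors):
    """Spearman-Brown step-up: reliability of the mean of n_assessors ratings,
    given the reliability of a single assessor (consistency-type ICC assumed)."""
    return n_assessors * single_icc / (1 + (n_assessors - 1) * single_icc)


# Single-assessor ICC of 0.84, averaged over three assessors
print(round(average_measure_icc(0.84, 3), 2))  # → 0.94
```

Under this assumption, the reported single-assessor value of 0.84 is consistent with the reported mean-of-three value of 0.94.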
Overall, other variants of the NOTECHS measuring teamwork during surgery have been shown to be reliable. Nevertheless, the results of previous studies investigating the reliability of the NOTECHS are not strictly comparable to those of our study, because different study designs, populations and statistics were used [18, 28, 29]. In the study of Sevdalis et al. [20], the NOTECHS was used by a psychiatrist who observed and assessed non-technical skills among surgical teams in a simulated setting. In that study, reliability was calculated using Cronbach's alpha (α) internal consistency coefficients, which provide the same values as a two-way consistency ICC of average measurements (in our study a two-way mixed ICC was used) and are therefore not completely, but most closely, comparable to our mean ICC results [20, 30]. The NOTECHS tool used in their study also had five domains, which are comparable to those of the T-NOTECHS but adjusted for surgical team performance. Like the T-NOTECHS, the NOTECHS in their study had a five-point Likert scale for each of the five individual domains. The most reliable domain had a Cronbach's α of 0.87 and the least reliable domain had a score of 0.77. In the study of Mishra et al. [28], a single observer assessed the non-technical skills of individual team members, subteams and the team as a whole using the Oxford NOTECHS. The Oxford NOTECHS is comparable to the T-NOTECHS in the number and type of domains, but adjusted for surgical team assessment. Domains were scored on a four-point Likert scale for each member, and points were summed for each subteam (4–16 points) and for the overall team score (12–48). Reliability was tested using inter-rater agreement (Rwg). The overall NOTECHS Rwg for the team was 0.99, and the lowest domain for the team had an Rwg of 0.93. These high scores indicate that the tool is very reliable; however, using Rwg to assess reliability in their study design may have introduced analytical bias. Analysis by Rwg uses a null hypothesis of complete lack of agreement among raters, which in their study means that each of the 37 options for the overall team score (all possible outcomes when individual scores are summed) had an equal chance (i.e., 1/37 or 2.7%) of being scored by the assessor. Such a distribution is very unlikely, which was more or less confirmed by the statement in the article of Robertson et al. [29] presenting the successor of the Oxford NOTECHS, the Oxford NOTECHS II. The authors wrote that the successor was intended to provide greater discrimination, as teams scored within a narrow middle range in the first Oxford NOTECHS version. The Oxford NOTECHS II had the same fundamentals as the Oxford NOTECHS, but the scale was altered. Reliability of the Oxford NOTECHS II was measured using the ICC, without a description of which ICC model, type or definition was used, and therefore no proper comparison with our results could be made. The ICC for the individual domains was between 0.68 and 0.88.
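The dependence of Rwg on its null distribution can be illustrated with a short sketch. It assumes the common single-item form Rwg = 1 - S²/σ²(EU), where S² is the observed variance among ratings and σ²(EU) = (A² - 1)/12 is the variance of a uniform distribution over A equally likely options. The function names, the two example ratings and the narrower nine-option range are hypothetical and are not taken from Mishra et al. [28].

```python
from statistics import variance


def uniform_null_variance(n_options):
    """Variance expected if every one of n_options scores were equally likely."""
    return (n_options ** 2 - 1) / 12


def rwg(ratings, n_options):
    """Single-item within-group agreement against a uniform null distribution."""
    return 1 - variance(ratings) / uniform_null_variance(n_options)


# Overall team score from 12 to 48 gives 37 possible values
print(uniform_null_variance(37))  # → 114.0

# Hypothetical ratings of the same team, two points apart
hypothetical_ratings = [30, 32]
print(round(rwg(hypothetical_ratings, 37), 2))  # null over all 37 options
print(round(rwg(hypothetical_ratings, 9), 2))   # null over a narrow 9-option range
```

With 37 equally likely options the null variance is so large that even ratings two points apart yield an Rwg above 0.98, whereas the same ratings judged against a narrow nine-option range give about 0.70. This is one way to see why a uniform null over the full score range may overstate agreement when teams score within a narrow middle range.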
Although our sample size may seem small, our study design included a sample size calculation and our study was able to adequately indicate the reliability of the T-NOTECHS for single and multiple assessors using video analysis. Another strength of this study is that real trauma resuscitations were analyzed (instead of simulations). However, our study also has several limitations that should be considered. First, we were not able to properly assess intra-observer variability. Videos of trauma resuscitations are automatically deleted from the server after 30 days because of our hospital's security and privacy policies. Assessing the same video within 2 weeks would have introduced recall bias. Second, in this study we assessed the non-technical skills of the trauma team during resuscitations. The trauma team is activated for potentially severely injured patients, as predefined by anatomical or physiological criteria or by mechanism of trauma; however, the mean ISS of resuscitated patients in this study was relatively low [9]. Therefore, our results may be less representative of resuscitations of more severely injured patients. Third, we used a two-way mixed-effects model, as only three research assistants assessed non-technical skills. We chose to have non-technical skills assessed by adequately trained personnel with the intention of improving the validity and reliability of our measurements. The downside of choosing a limited number of trained personnel is that, strictly speaking, we tested the reliability of non-technical skills assessment by our trained research assistants. Therefore, caution should be exercised when generalizing our results, as they might overestimate the reliability of the T-NOTECHS. Finally, our assessors were trained medical students, whose assessments might intuitively be expected to be inferior to those of experienced clinical experts. However, these students had already received training and gained experience in the assessment of trauma resuscitation and had extensive training in the assessment of non-technical skills. To our knowledge, for trauma resuscitations specifically, no study has investigated the effect of raters' education on the reliability of non-technical skills assessment of trauma teams. Nevertheless, a considerable amount of literature has been published on the use of objective structured clinical examinations (OSCEs), which have become widely used in medical education [31]. Medical schools have invested significant resources in designing and implementing OSCEs in assessment programs, with the rigor of the process highly dependent on whether OSCEs provide reliable and valid indicators of student competence [32]. Research suggests that untrained raters may be less consistent than trained raters [33, 34]. In addition, raters with more clinical experience are not necessarily better assessors of non-technical skills. A recently published study by Pradarelli et al. [35] showed that the clinical experience of raters, in their study surgeons, had no effect on the reliability of non-technical skill assessment of other surgeons. Furthermore, from a practical viewpoint, routine assessment of resuscitations is very time consuming and, in our opinion, not feasible within the limited time of experienced clinicians. Overall, assessment by personnel other than experienced clinicians is more likely to be incorporated in daily practice. Therefore, we believe that the reliability we found is appropriate for the purpose of the T-NOTECHS.
As evidence supporting the importance of non-technical skills for trauma team resuscitation is growing rapidly [6–17], training of non-technical skills becomes more important. For instance, closed-loop communication has been shown to reduce overall resuscitation time [36]. Furthermore, enhanced leadership is positively associated with improvement of processes during resuscitation [37]. The T-NOTECHS might be a useful and, to our knowledge, the best available tool to assess the non-technical skills of the trauma team. For daily practice, using one rater to assess non-technical skills with the T-NOTECHS seems legitimate as part of quality assessment, as the overall score is reliable. For research or quality improvement, it might be worthwhile to additionally assess non-technical skills with three raters. For example, when (relatively) low overall T-NOTECHS scores are correlated with a certain factor (e.g., trauma mechanism, severity of injury, experience of the trauma team), an analysis with three raters would be useful.