
Virtual reality training for improving the skills needed for performing surgery of the ear, nose or throat


Abstract

Background

Virtual reality simulation uses computer‐generated imagery to present a simulated training environment for learners. This review seeks to examine whether there is evidence to support the introduction of virtual reality surgical simulation into ear, nose and throat surgical training programmes.

Objectives

1. To assess whether surgeons undertaking virtual reality simulation‐based training achieve surgical ('patient') outcomes that are at least as good as, or better than, those achieved through conventional training methods.

2. To assess whether there is evidence from either the operating theatre, or from controlled (simulation centre‐based) environments, that virtual reality‐based surgical training leads to surgical skills that are comparable to, or better than, those achieved through conventional training.

Search methods

The Cochrane Ear, Nose and Throat Disorders Group (CENTDG) Trials Search Co‐ordinator searched the CENTDG Trials Register; Central Register of Controlled Trials (CENTRAL 2015, Issue 6); PubMed; EMBASE; ERIC; CINAHL; Web of Science; ClinicalTrials.gov; ICTRP and additional sources for published and unpublished trials. The date of the search was 27 July 2015.

Selection criteria

We included all randomised controlled trials and controlled trials comparing virtual reality training and any other method of training in ear, nose or throat surgery.

Data collection and analysis

We used the standard methodological procedures expected by The Cochrane Collaboration. We evaluated both technical and non‐technical aspects of skill competency.

Main results

We included nine studies involving 210 participants. Out of these, four studies (involving 61 residents) assessed technical skills in the operating theatre (primary outcomes). Five studies (comprising 149 residents and medical students) assessed technical skills in controlled environments (secondary outcomes). The majority of the trials were at high risk of bias. We assessed the GRADE quality of evidence for most outcomes across studies as 'low'.

Operating theatre environment (primary outcomes)

In the operating theatre, there were no studies that examined two of three primary outcomes: real world patient outcomes and acquisition of non‐technical skills. The third primary outcome (technical skills in the operating theatre) was evaluated in two studies comparing virtual reality endoscopic sinus surgery training with conventional training. In one study, psychomotor skill (which relates to operative technique or the physical co‐ordination associated with instrument handling) was assessed on a 10‐point scale. A second study evaluated the procedural outcome of time‐on‐task. The virtual reality group performance was significantly better, with a better psychomotor score (mean difference (MD) 3.20, 95% CI 2.05 to 4.34; 10‐point scale) and a shorter time taken to complete the operation (MD ‐5.50 minutes, 95% CI ‐9.97 to ‐1.03).

Controlled training environments (secondary outcomes)

In a controlled environment five studies evaluated the technical skills of surgical trainees (one study) and medical students (three studies). One study was excluded from the analysis.

Surgical trainees: One study (80 participants) evaluated the technical performance of surgical trainees during temporal bone surgery, where the outcome was the quality of the final dissection. There was no difference in the end‐product scores between virtual reality and cadaveric temporal bone training.

Medical students: Two other studies (40 participants) evaluated technical skills achieved by medical students in the temporal bone laboratory. Learners' knowledge of the flow of the operative procedure (procedural score) was better after virtual reality than conventional training (SMD 1.11, 95% CI 0.44 to 1.79). There was also a significant difference in end‐product score between the virtual reality and conventional training groups (SMD 2.60, 95% CI 1.71 to 3.49). One study (17 participants) revealed that medical students acquired anatomical knowledge (on a scale of 0 to 10) better during virtual reality than during conventional training (MD 4.3, 95% CI 2.05 to 6.55). No studies in a controlled training environment assessed non‐technical skills.

Authors' conclusions

There is limited evidence to support the inclusion of virtual reality surgical simulation into surgical training programmes, on the basis that it can allow trainees to develop technical skills that are at least as good as those achieved through conventional training. Further investigations are required to determine whether virtual reality training is associated with better real world outcomes for patients and the development of non‐technical skills. Virtual reality simulation may be considered as an additional learning tool for medical students.


Plain language summary

Virtual reality training for improving the skills needed for performing surgery of the ear, nose or throat

Review question

This review evaluated whether surgeons trained in virtual reality simulation achieved outcomes for their patients that were equivalent to, or better than, those obtained through conventional training methods. It also evaluated whether virtual reality training helped surgeons acquire equivalent (or better) technical skills required to achieve good surgical outcomes, or the non‐technical skills to make good decisions and lead the operating room team. Another consideration evaluated was the level of experience of participants in the trials, given that some of the study participants were surgeons in training, while others were medical students.

Background

Virtual reality simulation provides an alternative to current training programmes for ear, nose and throat surgery. However, the capability of virtual reality simulation training to provide an equivalent, or superior, approach to traditional training methods needs to be reliably identified. As virtual reality is an emerging technology, few comparative studies exist, making it difficult to identify accurately its value or worth for surgical training.

Study characteristics

We included nine studies involving 210 ear, nose and throat residents and medical students. Four studies compared virtual reality endoscopic sinus surgery training with conventional training; one study compared virtual reality endoscopic dacryocystorhinostomy training versus textbook reading; two studies compared virtual reality temporal bone dissection training versus cadaveric temporal bone dissection training and two studies compared virtual reality temporal bone dissection training versus a small group tutorial with temporal bone models. None of the studies were funded by an agency with a commercial interest in the results of the studies.

Key results

None of the studies evaluated whether training in virtual reality influences patient outcomes or non‐technical skills. There is evidence to support the introduction of virtual reality into surgical training on the basis that the technical skills acquired by this method are as good as, or better than, those learnt through conventional training. Virtual reality can be added to the extensive range of activities that constitutes a comprehensive surgical training programme. Virtual reality simulation should also be considered as an additional learning tool for medical students.

Quality of the evidence

We assessed the quality of the evidence in this review for most outcomes as 'low' (using the GRADE system). The key reasons for this were issues related to study design. The evidence in this review is up to date to 27 July 2015.

Authors' conclusions

Implications for practice

  1. There is limited evidence to support the integration of virtual reality simulation into surgical training programmes, on the basis that the technical skills acquired in this environment are as good as, or better than, those acquired during conventional training. This applies to medical students and also to surgical trainees.

  2. Virtual reality simulation should be considered as an additional training tool, rather than a replacement for activities already established in formal surgical training programmes. Surgical training programmes integrate activities that cross many domains, including didactic education, mentorship, on‐the‐job training, seminars, workshops and specific surgical skills activities.

  3. Current evidence supports the integration of virtual reality simulation into training for functional endoscopic sinus surgery and temporal bone surgery.

Implications for research

Further studies of sufficient sample size to detect clinically important effects are desirable to evaluate real world outcomes for patients when surgeons are trained in virtual reality. Similarly, larger trials in both functional endoscopic sinus surgery and temporal bone surgery would be desirable to strengthen the evidence that virtual reality simulation allows trainees to acquire technical skills that are commensurate with traditional training.

The role of virtual reality surgical simulation in the development of non‐technical skills is yet to be explored.

The extent to which virtual reality simulation might replace other methods of teaching technical skills in a training programme is still to be evaluated. For example, Wiet et al proposed that a 10‐hour period of simulation experience can in some respects be equivalent to the dissection of two temporal bones (Wiet 2012). This type of data would be very useful for supervisors of training. At the moment, there is no evidence to suggest that virtual reality simulation should completely replace cadaveric dissection, but whether this might be possible is an avenue for further investigation.

Summary of findings

Summary of findings for the main comparison. Virtual reality endoscopic sinus surgery training compared to conventional training for surgical trainees to improve their performance in the operating theatre


Patient or population: surgical trainees
Settings: operating theatre
Intervention: virtual reality endoscopic sinus surgery training
Comparison: conventional training

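Outcomes

Psychomotor score (performance rating scale; scale from 0 to 10)

  • Illustrative comparative risks* (95% CI), assumed risk (conventional training): the mean psychomotor score ranged across control groups from 2.73 to 6.68 points
  • Illustrative comparative risks* (95% CI), corresponding risk (virtual reality endoscopic sinus surgery training): the mean psychomotor score in the intervention groups was 3.20 higher (2.05 to 4.34 higher)
  • Relative effect (95% CI): not given
  • No of participants (studies): 29 (2 studies; see footnote 1)
  • Quality of the evidence (GRADE): ⊕⊝⊝⊝ very low (see footnotes 2, 3, 4, 5)
  • Comments: none

Procedural score (time to completion of task)

  • Illustrative comparative risks* (95% CI), assumed risk (conventional training): the mean procedural score in the control group was 10.06 minutes
  • Illustrative comparative risks* (95% CI), corresponding risk (virtual reality endoscopic sinus surgery training): the mean procedural score in the intervention group was 5.50 lower (9.97 to 1.03 lower)
  • Relative effect (95% CI): not given
  • No of participants (studies): 25 (1 study)
  • Quality of the evidence (GRADE): ⊕⊕⊝⊝ low (see footnotes 2, 4, 5)
  • Comments: none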

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1. One study was not an RCT.
2. The author reported that they randomised the participants into the intervention and control group but gave no specific details of the randomisation method.
3. The author did not report on allocation concealment.
4. No blinding of participants.
5. Total population is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).
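
As a rough check on the intervals reported in this table, the standard error and an approximate z statistic can be back-calculated from each mean difference and its 95% CI. The sketch below is illustrative only. It assumes the intervals are symmetric normal-approximation intervals (the review does not state here how they were constructed), and the function name `summarise_interval` is hypothetical.

```python
from statistics import NormalDist


def summarise_interval(estimate, lower, upper, level=0.95):
    """Back-calculate SE, z and a two-sided p value from an estimate and its CI.

    Assumption (not from the review): the interval is estimate +/- z_crit * SE
    under a normal approximation.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "se": se,
        "z": z,
        "p": p,
        "excludes_zero": lower > 0 or upper < 0,
    }


# Values taken from the table above
psychomotor = summarise_interval(3.20, 2.05, 4.34)     # points on a 0 to 10 scale
time_on_task = summarise_interval(-5.50, -9.97, -1.03)  # minutes

for name, result in (("psychomotor", psychomotor), ("time on task", time_on_task)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions both intervals exclude zero, which is consistent with the differences being reported as significant. The calculation says nothing about risk of bias or imprecision, which the GRADE ratings address separately.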

Summary of findings 2. Virtual reality temporal bone dissection training compared to cadaveric temporal bone dissection training for surgical trainees to improve their performance in a controlled environment


Patient or population: surgical trainees
Settings: controlled environment
Intervention: virtual reality temporal bone dissection training
Comparison: cadaveric temporal bone dissection training

Outcomes

End‐product score (Welling scale; scale from 0 to 3.5)

  • Illustrative comparative risks* (95% CI), assumed risk (cadaveric temporal bone dissection training): the mean end‐product score in the control group was 2.13 points
  • Illustrative comparative risks* (95% CI), corresponding risk (virtual reality temporal bone dissection training): the mean end‐product score in the intervention group was 0.07 higher (0.19 lower to 0.33 higher)
  • Relative effect (95% CI): not given
  • No of participants (studies): 65 (1 study)
  • Quality of the evidence (GRADE): ⊕⊕⊝⊝ low (see footnotes 1, 2, 3)
  • Comments: none

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval


1. No blinding of participants.
2. 18% incomplete outcome data.
3. Total population size is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).

Summary of findings 3. Virtual reality temporal bone dissection training compared to conventional training for medical students to improve their performance in a controlled environment


Patient or population: medical students
Settings: controlled environment
Intervention: virtual reality temporal bone dissection training
Comparison: conventional training

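Outcomes

Procedural score (technical step binary scale)

  • Illustrative comparative risks* (95% CI), assumed risk (conventional training): not given
  • Illustrative comparative risks* (95% CI), corresponding risk (virtual reality temporal bone dissection training): the mean procedural score in the intervention groups was 1.11 standard deviations higher (0.44 to 1.79 higher)
  • Relative effect (95% CI): not given
  • No of participants (studies): 40 (2 studies)
  • Quality of the evidence (GRADE): ⊕⊕⊝⊝ low (see footnotes 1, 2)
  • Comments: none

End‐product score (modified Welling scale)

  • Illustrative comparative risks* (95% CI), assumed risk (conventional training): not given
  • Illustrative comparative risks* (95% CI), corresponding risk (virtual reality temporal bone dissection training): the mean end‐product score in the intervention groups was 2.60 standard deviations higher (1.71 to 3.49 higher)
  • Relative effect (95% CI): not given
  • No of participants (studies): 40 (2 studies)
  • Quality of the evidence (GRADE): ⊕⊕⊝⊝ low (see footnotes 1, 2)
  • Comments: none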

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval


1. The authors did not report on allocation concealment.
2. Total population is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).
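
The effects in this table are expressed as standardised mean differences (SMD): the difference between group means divided by a pooled standard deviation, which allows scores measured on different scales to be combined. The sketch below computes an SMD with Hedges' small-sample correction. The group summaries are invented for illustration and are not data from the included studies. The choice of Hedges' adjusted g is also an assumption, since this section does not state which estimator was used, and the function name `hedges_g` is hypothetical.

```python
from math import sqrt


def hedges_g(mean_vr, sd_vr, n_vr, mean_conv, sd_conv, n_conv):
    """Standardised mean difference (virtual reality minus conventional)
    with Hedges' small-sample correction."""
    df = n_vr + n_conv - 2
    # Pooled standard deviation across the two groups
    pooled_sd = sqrt(((n_vr - 1) * sd_vr ** 2 + (n_conv - 1) * sd_conv ** 2) / df)
    d = (mean_vr - mean_conv) / pooled_sd  # uncorrected difference in SD units
    correction = 1 - 3 / (4 * df - 1)      # shrinks d slightly in small samples
    return correction * d


# Hypothetical group summaries (NOT taken from the review)
g = hedges_g(mean_vr=8.0, sd_vr=2.0, n_vr=10, mean_conv=6.0, sd_conv=2.0, n_conv=10)
print(round(g, 3))
```

Because the result is in standard deviation units, an SMD of 1.11 means the virtual reality groups scored a little over one pooled standard deviation higher than the conventional training groups.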

Summary of findings 4. Virtual reality endoscopic sinus surgery training compared to conventional training for medical students to improve their performance in a controlled environment


Patient or population: medical students
Settings: controlled environment
Intervention: virtual reality endoscopic sinus surgery training
Comparison: conventional training

Outcomes

Anatomy score (anatomy identification test; scale from 0 to 10)

  • Illustrative comparative risks* (95% CI), assumed risk (conventional training): the mean anatomy score in the control group was 5.1
  • Illustrative comparative risks* (95% CI), corresponding risk (virtual reality endoscopic sinus surgery training): the mean anatomy score in the intervention group was 4.3 higher (2.05 to 6.55 higher)
  • Relative effect (95% CI): not given
  • No of participants (studies): 17 (1 study)
  • Quality of the evidence (GRADE): ⊕⊝⊝⊝ very low (see footnotes 1, 2, 3, 4, 5)
  • Comments: none

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval


1. The authors reported that they randomised the participants into intervention and control groups but gave no specific details of the randomisation methods.
2. Did not mention allocation concealment.
3. Did not mention blinding.
4. There was 22% missing data in the control group.
5. Total population is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).

Background

Description of the condition

A simulation is a controlled environment where some aspects of a real surgical task are recreated for the purpose of training. Physical models, cadaveric human or animal tissue, living animals and computer programs are all examples of simulations.

Simulation is purported to have some distinct advantages for medical training, such as the presentation of scenarios that are played out in an arena where there is 'permission to fail', therefore allowing trainees to learn from mistakes without causing patient harm (George 2010; Satava 2008). Simulation facilitates 'hands‐on' experience when learning specific aspects of a surgical procedure, with the opportunity for periods of reflection to consolidate the student's understanding (Rourke 2010). As such, it is anticipated that training periods may be shortened (Kolb 1984).

Virtual reality surgical simulation involves the production of computer‐generated imagery that emulates a surgical training environment. Depending upon the surgical task, simulations may generate three‐dimensional (3D) graphics, sounds and a sense of touch (provision of force feedback through a haptic device) within the virtual reality environment (Rafiq 2008). Virtual reality is well suited for the implementation of simulation in surgical training (Zirkle 2007a; Zirkle 2007). Virtual reality simulation may enable self directed learning to occur outside and alongside operative hours, making learning more flexible for both trainer and trainee (Sedlack 2004).

This review addresses the question of whether the anticipated advantages of virtual reality surgical simulation are supported by evidence within the context of ENT surgery. However, in order to undertake this task, it is first necessary to consider what constitutes a successful outcome in surgical training. The ultimate goal is always to provide surgeons with the skills to ensure optimal outcomes for their patients. These skills include the technical competence to handle instruments and tissue deftly, knowledge of the flow of each surgical procedure and the surgical anatomy required to support this, and judgement about when to employ specific operative approaches and deal with complicated pathology (Crossley 2011; Yule 2008; Yule 2009). However, the assessment of these factors in the clinical context is challenging when it is considered that surgical training is undertaken in an environment where patient safety is always being optimised and where surgical error is already low.

With a view to patient safety, early implementation of virtual reality simulation‐based training has been undertaken outside the operating theatre (Lateef 2010; Okuda 2009), in a controlled environment where technical skill, procedural knowledge or clinical judgement can be assessed without putting patients at risk. While this setting does not provide direct evidence for improved patient outcomes, it is important for supporting the validity of virtual reality simulation in developing the surgical skills required to improve (patient) outcomes.

Later‐stage testing of virtual reality simulation has begun to occur within the clinical context (Gala 2013; Gallagher 2005). Here the minimum acceptable standard required of virtual reality simulation is that patient outcomes are at least as good as those achieved through conventional training methods. This involves demonstrating that surgical error does not increase and that the surgical skills learnt through virtual reality training are as good as those acquired through conventional training. Arguably, proving that virtual reality simulation leads to better patient outcomes than conventional training is not necessary to support its implementation, although this is of course desirable.
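
The 'at least as good as' standard described above resembles a non-inferiority comparison: virtual reality training meets it if the lower confidence limit of the difference (virtual reality minus conventional) does not fall below a pre-specified margin. A minimal sketch follows, using the end‐product score interval reported in Summary of findings 2. The margins are hypothetical and are not defined anywhere in this review, and the function name `at_least_as_good` is likewise illustrative.

```python
def at_least_as_good(ci_lower: float, margin: float) -> bool:
    """Return True when the lower confidence limit of
    (virtual reality minus conventional) lies above -margin.

    Assumes higher scores are better and that margin is positive.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    return ci_lower > -margin


# End-product score (Welling scale, 0 to 3.5): MD 0.07, 95% CI -0.19 to 0.33
ci_lower = -0.19

# Hypothetical margins, chosen only for illustration
lenient = at_least_as_good(ci_lower, margin=0.35)  # 10% of the scale range
strict = at_least_as_good(ci_lower, margin=0.10)
print(lenient, strict)
```

With these illustrative margins the same interval meets the standard under the wider margin but not under the narrower one, so the conclusion depends on a margin being fixed before the data are examined.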

The efficiency of virtual reality simulation is also a legitimate consideration when assessing its benefit. For the health system, the societal benefit of virtual reality training might be reflected in reduced training times, improved rates of skill acquisition (learning curve) or reduced direct costs of training, freeing resources for other purposes. This adds another dimension to the task of defining appropriate measures to evaluate the worth of virtual reality simulation.

Another important consideration when appraising the simulation literature is to be cognisant of the intended recipients of the training and how these compare with the participants involved in randomised controlled trials (RCTs). Ideally, the level of surgical experience of trial participants should match that of the intended users. When this is not the case (for example, when medical students make up the study cohort for a simulation targeted at surgical trainees) the outcomes of the study need to be interpreted with this disparity in mind.

It may be concluded that the determination of outcomes for this systematic review requires consideration of factors that cross the interests of individual patients, society and study design. We judged the highest level of evidence to support virtual reality simulation for surgical training to be RCTs conducted within the operating room, where patient outcomes or surgical technique were assessed. We also sought RCTs exploring the efficacy of virtual reality simulation for increasing the efficiency of surgical skill acquisition. We also analysed evidence derived from controlled experimental conditions (outside the operating theatre) as this provides invaluable insights into the capability of virtual reality simulation to deliver outcomes in specific dimensions of surgical skill. We further stratified the latter by participant type, where we attributed the greatest weight to studies where the participants were the intended users of the simulation.

Description of the intervention

Virtual reality surgical simulation is a computer‐generated emulation of a surgical procedure in a graphical environment. The virtual environment may be enriched with sounds and/or the use of force feedback (haptics) to provide a sense of touch (Lewis 2012; Maran 2003). Within this virtual world, the surgical anatomy is presented, to varying degrees of realism, depending upon the requirements of the surgical task. The surgical field may be presented in either two or three dimensions, whichever is more appropriate for the operation being simulated. For example, simulations of functional endoscopic sinus surgery will present a two‐dimensional 'endoscopic' view of the surgical field, while simulations of temporal bone surgery will provide a three‐dimensional 'microscopic' representation. Surgical instrumentation is also rendered within the virtual environment and during the simulation the operator controls these instruments to undertake the surgical procedure.

Virtual reality surgical simulations are designed to enhance training and as such may provide feedback to the learner either during or after a training task. Surgical simulators generate metrics that can be utilised to provide objective analysis of learner performance, such as time series of the three‐dimensional path of instruments within the virtual environment, and the place and time of removal of virtual 'tissue' or disease (Zhou 2013). This opens up novel opportunities for evaluation and feedback that are not possible with other modalities.

In the context of ENT, the literature describes simulations of functional endoscopic sinus surgery (FESS) and temporal bone surgery. FESS is an endoscopic procedure, undertaken endonasally upon the sinuses to treat infection, inflammatory disease (e.g. nasal polyps) or tumours. Temporal bone surgery refers to operations on the temporal bone, which is located behind the ear canal, and is undertaken to treat chronic ear infections, for cochlear implantation or as a surgical access to the base of the skull (George 2010; Jackson 2002; Javia 2012; Nogueira 2010).

How the intervention might work

Surgical professionals must acquire precise skills, and achieving mastery entails deliberate practice to produce expert performance. This requires surgeons to participate in intense repetition of a skill, rigorous assessment of outcomes and specific informative feedback in a controlled setting (Ericsson 2004). Virtual reality simulation is an ideal medium in which the above principle can be effectively embodied. The basic feature of virtual reality simulation is the ability to reproduce scenarios. Skill repetition in practice sessions gives learners opportunities to correct error and polish their performance so that skilled movements can be made effortlessly and automatically. Virtual reality simulation can also incorporate real‐time feedback during training (Wijewickrema 2014), or summative feedback after each practice session (Issenberg 2005; McGaghie 2006).

Current evidence from the literature and educational psychology theory suggest that for adults to learn, they require a level of engagement with the task that traditional training methodologies may lack. Adult learners require self directed learning, the capacity to relate the task to previous experience and immediate outcomes from their learning, as well as for the task to be engaging in nature, in order to foster a readiness and motivation to learn (Tsuda 2009). Virtual reality simulation has a unique capacity to engage the student on a level that fulfils the requirements of the adult learner by enabling interaction with the simulated environment (Nehls 1995). It also allows the learner to have control in order to maximise educational efficacy (Wulf 2005). The same cannot be said for didactic methods of teaching or textbook‐based learning, where the learner must enter the teaching environment with a pre‐fostered willingness to learn (Bryan 2009).

In a recent Cochrane review, virtual reality training appeared to decrease the operating time and improve the operative performance of surgical trainees with limited laparoscopic experience when compared with no training or with box‐trainer training (Nagendran 2013).

Why it is important to do this review

This review seeks to put into context the evidence supporting the introduction of virtual reality surgical simulation into the training of surgeons, and to provide a framework within which the emerging literature on virtual reality simulation can be interpreted. This is important for all stakeholders in health care, providing evidence to inform whether it is desirable to introduce simulation into training or provide infrastructure and other resources to support it. For educators and learners, the review will help them better understand whether simulation‐based learning is effective. For researchers, this review identifies domains where there are gaps in the literature that might require attention.

The efficacy of virtual reality training seems to be dependent upon the learning domain, so it is important to explore this within the context of ENT surgery. In other fields, such as cardiac life support training, some articles report that students taught in simulators gain more in either 'skills' or 'knowledge' than non‐simulator‐taught students. Conversely, other papers conclude that simulators provide the same benefits as textbooks when teaching surgical knowledge and, thus, textbooks are financially preferable (Cavaleiro 2009; Cherry 2007; Peng 2009). This review will help to resolve such conflicts for ENT surgery.

The overriding aim of this review is therefore to understand whether current evidence supports the integration of virtual reality simulation into surgical training programmes. The primary consideration informing this decision is whether surgeons undertaking virtual reality simulation‐based training achieve surgical ('patient') outcomes that are at least as good as, or better than, those achieved through conventional training methods. Therefore, studies examining the effect of simulation‐based training on patient outcomes are considered as the best evidence. However, given that surgery is undertaken in a context where patient safety is paramount and where error rates are very low, we anticipate that it may be difficult to detect differences in 'real world' outcomes for patients. In light of this, the review also considered whether there was evidence from either the operating theatre, or from controlled (simulation centre‐based) environments, to support the notion that virtual reality‐based surgical training leads to surgical skills that are comparable to, or better than, those achieved through conventional training. Surgical skills underpin excellence in operative surgery, so the efficacy (and/or efficiency) of virtual reality simulation in training surgical skills is pertinent to the question of whether it should be integrated into training programmes.

Objectives

1. To assess whether surgeons undertaking virtual reality simulation‐based training achieve surgical ('patient') outcomes that are at least as good as, or better than, those achieved through conventional training methods.

2. To assess whether there is evidence from either the operating theatre, or from controlled (simulation centre‐based) environments, that virtual reality‐based surgical training leads to surgical skills that are comparable to, or better than, those achieved through conventional training.

Methods

Criteria for considering studies for this review

Types of studies

Randomised controlled trials and controlled trials.

Types of participants

All types of surgical trainees in ear, nose or throat surgery (residents, fellows, training otolaryngologists) and medical students.

Types of interventions

Virtual reality training versus any other method of training.

Types of outcome measures

We analysed the following outcomes in the review, but we did not use them as a basis for including or excluding studies.

Primary outcomes

  • Real world outcomes: intra‐ or postoperative error rates, discomfort or pain score of patients after surgery, and re‐admission rates.

The most significant evidence to support the integration of virtual reality simulation into surgical training would be equivalent or improved outcomes for patients operated upon by surgeons trained using virtual reality simulation compared with those trained using conventional methods (Lambert 2003; Larsen 1979).

  • Technical skills in the operating theatre: psychomotor scores measuring physical co‐ordination providing precise movement, procedural scores measuring task accomplishment through a sequence of actions, and surgical anatomy scores measuring knowledge of specific anatomical structures.

In order to support the integration of virtual reality simulation into surgical training, it is important to know that the technical skills acquired through virtual reality simulation‐based training are at least as good as those learnt through conventional training methods (Kundhal 2009; Reznick 1997).

  • Non‐technical skills in the operating theatre: situational awareness skill scores, decision‐making skill scores, communication and teamwork skill scores, and leadership skill scores.

Non‐technical skills are equally important for achieving good surgical outcomes. Non‐technical skills can be broadly classed as those required to make the correct decisions at the right time during an operation, and those required to integrate and lead the operating room team (Flin 2006). It has been well established that patient outcomes are enhanced by a well‐functioning team within the operating theatre (Mazzocco 2009). Some types of simulation focus specifically upon the acquisition of these skills, so in this review we were open to the possibility that these factors might have been evaluated in trials relating to virtual reality simulation.

Secondary outcomes

Surgeons undertake much of their training in controlled environments outside the operating theatre. These settings are often referred to as 'surgical skills' centres or 'surgical simulation' centres. An example of a controlled environment known to most ENT surgeons is the temporal bone laboratory. Skills learnt in controlled environments are expected to 'transfer' to the operating theatre, meaning that foundational skills are acquired in the laboratory and then refined in the operating theatre. Thus, the secondary outcomes of this review were based on skills acquired in controlled environments.

  • Technical skills in the controlled environment: as described above and end‐product scores.

The technical skills assessed in controlled environments are the same as those assessed in the operating theatre and described above. In controlled environments, the final anatomical dissection or completed operation can be assessed, arriving at what is known as an end‐product score (Butler 2007). This score describes the extent of dissection and rates of injury to anatomical structures in a cadaveric specimen, or a model, following surgery. End‐product scores are often considered to reflect overall competence in the surgical task.

  • Non‐technical skills in the controlled environment. These are as described above, namely: situational awareness skill scores, decision‐making skill scores, communication and teamwork skill scores, and leadership skill scores.

Search methods for identification of studies

The Cochrane Ear, Nose and Throat Disorders Group (CENTDG) Trials Search Co‐ordinator (TSC) conducted systematic searches for randomised controlled trials and controlled clinical trials. There were no language, publication year or publication status restrictions. The date of the search was 27 July 2015.

Electronic searches

The TSC searched:

  • the CENTDG Trials Register (searched 27 July 2015);

  • the Cochrane Central Register of Controlled Trials (CENTRAL 2015, Issue 6);

  • PubMed (1946 to 27 July 2015);

  • Ovid EMBASE (1974 to 2015 week 30);

  • EBSCO CINAHL (1982 to 27 July 2015);

  • Ovid CAB Abstracts (1910 to 2015 week 29);

  • EBSCO ERIC (1964 to 27 July 2015);

  • LILACS, lilacs.bvsalud.org (searched 27 July 2015);

  • KoreaMed (searched via Google Scholar 27 July 2015);

  • IndMed, www.indmed.nic.in (searched 27 July 2015);

  • PakMediNet, www.pakmedinet.com (searched 27 July 2015);

  • Web of Knowledge, Web of Science (1945 to 27 July 2015);

  • ClinicalTrials.gov (searched via the Cochrane Register of Studies 27 July 2015);

  • ICTRP, www.who.int/ictrp (searched 27 July 2015);

  • ISRCTN www.isrctn.com (searched 27 July 2015);

  • Google Scholar, scholar.google.co.uk (searched 27 July 2015);

  • Google, www.google.com (searched 27 July 2015).

In searches prior to 2013, we also searched BIOSIS Previews 1926 to November 2012.

The TSC modelled subject strategies for databases on the search strategy designed for CENTRAL. Where appropriate, they were combined with subject strategy adaptations of the highly sensitive search strategy designed by The Cochrane Collaboration for identifying randomised controlled trials and controlled clinical trials (as described in the Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0, Box 6.4.b) (Handbook 2011). Search strategies for major databases including CENTRAL are provided in Appendix 1.

Searching other resources

We scanned the reference lists of identified publications for additional trials and contacted trial authors where necessary. In addition, the TSC searched PubMed, TRIPdatabase, The Cochrane Library and Google to retrieve existing systematic reviews relevant to this systematic review, so that we could scan their reference lists for additional trials.

Data collection and analysis

Selection of studies

Two authors (PP and AA) selected papers that appeared to meet the inclusion criteria based on title and abstract. If the information relevant to the inclusion criteria was not available in the abstract, or if the title was relevant but the abstract was not available, we obtained the full text of the report. Two authors (GK and SO) assessed papers that passed this initial review. We resolved disagreements between authors by discussion until a consensus was reached.

Data extraction and management

Two authors (PP and AA) extracted data independently, using a paper data extraction form. We compared results and resolved disagreements by discussion (PP, AA, GK and SO) until a consensus was reached. We entered the extracted data into RevMan 5 (RevMan 2014).

We extracted the following information relating to different aspects of the studies:

  1. Study methodology or quality: randomisation concealment as described in the study, blinding, timing of follow‐up, percentage of dropouts during follow‐up and baseline comparability of the groups.

  2. Methods: criteria for accepting participants into the study (medical students, residents, fellows and practitioners), ENT procedure under study (e.g. mastoidectomy, endoscopic sinus surgery), name of virtual reality simulator, training details and duration of training.

  3. Participants: number of participants in intervention and control groups at start, mean age of participants (plus age range), gender, level of training, surgical experience, year the study was published and study setting.

  4. Characteristics of the intervention: intervention comparisons, information on courses of virtual reality and any other conventional method.

  5. Outcome: skill competency scores (technical and non‐technical skills) and real world outcomes. We mainly extracted the outcome information as skill competency scores by study group.

We contacted the corresponding authors of the trials for further details and asked them to provide original data if the published paper had insufficient or unclear information. If there was any doubt whether the trials shared the same participants, we contacted the corresponding author to clarify the problem. One study, Fried 2012, did not report mean scores and standard deviations for any skill outcome in the manuscript. Another study, Wiet 2009, included six medical students and six residents, and the author presented the combined results with no separation of residents and medical students. We contacted the corresponding authors of both studies (Fried 2012; Wiet 2009) but received no response. As such, we excluded these studies from the analysed results.

Assessment of risk of bias in included studies

PP, AA and ML independently assessed the risk of bias of the included trials, taking the following domains into consideration, as guided by the Cochrane Handbook for Systematic Reviews of Interventions (Handbook 2011):

  • sequence generation;

  • allocation concealment;

  • blinding;

  • incomplete outcome data;

  • selective outcome reporting; and

  • other sources of bias.

We used the Cochrane 'Risk of bias' tool in RevMan 5 (RevMan 2014), which involved describing each of these domains as reported in the trial and then assigning a judgement about the adequacy of each entry: 'low', 'high' or 'unclear' risk of bias.

Measures of treatment effect

We compared the effect of virtual reality against conventional training by using the mean difference (MD) and 95% confidence intervals (CI) for continuous variables such as skill competency scores, operating room performance score, etc. for each trial. We assessed complication or error rate by calculating the risk ratio (RR) and 95% confidence intervals (CI) for each trial. We performed the meta‐analyses according to the recommendations of the Cochrane Handbook for Systematic Reviews of Interventions (Handbook 2011). We used standardised mean difference (SMD) when studies used different instruments to measure the same construct.
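As a rough illustration of the effect measures named above, the following sketch computes a mean difference and a risk ratio with normal-approximation 95% confidence intervals. It is not the RevMan implementation: the function names and the input numbers are hypothetical and are not taken from any included trial.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (group 1 minus group 2) with an approximate 95% CI."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - Z_95 * se, md + Z_95 * se


def risk_ratio(events1, n1, events2, n2):
    """Risk ratio (group 1 versus group 2) with a 95% CI built on the log scale."""
    rr = (events1 / n1) / (events2 / n2)
    se_log = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)
    lower = math.exp(math.log(rr) - Z_95 * se_log)
    upper = math.exp(math.log(rr) + Z_95 * se_log)
    return rr, lower, upper


# Hypothetical inputs for illustration only (not data from this review).
md, md_lo, md_hi = mean_difference(7.0, 2.0, 10, 5.0, 2.0, 10)
rr, rr_lo, rr_hi = risk_ratio(2, 20, 6, 20)
print(f"MD {md:.2f} (95% CI {md_lo:.2f} to {md_hi:.2f})")
print(f"RR {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
```

The risk ratio interval is computed on the log scale because the sampling distribution of a ratio is skewed; the formula assumes at least one event in each group.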

Unit of analysis issues

The analysis needed to take into account the level at which randomisation occurred (Handbook 2011). We conducted analysis based on the number of 'units' that were randomised.

Dealing with missing data

We dealt with missing data by contacting the original investigators to request these data. If we received no response from the investigators, we assumed the data to be missing at random and performed the analysis by ignoring the missing data.

Assessment of heterogeneity

We assessed the significance of any discrepancies in the estimates of the intervention effects from the different trials by means of Cochran's Q test for heterogeneity and by a measure of the I² statistic. The I² statistic describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error. An I² statistic greater than 50% may be considered to represent substantial heterogeneity. We also used the forest plots to assess heterogeneity visually.
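For readers unfamiliar with these statistics, a minimal sketch of how Cochran's Q and I² are commonly computed from per-study effect estimates and their variances is given below. It uses the usual definition I² = max(0, (Q − df) / Q) × 100%. The function name and the example inputs are hypothetical, and the p-value for Q (a chi-squared test on df degrees of freedom) is left out to keep the sketch dependency-free.

```python
def cochran_q_i2(effects, variances):
    """Return (Q, degrees of freedom, I-squared in %) for a set of studies."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect estimates and variances, for illustration only.
q, df, i2 = cochran_q_i2([0.5, 1.5, 1.0], [0.04, 0.04, 0.04])
print(f"Q = {q:.2f} on {df} df, I² = {i2:.0f}%")
```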

Assessment of reporting biases

We planned to assess publication bias by means of a funnel plot if there were sufficient studies. However, a funnel plot was not possible in this review because the number of included studies was small and not sufficient to assess any reporting bias.

Data synthesis

Where appropriate, we pooled the results of comparable studies. We used a fixed‐effect model and 95% CI. We would have considered using a random‐effects model had there been unexplained heterogeneity.

Subgroup analysis and investigation of heterogeneity

We planned to consider the following conditions for subgroup analysis if heterogeneity of the intervention effect was found:

  1. type of ENT procedure under study;

  2. prior level of surgical experience (none, limited or expert);

  3. study design.

However, we did not perform the planned subgroup analyses due to the very few studies included.

Sensitivity analysis

To test the robustness of the results, we had planned to use sensitivity analyses to examine the effect of including studies with a high risk of bias (such as from lack of allocation concealment) on the overall estimates of effect for important outcomes. However, we did not perform these analyses because we rated the majority of the included studies as having a high risk of bias in the same domains.

GRADE and 'Summary of findings' tables

We used the GRADE system (GRADE Working Group) to define the quality of evidence in 'Summary of findings' tables, following the Cochrane Handbook for Systematic Reviews of Interventions (Handbook 2011).

Results

Description of studies

See: Characteristics of included studies; Characteristics of excluded studies; Characteristics of ongoing studies.

Results of the search

The study selection flow chart is shown in Figure 1.


Process for sifting search results and selecting studies for inclusion.

We identified a total of 1674 references through electronic searches and one additional record from other sources. From these, we discarded 1560 references at the first‐level screening through reading of the abstracts (574 duplicates and 986 clearly irrelevant references), leaving 115 references for further consideration.

Of the 115 remaining references, we discarded a further 74: four employed computer‐assisted instruction for learning otolaryngology; 18 used computer‐generated virtual endoscopy for diagnosis; four used virtual reality for treatment; three concerned virtual reality development and validation. We identified other types of simulation (e.g. manikin, low‐fidelity model) in 45 other studies (32 ENT surgery, 13 other fields). We included the remaining 41 references in the second‐level screening.

At the second‐level screening, we discarded a further 12 studies for the following reasons: some were trials of virtual reality training in other fields (three studies from the audiology, laparoscopy and rehabilitation fields); others were review articles on virtual reality (four reviews on petrous surgery, temporal bone dissection and general otolaryngology); another used virtual reality for surgical planning (simulating the pre‐operative fitting of an electronic implantable hearing device) and anatomical learning; and three compared types of virtual reality simulators.

Of the remaining 29 studies, we excluded 19 (17 non‐controlled and two survey studies). The details are provided in the Excluded studies section below and Characteristics of excluded studies table.

We identified one 'ongoing' study (NCT02030873; see below and Characteristics of ongoing studies). No studies are 'awaiting assessment'.
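The screening totals reported above can be checked with a short tally. This is only an arithmetic sketch using the stage totals stated in the text; the variable names are illustrative.

```python
# Stage totals as stated in the text of this section.
identified = 1674 + 1                  # electronic searches + one other source
first_level_discards = 574 + 986       # duplicates + clearly irrelevant
after_first_level = identified - first_level_discards

other_topic_discards = 4 + 18 + 4 + 3 + 45   # itemised reasons summing to 74
for_second_level = after_first_level - other_topic_discards

second_level_discards = 12
remaining = for_second_level - second_level_discards

excluded = 17 + 2                      # non-controlled + survey studies
accounted_for = remaining - excluded   # included plus ongoing studies

print(after_first_level, for_second_level, remaining, accounted_for)
```

The final figure corresponds to the nine included studies plus the one ongoing study.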

Included studies

Nine studies were eligible and their details are presented in the table of Characteristics of included studies.

Design and sample size

We included nine studies (Edmond 2002; Fried 2010; Fried 2012; Solyar 2008; Weiss 2008; Wiet 2009; Wiet 2012; Zhao 2011a; Zhao 2011b). Seven studies were randomised controlled trials (Fried 2010; Solyar 2008; Weiss 2008; Wiet 2009; Wiet 2012; Zhao 2011a; Zhao 2011b). Two studies were controlled trials (Edmond 2002; Fried 2012).

Edmond 2002 included four first‐year otolaryngology residents from the University of Washington, USA, as they performed their first endoscopic sinus surgery. Fried 2010 compared virtual reality and conventional training for endoscopic sinus surgery in a study of 28 otolaryngology residents from the north‐eastern United States. Fried 2012 had a similar design, but with an augmented set of outcome measures, and included 14 junior otorhinolaryngology residents from two academic medical centres in New York City. Solyar 2008 also compared virtual reality and conventional teaching of endoscopic sinus surgery, and included 17 first‐year medical students from Albert Einstein College of Medicine, NY. Weiss 2008 included 15 ophthalmology residents from the New York metropolitan area, and compared virtual reality and traditional training methods for learning an endoscopic procedure on the nasal cavity where the nasolacrimal duct is opened endonasally. Wiet 2009 included six otolaryngology residents and six medical students at the Ohio State University Medical Center, OH, and compared virtual reality and conventional training for temporal bone surgery. Similarly, Wiet 2012 explored the same question in a multi‐institutional investigation of 80 otolaryngology residents in the USA. Zhao 2011a assessed whether self directed learning in a virtual reality simulation environment improved the outcomes of cadaveric temporal bone dissection compared with traditional training, and included 20 final‐year medical students from the University of Melbourne, Australia. Zhao 2011b compared whether expert‐guided instruction in temporal bone dissection undertaken in a virtual reality environment led to better cadaveric temporal bone dissection than traditional teaching methods, and included 20 medical students from the University of Melbourne, Australia.

Setting

Seven studies were located in medical schools in the USA (Edmond 2002; Fried 2010; Fried 2012; Solyar 2008; Weiss 2008; Wiet 2009; Wiet 2012). Two studies were located in medical schools in Australia (Zhao 2011a; Zhao 2011b).

Participants

Three studies recruited medical students (Solyar 2008; Zhao 2011a; Zhao 2011b). Four studies recruited otolaryngology residents (Edmond 2002; Fried 2010; Fried 2012; Wiet 2012). One study recruited ophthalmology residents (Weiss 2008). One study recruited both medical students and otolaryngology residents (Wiet 2009).

Interventions

Four studies compared virtual reality endoscopic sinus surgery training versus conventional training (Edmond 2002; Fried 2010; Fried 2012; Weiss 2008), while one study compared virtual reality endoscopic dacryocystorhinostomy training versus textbook reading (Solyar 2008).

Two studies compared virtual reality temporal bone dissection training versus cadaveric temporal bone dissection training (Wiet 2009; Wiet 2012), while two studies compared virtual reality temporal bone dissection training versus a small group tutorial with temporal bone models (Zhao 2011a; Zhao 2011b).

Outcomes

Outcomes for endoscopic sinus surgery trials were metrics of surgical performance as evaluated by experts (Edmond 2002; Fried 2010; Fried 2012; Weiss 2008). One study used pre‐ and post‐anatomy video assessment to evaluate the anatomical knowledge of participants (Solyar 2008), where the anatomy scores ranged from 0 to 10. Two studies reported a psychomotor score that was assessed within the operating room (Edmond 2002; Fried 2010). One study reported "time on task" in the operating room (Fried 2010), which may be seen as a 'combination score' assessment reflecting a synthesis of anatomical knowledge, procedural skill and psychomotor behaviour. No study reported on the rate of patient discomfort following operative surgery, nor surgical complication or error rates. All studies for otology used the Welling scale (Wiet 2009; Wiet 2012), or a modified version of the Welling scale (Zhao 2011a; Zhao 2011b). This is an 'end‐product' score, where a binary data score is used to evaluate the quality of a dissected temporal bone, considering exposure of anatomical landmarks and injury to these structures. These outcomes reflect technical skills (Wiet 2009; Wiet 2012; Zhao 2011a; Zhao 2011b).

Five studies reported on technical skills in a controlled environment (Solyar 2008; Wiet 2009; Wiet 2012; Zhao 2011a; Zhao 2011b), and four studies reported on technical skills in the operating theatre (Edmond 2002; Fried 2010; Fried 2012; Weiss 2008).

None of the studies reported on non‐technical skills or real world outcomes.

Excluded studies

We excluded a total of 19 studies that did not fulfil the above inclusion criteria: Balderston 2007; Delson 2003; Deutsch 2009; Francis 2012; Fried 2007; Ho 2012; Kerwin 2013; Khemani 2012; Reddy‐Kolanu 2011; Sewell 2007a; Sewell 2007b; Strauss 2005; Stredney 1998; Tolsdorff 2010; Wiet 2006; Yamashita 1999; Zhao 2010a; Zhao 2010b; Zirkle 2007. See Characteristics of excluded studies.

Ongoing studies

See Characteristics of ongoing studies.

We identified one ongoing study on the effect of virtual simulation training in mastoidectomy (NCT02030873). The authors plan to enrol 36 participants. The estimated study completion date was given as January 2015.

Risk of bias in included studies

Summaries of the risk of bias across the included studies are presented in Figure 2 and Figure 3. Across the assessed domains, the majority of trials scored 'high risk' for selection bias, performance bias and detection bias.


'Risk of bias' graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.


'Risk of bias' summary: review authors' judgements about each risk of bias item for each included study.

Allocation

One study used computer‐generated randomisation and allocation concealment (Wiet 2012). Six studies reported that they randomised the participants into intervention and control groups but provided no specific details on the randomisation methods. Of these six randomised studies, no authors reported on allocation concealment (Fried 2010; Solyar 2008; Weiss 2008; Wiet 2009; Zhao 2011a; Zhao 2011b). This may not be surprising, however, given that the experimental design in these studies (comparing virtual reality to traditional teaching) would not have allowed for allocation concealment. One study was a quasi‐experimental study (Edmond 2002), and another a prospective cohort study (Fried 2012).

Blinding

While it may have been possible to conceal the allocation until the point at which the intervention commenced, this could not be maintained because the differences between traditional and virtual reality teaching would have been obvious to all participants (the participants and intervention providers could not be blinded to the intervention). However, it was still possible that those analysing the results could be blinded to the intervention.

Seven studies were single‐blind, removing the participants' identification before evaluation by the experts (Fried 2010; Fried 2012; Weiss 2008; Wiet 2009; Wiet 2012; Zhao 2011a; Zhao 2011b). One study used a pre‐ and post‐anatomy test but did not mention blinding (Solyar 2008). One study did not mention blinding (Edmond 2002).

Incomplete outcome data

Seven studies were unlikely to have incomplete outcome data (Edmond 2002; Fried 2010; Fried 2012; Weiss 2008; Wiet 2009; Zhao 2011a; Zhao 2011b). One study reported that two of the nine students in the control group did not complete the study (22% missing data) (Solyar 2008). One study enrolled 80 participants but only 65 completed the entire testing protocol (19% missing data) (Wiet 2012).

Selective reporting

The nine included studies were unlikely to have selective reporting. All included studies reported the main technical skills (psychomotor score, procedural score or surgical anatomy score). In terms of selective reporting of analyses, all included studies used continuous data for the technical skill scores and compared the scores between groups.

Other potential sources of bias

Seven included studies did not declare any conflict of interest (Edmond 2002; Solyar 2008; Weiss 2008; Wiet 2009; Wiet 2012; Zhao 2011a; Zhao 2011b). Two studies declared a conflict of interest, which had low impact on the studies (Fried 2010; Fried 2012). Although all of the virtual reality instruments were validated, the use of different methods to measure the outcomes may have led to bias. This was unavoidable to some extent, given that some studies examined temporal bone surgery while others examined a different operation, namely endoscopic sinus surgery. In studies of temporal bone surgery, the widespread use of the Welling end‐product scale will have mitigated this risk.

Effects of interventions

See: Summary of findings for the main comparison Virtual reality endoscopic sinus surgery training compared to conventional training for surgical trainees to improve their performance in the operating theatre; Summary of findings 2 Virtual reality temporal bone dissection training compared to cadaveric temporal bone dissection training for surgical trainees to improve their performance in a controlled environment; Summary of findings 3 Virtual reality temporal bone dissection training compared to conventional training for medical students to improve their performance in a controlled environment; Summary of findings 4 Virtual reality endoscopic sinus surgery training compared to conventional training for medical students to improve their performance in a controlled environment

Virtual reality training versus any other method of training

Performance in the operating theatre
Real world outcomes

No study reported on real world outcomes.

Technical skills

Four studies comparing virtual reality endoscopic sinus surgery training versus conventional training reported on technical skills in the operating theatre (Edmond 2002; Fried 2010; Fried 2012; Weiss 2008).

Two studies involving 32 participants reported a psychomotor score in the operating room (Edmond 2002; Fried 2010). One study involving 15 participants reported time on task in the operating room (Fried 2010).

Two studies did not report mean score or standard deviations for any skill outcome in the manuscript (Fried 2012; Weiss 2008). We received no additional data from our contact with the paper author. As such, we excluded these studies from the following analysis. See summary of findings Table for the main comparison.

Edmond 2002 evaluated the following operating room technical skills using 10‐point scales: navigation, injection, uncinectomy, anterior ethmoidectomy, maxillary antrostomy, orientation of image, image‐task alignment, proper depth of image, tool manipulation, tool selection, tool‐tool dexterity, tissue respect, surgical confidence and case difficulty. The overall mean rating was 6.68 ± 1.09 for the virtual reality group and 4.02 ± 0.07 for the conventionally trained group.

Fried 2010 evaluated psychomotor skills in terms of surgical confidence and instrument manipulation performance in the operating room using 10‐point scales, and procedural skill using time on task (injection time and dissection time). The surgical confidence score was 6.55 ± 2.65 for the virtual reality group and 2.67 ± 2.00 for the conventional group. The instrument manipulation score was 6.75 ± 2.51 and 2.78 ± 1.86 for the virtual reality and conventional groups, respectively. The time for infiltration of the mucosa with local anaesthetic/adrenaline prior to the commencement of surgery (minutes) was 1.75 ± 1.04 for the virtual reality group and 4.67 ± 2.09 for the conventional group. The dissection time (min) was 7.37 ± 3.36 and 15.44 ± 6.46 for the virtual reality and conventional groups, respectively.

We assessed the GRADE quality of the evidence from these two studies as 'very low' due to the issues with study design (Edmond 2002; Fried 2010).

Meta‐analysis of psychomotor scores from these two studies showed a significant difference between groups (mean difference (MD) 3.20, 95% confidence interval (CI) 2.05 to 4.34); the I² value was 12% (see Analysis 1.1; Figure 4). The procedural score from one study, Fried 2010, also showed a significant difference between groups, with the virtual reality group performing the task faster (MD ‐5.50 minutes, 95% CI ‐9.97 to ‐1.03) (see Analysis 1.2).


Forest plot of comparison: 1 Virtual reality endoscopic sinus surgery versus conventional training: surgical trainees performance in operating theatre, outcome: 1.1 Psychomotor score.

Non‐technical skills

No study reported on non‐technical skills.

Performance in a controlled environment
Technical skills

1) Surgical trainees

Two studies comparing virtual reality temporal bone dissection training versus cadaveric temporal bone dissection training reported on technical skills for surgical trainees (Wiet 2009; Wiet 2012). These studies involved 92 participants and used the 35‐item Welling scale to determine an end‐product score for cadaveric temporal bone dissection.

The first study, Wiet 2009, included six medical students and six residents. The author presented the combined results with no separation of residents and medical students. We contacted the author for additional data but received no response. As such, we excluded this study from the analysis.

The second study, Wiet 2012, involved 80 participants, of whom 65 completed the entire testing protocol, with 32 in the simulator group and 33 in the conventional training group. We assessed the GRADE quality of evidence from this study as 'low' based on the sample size (Wiet 2012). See summary of findings Table 2.

After adjusting for residency level, objective test scores, the number of previous human temporal bone dissections and institution, the least square mean (the mean estimated from a linear model after adjusting for the covariates) was 2.20 ± 0.54 for the virtual reality group and 2.13 ± 0.55 for the cadaveric temporal bone training group. The difference between groups was not statistically significant (least square MD 0.07, 95% CI ‐0.19 to 0.33) (see Analysis 2.1).

2) Medical students

Three studies reported on technical skills for medical students (Solyar 2008; Zhao 2011a; Zhao 2011b). Two compared virtual reality temporal bone dissection training versus conventional training (Zhao 2011a; Zhao 2011b), and one compared virtual reality endoscopic sinus surgery training versus conventional training (Solyar 2008).

Two studies involving 40 participants reported procedural score and end‐product score for each temporal bone dissection step in relation to the following anatomical structures: cortex, dura, sigmoid sinus, posterior external auditory canal, lateral semicircular canal and incus (Zhao 2011a; Zhao 2011b). See additional data table (Table 1). We assessed the GRADE quality of evidence from these two studies as 'low' based on the study design, which necessarily precluded blinding of allocation of the participants. See summary of findings Table 3.

Table 1. Additional data from Zhao 2011a and Zhao 2011b

Study        Intervention      Procedural score   End‐product score   Overall score
                               (mean ± SD)        (mean ± SD)         (mean ± SD)
Zhao 2011a   Virtual reality   27.73 ± 5.09       13.80 ± 1.46        7.00 ± 1.26
Zhao 2011a   Conventional      21.17 ± 4.82       9.63 ± 1.44         2.83 ± 1.41
Zhao 2011b   Virtual reality   21.40 ± 2.56       18.20 ± 1.01        5.50 ± 1.74
Zhao 2011b   Conventional      17.60 ± 4.64       13.20 ± 2.55        3.50 ± 1.00

SD: standard deviation

Zhao 2011a reported the procedural score according to cortex, dura, sigmoid sinus, lateral semicircular canal and incus (range 0 to 52); end‐product score according to dura, sigmoid, posterior external auditory canal, lateral semicircular canal and incus (range 0 to 18); and overall score (range 0 to 10). The mean procedural score was 27.73 ± 5.09 in the virtual reality group and 21.17 ± 4.82 in the conventional group. The mean end‐product score was 13.80 ± 1.46 in the virtual reality group and 9.63 ± 1.44 in the conventional group. The mean overall score was 7.00 ± 1.26 in the virtual reality group and 2.83 ± 1.41 in the conventional group.

Zhao 2011b reported the procedural score according to cortex, dura, sigmoid sinus, lateral semicircular canal, facial nerve and cavity (range 0 to 30); end‐product score according to dura, sigmoid sinus, facial nerve, lateral semicircular canal and cavity (range 0 to 22); and overall score (range 0 to 10). The mean procedural score was 21.40 ± 2.56 in the virtual reality group and 17.60 ± 4.64 in the conventional group. The mean end‐product score was 18.20 ± 1.01 in the virtual reality group and 13.20 ± 2.55 in the conventional group. The mean overall score was 5.50 ± 1.74 in the virtual reality group and 3.50 ± 1.00 in the conventional group.
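Because the two studies scored performance on different ranges, their results are only comparable once expressed in standard deviation units. The sketch below illustrates that idea using the means and SDs from Table 1. It assumes equal group sizes within each study, because the per-group sample sizes are not given here, and it computes an unadjusted Cohen's d rather than whatever small-sample correction the review's software may have applied. The function name and dictionary layout are illustrative only.

```python
from math import sqrt


def cohens_d_equal_n(mean_a, sd_a, mean_b, sd_b):
    """Standardised mean difference of group A versus group B.

    ASSUMPTION: both groups have the same size, so the pooled SD is the
    root of the average of the two variances. No small-sample correction.
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# (VR mean, VR SD, conventional mean, conventional SD), copied from Table 1
table_1 = {
    ("Zhao 2011a", "procedural"): (27.73, 5.09, 21.17, 4.82),
    ("Zhao 2011a", "end-product"): (13.80, 1.46, 9.63, 1.44),
    ("Zhao 2011b", "procedural"): (21.40, 2.56, 17.60, 4.64),
    ("Zhao 2011b", "end-product"): (18.20, 1.01, 13.20, 2.55),
}

effect_sizes = {key: cohens_d_equal_n(*values) for key, values in table_1.items()}

for (study, outcome), d in effect_sizes.items():
    print(f"{study} {outcome} score: d = {d:.2f}")
```

Under the equal-group-size assumption, the per-study values come out at roughly 1.0 to 1.3 for the procedural score and roughly 2.6 to 2.9 for the end‐product score, which is the same order of magnitude as the pooled estimates reported in the meta‐analysis.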

For procedural score, the meta‐analysis of these two studies found statistically significantly better performance in the virtual reality group (standardised mean difference (SMD) 1.11, 95% CI 0.44 to 1.79), with an I² value of 0% (see Analysis 3.1; Figure 5). The meta‐analysis of end‐product score also found statistically significantly better performance in the virtual reality group (SMD 2.60, 95% CI 1.71 to 3.49), with an I² value of 0% (see Analysis 3.2; Figure 6).
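As a rough check on how a reported estimate and its 95% CI relate to statistical significance, the standard error can be recovered from the interval width and turned into a z statistic. The sketch below does this for the two pooled SMDs quoted above. It assumes the intervals were built with a normal approximation (estimate ± 1.96 × SE), and the helper names are hypothetical.

```python
from math import erfc, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Recover the standard error from a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


def z_and_p(estimate, lower, upper):
    """Return the z statistic and two-sided p-value (normal approximation)."""
    se = se_from_ci(lower, upper)
    z = estimate / se
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Pooled SMDs and 95% CIs as reported for Analysis 3.1 and Analysis 3.2
procedural_z, procedural_p = z_and_p(1.11, 0.44, 1.79)
end_product_z, end_product_p = z_and_p(2.60, 1.71, 3.49)

print(f"Procedural score:  z = {procedural_z:.2f}, p = {procedural_p:.4f}")
print(f"End-product score: z = {end_product_z:.2f}, p = {end_product_p:.2g}")
```

Both intervals exclude zero, so both z statistics exceed 1.96 in absolute value, which is what "statistically significant at the 5% level" means under this approximation.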


Figure 5. Forest plot of comparison: 3 Virtual reality temporal bone dissection training versus conventional training: medical students performance in controlled environment, outcome: 3.1 Procedural score.

Figure 6. Forest plot of comparison: 3 Virtual reality temporal bone dissection training versus conventional training: medical students performance in controlled environment, outcome: 3.2 End‐product score.

One study, Solyar 2008, involving 17 participants, reported an anatomy identification test on the following structures: inferior turbinate, middle turbinate, nasal septum, maxillary ostium, maxillary sinus, uncinate process, ethmoid bulla, nasopharynx and Eustachian tube. The study showed that the virtual reality group had a significantly higher anatomical identification score (MD 4.3, 95% CI 2.05 to 6.55) on a scale of 0 to 10 (see Analysis 4.1). We assessed the GRADE quality of evidence as 'very low' based on multiple concerns about the study design. See summary of findings Table 4.

Non‐technical skills

No study reported on non‐technical skills.

Discussion

Summary of main results

No studies reported the effects of virtual reality simulation‐based training on real world outcomes or the acquisition of non‐technical skills in either the operating room or a controlled environment. All eligible data related to the development of technical skills.

In the operating theatre, surgical trainees who received virtual reality simulation‐based training achieved significantly better psychomotor scores (mean difference (MD) 3.20, 95% confidence interval (CI) 2.05 to 4.34) and completed the surgical task in less time (MD ‐5.50 minutes, 95% CI ‐9.97 to ‐1.03) than those who received conventional training.

In a controlled environment, the difference in end‐product scores for cadaveric dissection of temporal bones between surgical trainees who received virtual reality training and those trained by conventional methods was not statistically significant (least square MD 0.06, 95% CI ‐0.14 to 0.26).

Analysis of the performance of medical students in a controlled environment showed that students who received virtual reality training had significantly higher procedural scores (standardised mean difference (SMD) 1.11, 95% CI 0.44 to 1.79), end‐product scores (SMD 2.60, 95% CI 1.71 to 3.49) and anatomical identification scores (MD 4.3, 95% CI 2.05 to 6.55).

Overall completeness and applicability of evidence

We found literature on virtual reality surgical simulation for both functional endoscopic sinus surgery and temporal bone surgery. These procedures reflect two of the main fields in which ENT surgeons need to gain competence in their training. However, there are still many areas of the specialty where virtual reality simulation has not found application as a training environment.

The best evidence to support the integration of virtual reality simulation into surgical training would demonstrate that training undertaken in virtual reality leads to patient outcomes that are as good as, or better than, those achieved through conventional training methods. To date, this type of evidence is not available, and its absence presents an opportunity for future research.

The available evidence supports the notion that virtual reality‐based training leads to technical outcomes that are equivalent to or better than those acquired through conventional training, irrespective of whether these skills are applied to the operating room or a controlled environment. Confidence in this evidence would be strengthened by studies with larger cohorts, and in particular by studies that enrolled surgical trainees as participants.

There is a paucity of studies examining non‐technical skills. This may reflect the types of virtual reality simulation that have been developed to date. These are primarily for teaching the psychomotor and procedural skills required for operative surgery, and have been focused around the surgeon's perspective of the operating theatre. As such, these simulations do not as yet provide opportunities for team interaction and many of the qualities of non‐technical skill acquisition cannot be assessed.

The establishment of a universal scoring system for surgical outcome measurement would be a significant advance, but is not without its challenges due to the differing types of information that can be obtained from simulation environments. Many of these data are novel and methods for analysis are only now being developed. However, the emergence and widespread use of end‐product scales in temporal bone surgery (using the Welling Scale) is a good step towards the standardisation of outcome assessment.

Quality of the evidence

See: Characteristics of included studies; summary of findings Table for the main comparison; summary of findings Table 2; summary of findings Table 3; summary of findings Table 4.

Most of the studies included in this review are at risk of bias arising from limitations in random sequence generation, allocation concealment and participant blinding. Some of these risks are inherent to the types of experimental design that are possible in controlled trials comparing traditional and virtual reality training environments. Under these circumstances, it is not possible to blind either instructors or participants to their trial allocation group, and neither is it possible to conceal allocation once the trials begin. Therefore, in these respects the data available must be considered to be at a high risk of bias. However, in most of the trials, assessors were blinded to the allocation of participants, so the risk of bias for the analyses was, in general, low.

We assessed the GRADE quality of the evidence across most outcomes as 'low' due to concerns about randomisation, allocation concealment, blinding and sample size.

Potential biases in the review process

It is unlikely that bias has been introduced into the review. We used no language restrictions in the search strategy. Two authors independently selected and extracted data from the studies. We followed the procedures suggested in the Cochrane Handbook for Systematic Reviews of Interventions (Handbook 2011) and there was no disagreement between the two authors. We assessed the outcomes strictly according to the protocol. The search is fully up to date and it is unlikely that studies could have been missed by the search strategy. We acknowledge that two review authors (GK and SOL) authored two of the studies reviewed (Zhao 2011a; Zhao 2011b). To manage the inherent risk of bias that this introduces, SOL and GK did not participate in the risk of bias analysis of these papers; however, SOL participated in the meta‐analyses.

Agreements and disagreements with other studies or reviews

To our knowledge, there are no other systematic reviews assessing the efficacy of virtual reality training for improving the skills needed for performing surgery of the ear, nose or throat. We included nine studies in our review, which showed statistically significant effects on anatomy, procedural and end‐product scores in a controlled environment and in the operating theatre. The results are consistent with some previous non‐systematic reviews (George 2010; Jackson 2002; Nogueira 2010), and a systematic review of simulators in otolaryngology (Javia 2012).

Figure 1. Process for sifting search results and selecting studies for inclusion.

Figure 2. 'Risk of bias' graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3. 'Risk of bias' summary: review authors' judgements about each risk of bias item for each included study.

Figure 4. Forest plot of comparison: 1 Virtual reality endoscopic sinus surgery versus conventional training: surgical trainees performance in operating theatre, outcome: 1.1 Psychomotor score.

Figure 5. Forest plot of comparison: 3 Virtual reality temporal bone dissection training versus conventional training: medical students performance in controlled environment, outcome: 3.1 Procedural score.

Figure 6. Forest plot of comparison: 3 Virtual reality temporal bone dissection training versus conventional training: medical students performance in controlled environment, outcome: 3.2 End‐product score.

Analysis 1.1. Comparison 1 Virtual reality endoscopic sinus surgery versus conventional training: surgical trainees' performance in the operating theatre, Outcome 1 Psychomotor score.

Analysis 1.2. Comparison 1 Virtual reality endoscopic sinus surgery versus conventional training: surgical trainees' performance in the operating theatre, Outcome 2 Procedural score.

Analysis 2.1. Comparison 2 Virtual reality temporal bone dissection training versus cadaveric dissection training: surgical trainees' performance in a controlled environment, Outcome 1 End‐product score.

Analysis 3.1. Comparison 3 Virtual reality temporal bone dissection training versus conventional training: medical students' performance in a controlled environment, Outcome 1 Procedural score.

Analysis 3.2. Comparison 3 Virtual reality temporal bone dissection training versus conventional training: medical students' performance in a controlled environment, Outcome 2 End‐product score.

Analysis 4.1. Comparison 4 Virtual reality endoscopic sinus surgery versus conventional training: medical students' performance in a controlled environment, Outcome 1 Anatomy score.

Summary of findings for the main comparison. Virtual reality endoscopic sinus surgery training compared to conventional training for surgical trainees to improve their performance in the operating theatre

Patient or population: surgical trainees
Settings: operating theatre
Intervention: virtual reality endoscopic sinus surgery training
Comparison: conventional training

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, conventional training | Illustrative comparative risks* (95% CI): corresponding risk, virtual reality endoscopic sinus surgery training | Relative effect (95% CI) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Psychomotor score. Performance rating scale. Scale from: 0 to 10 | The mean psychomotor score ranged across control groups from 2.73 to 6.68 points | The mean psychomotor score in the intervention groups was 3.20 higher (2.05 to 4.34 higher) | | 29 (2 studies; footnote 1) | ⊕⊝⊝⊝ very low (footnotes 2, 3, 4, 5) | |
| Procedural score. Time to completion of task | The mean procedural score in the control group was 10.06 minutes | The mean procedural score in the intervention group was 5.50 lower (9.97 to 1.03 lower) | | 25 (1 study) | ⊕⊕⊝⊝ low (footnotes 2, 4, 5) | |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1. One study was not an RCT.
2. The author reported that they randomised the participants into the intervention and control group but gave no specific details of the randomisation method.
3. The author did not report on allocation concealment.
4. No blinding of participants.
5. Total population is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).

Summary of findings 2. Virtual reality temporal bone dissection training compared to cadaveric temporal bone dissection training for surgical trainees to improve their performance in a controlled environment

Patient or population: surgical trainees
Settings: controlled environment
Intervention: virtual reality temporal bone dissection training
Comparison: cadaveric temporal bone dissection training

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, cadaveric temporal bone dissection training | Illustrative comparative risks* (95% CI): corresponding risk, virtual reality temporal bone dissection training | Relative effect (95% CI) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| End‐product score. Welling scale. Scale from: 0 to 3.5 | The mean end‐product score in the control group was 2.13 points | The mean end‐product score in the intervention group was 0.07 higher (0.19 lower to 0.33 higher) | | 65 (1 study) | ⊕⊕⊝⊝ low (footnotes 1, 2, 3) | |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1. No blinding of participants.
2. 18% incomplete outcome data.
3. Total population size is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).

Summary of findings 3. Virtual reality temporal bone dissection training compared to conventional training for medical students to improve their performance in a controlled environment

Patient or population: medical students
Settings: controlled environment
Intervention: virtual reality temporal bone dissection training
Comparison: conventional training

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, conventional training | Illustrative comparative risks* (95% CI): corresponding risk, virtual reality temporal bone dissection training | Relative effect (95% CI) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Procedural score. Technical step binary scale | | The mean procedural score in the intervention groups was 1.11 standard deviations higher (0.44 to 1.79 higher) | | 40 (2 studies) | ⊕⊕⊝⊝ low (footnotes 1, 2) | |
| End‐product score. Modified Welling scale | | The mean end‐product score in the intervention groups was 2.60 standard deviations higher (1.71 to 3.49 higher) | | 40 (2 studies) | ⊕⊕⊝⊝ low (footnotes 1, 2) | |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1. The authors did not report on allocation concealment.
2. Total population is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).

Summary of findings 4. Virtual reality endoscopic sinus surgery training compared to conventional training for medical students to improve their performance in a controlled environment

Patient or population: medical students
Settings: controlled environment
Intervention: virtual reality endoscopic sinus surgery training
Comparison: conventional training

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk, conventional training | Illustrative comparative risks* (95% CI): corresponding risk, virtual reality endoscopic sinus surgery training | Relative effect (95% CI) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Anatomy score. Anatomy identification test. Scale from: 0 to 10 | The mean anatomy score in the control group was 5.1 | The mean anatomy score in the intervention group was 4.3 higher (2.05 to 6.55 higher) | | 17 (1 study) | ⊕⊝⊝⊝ very low (footnotes 1, 2, 3, 4, 5) | |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval

GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.

1. The authors reported that they randomised the participants into intervention and control groups but gave no specific details of the randomisation methods.
2. Did not mention allocation concealment.
3. Did not mention blinding.
4. There was 22% missing data in the control group.
5. Total population is fewer than 400 participants (a threshold rule‐of‐thumb value; using the usual α and β, and an effect size of 0.2 SD, representing a small effect).

Comparison 1. Virtual reality endoscopic sinus surgery versus conventional training: surgical trainees' performance in the operating theatre

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Psychomotor score | 2 | 29 | Mean Difference (IV, Fixed, 95% CI) | 3.20 [2.05, 4.34] |
| 2 Procedural score | 1 | 25 | Mean Difference (IV, Fixed, 95% CI) | ‐5.50 [‐9.97, ‐1.03] |

Comparison 2. Virtual reality temporal bone dissection training versus cadaveric dissection training: surgical trainees' performance in a controlled environment

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 End‐product score | 1 | 65 | Mean Difference (IV, Fixed, 95% CI) | 0.07 [‐0.19, 0.33] |

Comparison 3. Virtual reality temporal bone dissection training versus conventional training: medical students' performance in a controlled environment

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Procedural score | 2 | 40 | Std. Mean Difference (IV, Fixed, 95% CI) | 1.11 [0.44, 1.79] |
| 2 End‐product score | 2 | 40 | Std. Mean Difference (IV, Fixed, 95% CI) | 2.60 [1.71, 3.49] |

Comparison 4. Virtual reality endoscopic sinus surgery versus conventional training: medical students' performance in a controlled environment

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Anatomy score | 1 | 15 | Mean Difference (IV, Fixed, 95% CI) | 4.30 [2.05, 6.55] |