Intraoperative evoked potential monitoring for detecting cerebral injury during adult aneurysm clipping surgery: a systematic review and meta-analysis of diagnostic test accuracy
  1. Fang Zhu1,2,
  2. Jason Chui1,
  3. Ian Herrick1,
  4. Janet Martin1,2,3
  1. 1 Department of Anesthesia and Perioperative Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
  2. 2 Centre for Medical Evidence Decision Integrity and Clinical Impact (MEDICI), Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
  3. 3 Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada
  1. Correspondence to Dr Jason Chui; jason.chui{at}lhsc.on.ca

Abstract

Objectives We aim to evaluate the diagnostic test accuracy (DTA) of intraoperative evoked potential (EP) monitoring to detect cerebral injury during clipping of cerebral aneurysms.

Design Systematic review.

Data sources Major electronic databases including MEDLINE, EMBASE, LILACS.

Eligibility criteria We included studies that reported the DTA of intraoperative EP monitoring during intracranial aneurysm clipping procedures in adult patients.

Data extraction and synthesis After quality assessment, we performed a meta-analysis using the bivariate random effects model, and calculated the possible range of DTA point estimates using a new best-case/worst-case scenario approach to quantify the impact of rescue intervention on DTA.

Results A total of 35 studies involving 4011 patients were included. The quality of the primary studies was modest and the heterogeneity across studies was high. The pooled sensitivity and specificity for predicting postoperative neurological deficits for the somatosensory evoked potential (SSEP) monitoring was 59% (95% CI: 39% to 76%; I2: 76%) and 86% (95% CI: 77% to 92%; I2: 94%), for motor evoked potential (MEP) monitoring was 81% (95% CI: 58% to 93%; I2: 54%) and 90% (95% CI: 86% to 93%; I2: 81%), and for combined SSEP and MEP monitoring was 92% (95% CI: 62% to 100%) and 88% (95% CI: 83% to 93%). The best-case/worst-case range for the pooled point estimates for sensitivity and specificity for SSEP was 50%–63% and 81%–100%, and for MEP was 59%–74% and 93%–100%, and for combined SSEP and MEP was 89%–94% and 83%–100%.

Conclusions Due to the modest quality and high heterogeneity of the existing primary studies, it is not possible to confidently support or refute the diagnostic value of EP monitoring in cerebral aneurysm clipping surgery. However, combined SSEP and MEP appears to provide the best DTA for predicting postoperative stroke. Contrary to popular assertion, the modest sensitivity of SSEP monitoring is not explained by the use of rescue intervention.

PROSPERO registration number CRD42015016884.

  • evoked potential monitoring
  • cerebral aneurysm
  • cerebral injury
  • ischemia
  • stroke
  • motor evoked potential

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • This systematic review provides the most comprehensive evaluation of the quality and limitations of the primary studies investigating the diagnostic test accuracy of all evoked potential monitoring modalities used during clipping of intracranial aneurysm surgery.

  • We reported our results according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagnostic test accuracy checklist (2018).

  • This study is the first to use a best-case/worst-case (conservative/liberal) approach to the analysis of diagnostic test accuracy to quantify the magnitude of change in pooled sensitivity and specificity that were potentially attributable to the use of rescue intervention.

  • The results of this study were limited by the modest quality, inherited methodological limitations and high heterogeneity of the existing primary studies.

Introduction  

Intraoperative cerebral injury can result from various surgical and/or anaesthetic causes during surgical clipping of an intracranial aneurysm.1 Despite advances in surgical and anaesthetic technique, it is estimated that a new postoperative stroke occurs in up to 11% of patients undergoing aneurysm clipping.1 2 Interest in mitigating the risk of intraoperative cerebral injury associated with aneurysm clipping has led to the deployment of intraoperative evoked potential (EP) monitoring during these procedures. Somatosensory evoked potential monitoring (SSEP) was first used in the mid-1980s3 in intracranial anterior circulation aneurysm surgery. SSEP is performed by applying an electrical stimulation to a specific mixed nerve or dermatome to generate a sensory stimulus and recording the responses along the ascending neural pathway and the sensory cortex. In the context of aneurysm clipping surgery, a significant increase in latency and reduction in amplitude signifies conduction delay, and hence cerebral injury of the somatosensory cortex or its ascending pathway. Brain-stem auditory evoked potential monitoring (BAEP) was later used specifically for posterior cerebral circulation aneurysm surgery.4 5 BAEP is performed by applying an auditory stimulus in the ear and recording the far-field potentials of neural generators (ie, nuclei) along the auditory pathway. As structures of the auditory pathway (eg, superior olivary complex and lateral lemniscus) are located in the brainstem, an abnormal BAEP response signifies injury to the brainstem. More recently, motor evoked potential (MEP) monitoring was introduced to improve the detection of subcortical injury.6–8 It is performed by applying an electrical stimulation on the motor cortex and recording the distal muscle responses in the upper and lower limbs. An abnormal MEP response signifies cerebral injury to the motor cortex or its descending pathway.

Current evidence pertaining to the accuracy of modalities used for EP monitoring during cerebral aneurysm clipping procedures is conflicting. Although there is a substantial body of literature related to EP monitoring during these procedures, there is a lack of prospective, randomised studies evaluating EP monitoring and its impact on clinically relevant outcome measures. All observational studies addressing the diagnostic test accuracy (DTA) of EP monitoring have fundamental methodological flaws that confound the evaluation of diagnostic performance because the intraoperative identification of EP changes is typically accompanied by a rescue intervention.9 While the clinical rationale for rescue intervention is compelling (as the name suggests), the practice leads to misclassification bias in the study results as it is frequently unclear how outcome is altered as a result. None of the primary studies4 5 8 10–41 has adequately accounted for this misclassification bias in reporting the diagnostic accuracy of EP monitoring with some investigators choosing to classify effective rescue interventions (eg, associated with improvement or reversal of EP changes) as an averted cerebral injury and other investigators choosing to ignore the effects of rescue interventions. As the magnitude of the impact of this misclassification bias remains unquantified, discrepancies in DTA measurements that arise with the effects of rescue interventions continue to represent a central controversy regarding the role of EP monitoring during these procedures. In addition, there is a lack of published systematic reviews that adequately assess the quality and limitations of the included primary studies to evaluate the DTA and clinical relevance of different EP monitoring modalities in the context of aneurysm clipping procedures.

In view of these methodological concerns, we undertook a rigorous systematic review and meta-analysis to transparently evaluate the shortcomings in the existing literature and to quantify the effect that the use of rescue intervention is likely to have on DTA with the intent of informing future research priorities in this area.

Objectives

The primary objective of this systematic review was to provide an unbiased and complete assessment of the diagnostic value of EP monitoring in aneurysm clipping surgery, based on available evidence from clinical studies, while also relating clinically important outcomes to diagnostic performance. For the following reasons, this systematic review and meta-analysis extends beyond pre-existing reviews42 43 on this topic: i) it assesses the quality and limitations of the primary studies, ii) it quantifies the effect of intraoperative rescue interventions on the DTA and iii) it compares the DTA of all EP monitoring modalities for detecting intraoperative cerebral injury that results in an adverse neurological outcome and/or new radiological change that is consistent with stroke following intracranial aneurysm clipping surgery. The secondary objective of this review was to explore potential effect modifiers (eg, subgroups of patients, procedures or combinations of neuromonitoring modalities), which can maximise the DTA and may help to establish evidence-informed standards of appropriateness for clinical application of EP monitoring.

Methods

Protocol and registration

This systematic review was registered in PROSPERO (International Prospective Register of Systematic Reviews) prior to conducting the review (CRD42015016884).

Eligibility criteria

Types of studies

Prospective and retrospective clinical studies were included if they reported DTA of at least one EP monitoring modality and the number of cerebral injuries (either clinical or radiological) observed. We excluded case-control studies, registry data and data derived from an unpublished hospital database.

Participants

We included studies involving adult patients (>18 years of age) who underwent intracranial aneurysm clipping for ruptured or unruptured cerebral aneurysms under general anaesthesia irrespective of the pathology, location or size of the aneurysm. We excluded studies investigating patients undergoing endovascular coiling procedures or concomitant extracranial-intracranial bypass procedures or procedures involving cardiopulmonary bypass.

Index tests

Intraoperative EP monitoring was defined as the use of SSEP, MEP, BAEP or any combination of these modalities. Other modalities, such as electroencephalography and cranial nerve monitoring, were not included. The choice of stimulation technique (sites, montages and cut-off values) was not restricted.

Target conditions

The target condition of this review was defined as intraoperative cerebral injury. Intraoperative cerebral injury may be caused by multiple mechanisms, surgical or anaesthetic-related, ischaemic or non-ischaemic, which can result in different outcomes postoperatively.

Reference standards

The reference standard was the identification of new postoperative clinical neurological deficits or radiological changes indicating clinical or radiological cerebral injury (or stroke), respectively. The diagnosis of cerebral injury (or stroke) is usually established at the initial postoperative physical examination and confirmed with radiological imaging (eg, CT or MRI). If multiple outcome measures at different time points were reported in the primary studies, we used the outcome reported at the earliest time point of measurement as the reference standard in this review. The effect of different timing of outcome assessment on DTA was further examined by meta-regression to explore whether effect sizes differed over time.

Definition of events

An intraoperative rescue intervention is frequently employed when an EP abnormality is detected and prior to postoperative assessment for confirmation of neurological injury. As a consequence, there is a significant risk of outcome misclassification bias in all EP studies that evaluate DTA because neurological outcome may be altered as a result of the rescue intervention. Most studies do not address this potential major source of bias that arises when outcome is assigned (positive or negative) irrespective of whether an intraoperative therapeutic intervention was applied.

To quantify the risk of outcome misclassification bias, this systematic review quantified the treatment effect of rescue intervention in response to EP signal changes on DTA. We calculated the best-case scenario (a more liberal approach that assumed all intraoperative interventions had full beneficial treatment effects)44 45 and the worst-case scenario (a more conservative approach that assumed no beneficial treatment effect was produced by intraoperative rescue interventions, ie, by ignoring the potential impact of rescue interventions) to generate an upper and lower range of plausible estimates of the potential impact of the rescue interventions. Using these two approaches, a range of pooled sensitivity and specificity was calculated that represented the upper and lower range of the pooled point DTA. In this innovative approach, we used data from the subset of studies reporting data regarding the reversibility of detected EP changes to evaluate the impact of rescue intervention on DTA under the best-case and worst-case scenarios. Details of these two approaches are described in table 1. In essence, the two main differences between these two approaches were:

  • In the best-case scenario, a significant intraoperative EP signal change that is reported to be reversed by an intraoperative rescue intervention and not followed by a new postoperative neurological deficit or radiological change was defined as a true positive, because all reports of reversal of the EP signal change(s) were assumed to reflect mitigation of injury as a consequence of the intervention.

  • In the best-case scenario, a false negative was defined as a reported new postoperative neurological deficit or radiological change associated with a significant EP signal change that was reversed to baseline following intraoperative surgical or anaesthetic rescue intervention.

Table 1

Definitions of positive and negative events in the best-case/worst-case scenario approaches for the analysis of diagnostic test accuracy
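
The reclassification rules described above can be illustrated with a minimal sketch. The two best-case rules follow the bullets in the text; the worst-case rules (any significant EP change counts as a positive test, regardless of reversal) and the handling of non-reversed changes are assumptions made for illustration, since the full definitions are given in table 1. The names `Case`, `classify` and `tabulate` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    ep_change: bool    # significant intraoperative EP signal change observed
    ep_reversed: bool  # change returned to baseline after rescue intervention
    deficit: bool      # new postoperative deficit or radiological change


def classify(case: Case, scenario: str) -> str:
    """Return TP, FP, FN or TN for one patient under 'best' or 'worst'."""
    if not case.ep_change:
        # No EP change is a negative test in both scenarios.
        return "FN" if case.deficit else "TN"
    if scenario == "best" and case.ep_reversed:
        # Best case: a reversed change without deficit is an averted injury
        # (true positive); a reversed change with deficit is a false negative.
        return "FN" if case.deficit else "TP"
    # Assumption: otherwise the EP change is simply a positive test.
    return "TP" if case.deficit else "FP"


def tabulate(cases, scenario: str) -> Counter:
    """Count the cells of the 2x2 table for a list of cases."""
    return Counter(classify(c, scenario) for c in cases)
```

Under this sketch, a patient with a reversed EP change and no postoperative deficit moves from the false-positive cell (worst case) to the true-positive cell (best case), which is how the two scenarios bracket the pooled estimates.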

Information sources and search

A systematic search was performed from 1 January 1960 to 5 January 2016 (last updated on 27 June 2018) in seven major electronic databases including MEDLINE, EMBASE, LILACS and IndMed, and in a variety of other sources including hand searching, snowballing of reference lists, conference proceedings and other grey literature databases. The complete list of search terms and strategies is summarised in online supplementary appendix 1.

Study selection

Two authors (FZ, JC) independently scanned the titles and abstracts to identify potentially relevant studies. Full-text versions of all relevant articles were retrieved. Two authors (FZ, JC) independently selected the articles that met the predefined inclusion criteria using a standardised study inclusion form (pilot tested by two independent assessors on a sample of three papers). All disagreements were sorted by consensus of two authors (FZ, JC).

Data collection process

Data from each included study were independently extracted and entered by two authors (FZ, JC). All discrepancies were resolved by consensus. Details of the study population, diagnostic test and outcomes were extracted using a standardised electronic data extraction form. The data extraction form was tested and refined by two independent assessors on a sample of three papers. Translations of four articles (one in French, one in Japanese and two in Chinese) were required for data extraction. The translation was performed by two independent translators, who are fluent in the corresponding languages and have experience in performing data extraction for other systematic reviews.

Risk of bias and applicability

Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 tool.46 Signalling questions were tailored to this review, and were pretested and refined by two independent assessors on a sample of three papers. Two authors (FZ, JC) assessed and graded the quality of the primary studies independently. All disagreements were sorted by consensus of two authors (FZ, JC). All data were tabulated and displayed graphically in the methodological quality summary and study quality graph.

Synthesis of results

A 2×2 cross-table of postoperative neurological deficits versus EP changes was constructed for each study to calculate the corresponding sensitivity and specificity for each study. A meta-analysis using a bivariate random effects model was used to summarise the pooled effect estimates for each EP modality (SSEP, MEP, BAEP) and summary receiver operating characteristic (SROC) curves were plotted.47 48 Area under the ROC curve (AUC) >0.9 was considered to reflect high diagnostic performance.48 Positive likelihood ratio (LR+) and negative LR (LR−), diagnostic OR (DOR), positive predictive value and negative predictive value were calculated, and the LR scattergram was plotted. LR+ >10 and LR− <0.1 were considered to be convincing confirmation and exclusion test results, respectively. LR+ >5 and LR− <0.2 were considered to reflect moderate diagnostic test accuracy.45 46 A DOR >100 (corresponding to pairing of LR+ of 10 and LR− of 0.1) and >25 (corresponding to pairing of LR+ of 5 and LR− of 0.2) were considered evidence of high and adequate diagnostic test performance, respectively.49 The Fagan nomogram was used to illustrate the pretest probability to post-test probability changes in the LR of each EP modality. We assessed heterogeneity using the I2 statistic and Galbraith plots. We plotted the standardised log-transformed diagnostic OR against the reciprocal of its SE to look for potential outlying studies.
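
As an illustration of the per-study calculations described above, the following minimal sketch derives the standard accuracy measures from a single 2×2 table and applies the LR thresholds stated in the text. It does not reproduce the bivariate random effects pooling, which was performed in Stata (MIDAS). The class and function names are hypothetical, the label "limited" for values below the moderate threshold is an assumption, and tables with zero cells would need a continuity correction that is not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # EP change, postoperative deficit
    fp: int  # EP change, no deficit
    fn: int  # no EP change, deficit
    tn: int  # no EP change, no deficit

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_pos(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_neg(self) -> float:
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Diagnostic odds ratio, equal to LR+ divided by LR-.
        return self.lr_pos / self.lr_neg

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def grade_lr_pos(lr_pos: float) -> str:
    """Thresholds from the text: LR+ >10 convincing, >5 moderate."""
    if lr_pos > 10:
        return "convincing"
    return "moderate" if lr_pos > 5 else "limited"


def grade_lr_neg(lr_neg: float) -> str:
    """Thresholds from the text: LR- <0.1 convincing, <0.2 moderate."""
    if lr_neg < 0.1:
        return "convincing"
    return "moderate" if lr_neg < 0.2 else "limited"
```

Because the DOR is the ratio LR+/LR−, the paired thresholds in the text follow directly: 10/0.1 gives 100 and 5/0.2 gives 25.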

Sensitivity analysis was performed by sequential removal of outlying studies and/or the most influential studies. Funnel plots were used to explore evidence for potential publication bias and small study effects. For each EP monitoring modality, meta-regression was performed to explore the following potential effect modifiers on DTA parameters: timing of outcome assessment, length of follow-up, ruptured or unruptured aneurysm status, type of anaesthesia, the use of neuroprotection strategy, year of publication, sample size and study design.

Analysis was repeated to calculate the upper and lower range of the DTA in each EP modality (to reflect the potential effect of intraoperative rescue intervention). Ranges for DTA for each modality were calculated to quantify the effect of rescue intervention. All statistical analyses were executed using user written command MIDAS in Stata IC (V.13.1) or RevMan V.5.3 as appropriate.

Patient and public involvement

Patients and public were not involved in this study.

Results

We identified 3095 titles and abstracts from our systematic search in the electronic database, grey literature and hand searching. After exclusion of duplicate works and after screening of titles and abstracts, 156 full-text articles were retrieved for further evaluation. A total of 35 studies4 5 8 10–41 reported in 36 publications involving 4011 patients met criteria for inclusion in the meta-analysis (online supplementary appendix 2).

Study and patient characteristics

All 35 included studies4 5 8 10–41 were observational, with a sample size ranging from 15 to 685, and were performed in 11 countries between 1987 and 2017 (table 2). There were no randomised controlled studies. Most studies (24 studies)4 5 8 10 12 14 16–20 24 26 28 32–41 used clinical examination as the outcome reference standard. Five studies used radiological imaging as the reference standard (CT=4 studies, MR=1 study).21 22 25 27 31 Six studies used both clinical and radiological methods to define outcome.11 13 15 23 29 30

Table 2

Study characteristics

Most study patients were between 48 and 66 years of age and represented a mixed population of ruptured and unruptured cerebral aneurysms (Hunt and Hess grades 0–5). Of the 35 included studies, 18 studies reported only on patients with anterior circulation aneurysms, and 14 studies reported on patients with both anterior and posterior circulation aneurysms (online supplementary table 1).

Evoked potential monitoring and anaesthetic characteristics

Most studies (16 studies) investigated SSEP monitoring,5 18 20 26 29 31–41 9 studies investigated MEP monitoring8 12 14 15 17 22–24 27 and 9 studies investigated SSEP and MEP alone and combined.10 11 13 16 19 21 25 28 30 Only one study investigated the combined use of SSEP and BAEP4 (online supplementary table 2). In the primary studies, the use of upper limb and/or lower limb SSEP and MEP monitoring was mainly determined by the location of the aneurysm. Diagnostic criteria for SSEP generally included both amplitude reduction and conduction delay. In studies that used MEP monitoring, both transcranial (tc) and direct cortical (dc) MEPs were investigated, and most studies (13 of 17 studies) used amplitude reduction as the primary diagnostic criterion. All studies reported high rates of successful recording; 69%–100% for SSEP and 60%–100% for MEP. None of the studies specified the use of certified neurophysiologists for EP monitoring.

Most recent studies used a total intravenous anaesthetic technique, particularly when investigating MEP monitoring, and avoided the use of nitrous oxide (online supplementary table 2). In comparison, earlier trials typically combined the use of inhaled anaesthetics and nitrous oxide. Despite the well-known anaesthetic-induced suppression of EP monitoring signals, five studies did not report the anaesthetic regimen used.

Summary of events

Despite the use of EP monitoring and timely intervention, 8.3% (302 of 3621 patients) and 11.2% (105 of 941 patients) were reported to have new postoperative neurological deficits (clinical stroke) and radiological features of stroke, respectively. Neuroprotection strategies, including pharmacological burst suppression and passive hypothermia, were employed routinely in some studies (table 3). In cases with intraoperative EP signal changes, most studies reported that surgical and/or anaesthetic interventions were applied with the intention to mitigate possible cerebral injury. Reported manoeuvres included release of temporary clips, repositioning of permanent clips or brain retractor(s) and augmentation of blood pressure to improve cerebral perfusion pressure.

Table 3

Summary of events

Risk of bias and applicability

The methodological quality of the studies was of concern (online supplementary figure 1-2). Almost half of the primary studies had significant deficiencies in reported information for all quality domains. The most common domains with reporting deficiencies related to the recruitment process, inclusion and exclusion criteria as well as execution of reference standards (ie, evaluation of neurological outcome). The timing of application of reference standards was variable (from immediately after surgery to 1 year postoperatively) and was also generally poorly reported across studies.

Diagnostic test accuracy of SSEP

Twenty-two studies4 5 10 11 16 18 20 26 28–41 reported the DTA of SSEP monitoring; 17 studies4 5 10 11 16 18 28 30 31 33–39 41 (1765 patients) assessed the DTA for predicting new postoperative neurological deficits (clinical stroke) and 6 studies11 18 20 26 28 29 (474 patients) assessed the DTA for predicting postoperative radiological evidence of stroke (radiological stroke). The pooled sensitivity and specificity of SSEP alone for predicting postoperative stroke were 59% (95% CI: 39% to 76%; I2: 76%) and 86% (95% CI: 77% to 92%; I2: 94%), respectively (figure 1). The LR+ was 4.17 (95% CI: 2.26 to 7.68) and LR− was 0.48 (95% CI: 0.30 to 0.78) (table 4). The AUC in the SROC curve was 0.83 (95% CI: 0.79 to 0.86) (online supplementary figure 3). The pooled sensitivity for predicting radiological stroke was 55% (95% CI: 39% to 70%; I2: 0%) and the pooled specificity was 89% (95% CI: 84% to 93%; I2: 58%). However, this was based on only six studies reporting on radiological stroke (table 4).
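
To show how the pooled likelihood ratios translate into post-test probabilities (the calculation that a Fagan nomogram performs graphically), the sketch below applies the odds form of Bayes' theorem. The pretest probability of 302/3621 (about 8.3%) is borrowed from the overall rate of new postoperative neurological deficits reported in this review and is used here only as an illustrative assumption; the function name is hypothetical.

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Illustrative inputs: pooled SSEP likelihood ratios from the text and an
# assumed pretest probability equal to the overall clinical stroke rate.
pretest = 302 / 3621
after_ssep_change = post_test_probability(pretest, 4.17)     # positive test
after_no_ssep_change = post_test_probability(pretest, 0.48)  # negative test
```

With these inputs, an SSEP change raises the probability of a new deficit from roughly 8% to roughly 28%, and an unchanged SSEP lowers it to roughly 4%, consistent with likelihood ratios that fall short of the moderate thresholds (LR+ >5, LR− <0.2).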

Figure 1

Forest plot of somatosensory evoked potential for predicting postoperative neurological deficit.

Table 4

Summary of results

Sensitivity analysis and heterogeneity of SSEP

Analysing the sensitivity results for SSEP, the DTA is only slightly changed after sequential removal of the two most influential studies34 35; the pooled sensitivity and specificity slightly decreased to 58% (95% CI: 43% to 72%) and 85% (95% CI: 78% to 90%) (online supplementary table 3) and the LR+ dropped slightly to 3.93 (95% CI: 2.93 to 5.27) and the LR− increased to 0.49 (95% CI: 0.36 to 0.66). The DOR decreased slightly from 8.66 to 8.01. There was no publication bias or small study effect (p=0.82) identified. In the meta-regression, the use of inhaled anaesthetics was identified to significantly affect the pooled DTA of SSEP in predicting postoperative neurological deficits (p<0.001) compared with combined inhaled and intravenous anaesthetics. Of note, publication year, sample size, location of aneurysm, mean age, ruptured or unruptured aneurysm status, the choice of SSEP diagnostic criteria, the use of nitrous oxide and timing of outcome assessment did not change the DTA of SSEP.

Diagnostic test accuracy of MEP

Fourteen studies investigated the DTA of MEP monitoring; for predicting postoperative neurological injury in 14 studies8 11 12 14–17 22 24 25 27 28 30 (1762 patients), and for predicting postoperative radiological stroke in 9 studies8 11 13 15 16 23 24 27 28 (740 patients). The pooled sensitivity of MEP was 81% (95% CI: 58% to 93%; I2: 54%), which was higher than that of SSEP, and the pooled specificity was 90% (95% CI: 86% to 93%; I2: 81%) (figure 2). The DOR of MEP was 38.1 (95% CI: 13.04 to 111.4); the LR+ was 8.19 (95% CI: 5.78 to 11.6) and LR− was 0.21 (95% CI: 0.09 to 0.52); the AUC in the SROC curve was 0.93 (95% CI: 0.90 to 0.95) (online supplementary figure 4), indicating moderate diagnostic accuracy and high overall diagnostic performance. The false negative rate of MEP (2.2%) was lower than the false negative rate of SSEP (6.3%). On subgroup analysis, dc-MEP (98%; 95% CI: 10% to 100%; I2: 63%) achieved higher pooled sensitivity than tc-MEP (58%; 95% CI: 44% to 71%; I2: 0%). Pooled specificities were similar for tc-MEP and dc-MEP, at 92% (95% CI: 87% to 95%; I2: 84%) and 87% (95% CI: 82% to 92%; I2: 9%), respectively. In contrast to SSEP, MEP was found to have a lower pooled sensitivity of 63% (95% CI: 51% to 74%; I2=0%) for predicting radiological evidence of neurological injury; however, there were only nine studies reporting radiological outcomes. The pooled specificity was similar at 90% (95% CI: 81% to 95%; I2=82%).

Figure 2

Forest plot of motor evoked potential for predicting postoperative neurological deficit.

Sensitivity analysis and heterogeneity of MEP

Sensitivity analysis did not identify a significant change after removing the outlying study (online supplementary table 3). There was no publication bias. In the univariate meta-regression, year of publication, sample size, mean age, location of aneurysm, anaesthetic regimens, use of nitrous oxide, the time of outcome assessment, ruptured or unruptured aneurysm status were not found to have an interaction with the DTA of MEP monitoring. The use of muscle relaxant and location of aneurysm were identified to significantly affect the pooled DTA of MEP in predicting postoperative neurological deficits (p<0.01).

Diagnostic test accuracy of combined SSEP and MEP

Only three studies of 201 patients11 19 21 assessed the DTA of combined SSEP and MEP to diagnose postoperative neurological deficits. Based on this limited literature reporting the combined use of SSEP and MEP monitoring, the pooled sensitivity was 92% (95% CI: 62% to 100%), the specificity was 88% (95% CI: 83% to 93%) and the pooled DOR was 83.5 (table 4). However, this result was based on only 3 studies with 201 patients.

Diagnostic accuracy of BAEP and combined SSEP and BAEP

A meta-analysis was not conducted on BAEP data as only one study4 reported the DTA of BAEP alone, or combined SSEP and BAEP, during surgery for clipping posterior circulation aneurysms. This single study4 reported that the sensitivity and specificity of BAEP alone were 42% (95% CI: 20% to 67%) and 89% (95% CI: 78% to 95%), respectively, whereas those of combined SSEP and BAEP were 84% (95% CI: 60% to 97%) and 79% (95% CI: 66% to 88%), respectively (table 4).

Impact of rescue intervention on DTA

The preceding calculations of DTA used a worst-case scenario (conservative) approach that ignored any potential beneficial effect of rescue interventions. As not all studies reported data regarding the reversibility of detected EP changes, analysis of the impact of rescue intervention on DTA was conducted on a subset of studies that reported reversible signal changes (10 studies (1262 patients) for SSEP; 5 studies (1152 patients) for MEP) (table 5).

Table 5

Summary of estimated ranges of DTA in predicting postoperative neurological deficit using a best-case scenario and worst-case scenario approach

Using this methodology, the pooled sensitivity and specificity of SSEP using a best-case scenario (liberal) approach that included the potential impact of rescue intervention were 63% and 100%, respectively. The pooled sensitivity and specificity of SSEP using a worst-case scenario (conservative) approach that excluded the potential impact of rescue intervention were 50% and 81%, respectively. Thus, the range of pooled sensitivity and specificity for SSEP for predicting postoperative neurological deficits was 50%–63% and 81%–100%, respectively (ie, worst-case and best-case scenario of DTA values that reflect the inclusion or exclusion of the potential impact of rescue intervention on outcome). These results indicate that the low sensitivity observed with SSEP cannot be attributed to the effect of rescue intervention on misclassification because the pooled sensitivity of SSEP was, at best, only 63%.

Similarly, the pooled sensitivity and specificity of MEP using a best-case (liberal) approach were 74% and 100%, respectively. The pooled sensitivity and specificity of MEP based on a worst-case scenario (conservative) approach (with exclusion of the potential impact of rescue intervention) were 59% and 93%, respectively. The range of pooled sensitivity and specificity for MEP monitoring was 59%–74% and 93%–100%, respectively, for predicting postoperative neurological deficits.

The range of pooled sensitivity and specificity for combined SSEP and MEP monitoring was 89%–94% and 83%–100%, respectively, for predicting postoperative neurological deficits.

Discussion

Principal findings

This meta-analysis provides the most comprehensive systematic review of the current literature related to the use of EP monitoring during intracranial aneurysm surgery and adds several new insights to the current EP monitoring literature. First, we identified that the quality of the primary studies was modest and marked heterogeneity was encountered. Due to these uncertainties in the existing evidence base, it is not possible to confidently support or refute the diagnostic value of EP monitoring in anterior circulation aneurysm clipping surgery. As such, caution is advocated when deploying EP monitoring mainly to predict intraoperative conditions associated with neurological injury.

Second, this meta-analysis is the first to address the longstanding controversy surrounding the potential impact of rescue interventions on outcome using a best-case/worst-case analysis of DTA that adjusted the classification of intraoperative events based on the response to intervention. This approach quantified the magnitude of change in pooled sensitivity and specificity potentially attributable to the rescue intervention. For SSEP monitoring, the possible ranges of pooled sensitivity and specificity were 50%–63% and 81%–100%, respectively. For MEP monitoring, the possible ranges were 59%–74% and 93%–100%, respectively. These results indicate that the modest sensitivity observed with SSEP or MEP monitoring alone is not attributable to the effect of rescue intervention on misclassification.

Third, based on current evidence, our meta-analysis indicates that combined SSEP and MEP monitoring may have superior DTA compared with other EP modalities used alone for predicting neurological injury during craniotomy for cerebral aneurysm clipping. The DOR was 83.5, indicating moderate-to-high diagnostic performance. This finding supports the general opinion that multimodality EP monitoring improves diagnostic accuracy compared with single-modality EP monitoring.6–8 It must be emphasised, however, that this result is based on a small number of studies. The DTA for each EP monitoring modality was higher for predicting postoperative neurological deficits than for predicting postoperative radiological stroke, although the number of studies using a radiological end point was small.

The results of this meta-analysis are focused on diagnostic accuracy; however, it must be recognised that the choice of EP modality is not exclusively a function of test performance. For example, although our results suggest that MEP monitoring alone has superior DTA compared with SSEP monitoring alone, other clinical factors such as the intermittent nature of MEP monitoring, challenges associated with obtaining satisfactory MEP signals, stimulated movement during intracranial microvascular surgery, restrictions on anaesthetic choices (particularly the use of inhaled anaesthetics and muscle relaxants) and the potential relevance of other information that can be obtained from SSEP monitoring (eg, brachial plexus or ulnar nerve injury due to positioning) remain relevant to the clinical choice of monitoring modality.

Furthermore, specificity is relatively high for all modalities and the absence of an EP signal change is associated with a high probability that the patient will not awaken with a new neurological deficit. Many operative teams find value in the reassurance afforded by the absence of EP signal change. However, sensitivity is modest at best for most of these modalities, particularly SSEP monitoring alone. Contrary to conventional opinion, our results suggest that this lower sensitivity is not attributed to the effect of rescue intervention on misclassification. As a consequence, this modest sensitivity may be relevant when EP monitoring is used during aneurysm surgery to identify cerebral injury and/or guide intervention particularly when the triggering event is not readily identified.

A recently published systematic review42 examined SSEP monitoring used alone during aneurysm surgery. This review, based on only 13 studies, reported similar pooled sensitivity (56.8%) and specificity (84.5%) and a DOR of 7.8. The authors concluded that patients with neurological deficits following aneurysm surgery are seven times more likely to have developed an intraoperative change in SSEP, and that SSEP is highly specific for predicting impending stroke. The low sensitivity observed was attributed to rescue intervention. Notwithstanding these conclusions, it should be noted that a DOR of 7.8 reflects inadequate test performance, since DOR >100 and >25 are reported to reflect high and adequate diagnostic test performance, respectively.49 Although the authors speculate, like many other EP monitoring studies,50 that the low sensitivity reported with SSEP monitoring reflects successful rescue intervention, our results suggest that rescue intervention does not substantively alter the pooled sensitivity for SSEP monitoring. Furthermore, the previous review42 did not account for the modest quality of the primary studies or heterogeneity across the studies.
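The relationship between sensitivity, specificity and DOR discussed above can be made concrete with a short calculation. The Python sketch below uses the standard definition, DOR = [sensitivity/(1 − sensitivity)] / [(1 − specificity)/specificity], and applies the >100 and >25 thresholds cited in the text. The function names are hypothetical, and a DOR computed from pooled point estimates only approximates a DOR pooled directly across studies, which is why the value obtained from 56.8% and 84.5% (about 7.2) differs slightly from the reported 7.8.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Standard DOR: odds of a positive test in those with the outcome
    divided by odds of a positive test in those without it."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_with_outcome = sensitivity / (1 - sensitivity)
    odds_positive_without_outcome = (1 - specificity) / specificity
    return odds_positive_with_outcome / odds_positive_without_outcome


def classify_dor(dor: float) -> str:
    """Thresholds as cited in the text: >100 high, >25 adequate."""
    if dor > 100:
        return "high"
    if dor > 25:
        return "adequate"
    return "inadequate"


# Pooled point estimates reported by the earlier SSEP review (56.8%, 84.5%).
approx_dor = diagnostic_odds_ratio(0.568, 0.845)
print(f"approximate DOR from point estimates: {approx_dor:.1f}")
print("reported DOR 7.8 ->", classify_dor(7.8))
print("combined SSEP+MEP DOR 83.5 ->", classify_dor(83.5))
```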

The other recently published systematic review43 examined the DTA of SSEP or MEP monitoring used alone during aneurysm surgery. This review, examining only two EP modalities, reported that MEP monitoring has a higher DTA than SSEP monitoring alone in predicting postoperative neurological deficits (consistent with our findings). However, the review was based on a small subset of studies (eight for SSEP and five for both tc-MEP and dc-MEP) and did not account for the quality of primary studies or heterogeneity across the studies.

Limitations

An important weakness of this systematic review is the potential bias due to the modest quality and incomplete reporting of the primary studies. The most significant source of bias is the execution of the reference standard to identify patients with clinical and radiological strokes. There was no consistent method used to assess outcome across the primary studies. In addition, most of the included studies reported incomplete results and/or inadequate methodological detail (online supplementary figure 1-2). However, the DTA estimates for all EP modalities in this analysis were not significantly changed when selected indicators of quality were accounted for in sensitivity analyses.

Another important weakness is the paucity of direct comparisons of EP monitoring modalities within the individual studies, necessitating indirect comparisons across studies within the meta-analysis through bivariate analysis. While this is common for DTA meta-analyses, and reflects the inherent paucity of direct comparisons in the field of diagnostics, it still requires readers to apply cautious interpretation of comparative DTA estimates from meta-analysis since indirect comparisons across studies are likely to be further confounded by differences in a combination of identifiable and unidentifiable factors (such as differences in baseline patient characteristics and co-interventions across studies, etc). Nevertheless, despite the limitations, it is crucial to emphasise that DTA estimates derived from rigorous systematic review and meta-analyses are considered the best possible level of evidence and are considered superior to relying on incomplete assessment of selected studies without aggregation through meta-analysis.

High heterogeneity was encountered during the analysis of DTA of SSEP and MEP. Unfortunately, we were unable to fully account for it. The DTA estimates did not significantly change in sensitivity analyses attempting to explore potential moderators. Exploration of funnel plots did not suggest a small study effect or detectable evidence of publication bias. The sources of high heterogeneity are very likely multifactorial and result from both clinical heterogeneity (eg, different evoked potential monitoring set-up, diagnostic criteria, different co-interventions or thresholds for rescue interventions, different risk factors across patient groups) and methodological heterogeneity (eg, time of outcome assessment, reference standards, loss to follow-up) between the primary studies.

Implications

Our systematic review suggests that, based on the current literature, the available data are insufficient to either definitively support or refute the diagnostic value of EP monitoring in anterior circulation aneurysm clipping surgery. Despite three decades of using and investigating EP monitoring in this clinical context, this inadequacy remains a reflection of the modest quality of primary studies, non-homogeneity among studies and underlying methodological flaws in study design. As a consequence of this gap in the literature base, fundamental questions regarding the value of EP monitoring for the detection of intraoperative cerebral injury during these procedures remain inadequately addressed; the development of evidence-based guidelines for the deployment of these modalities in the context of cerebral aneurysm surgery remains elusive and decision support in relation to the comparative efficacy of rescue interventions, particularly in circumstances when a triggering event is not immediately apparent, remains undefined.

Appropriately designed prospective studies will be required to evaluate the impact of EP-guided intervention before evidence-based guidance can be provided. In the context of the current literature base, controversy surrounds ethical concerns regarding the randomisation of access to EP monitoring or failure to intervene if an EP change is detected.9 Methodological options to address these concerns have been advanced including the use of pragmatic active-controlled head-to-head study designs or clustered randomisation51 in centres where EP monitoring is not a routine practice or a large-scale observational study with propensity matching9 to compare EP monitored versus non-monitored patients with the benefit of mitigating ethical concerns but at the expense of introducing potential biases from unknown/neglected confounders. A third option that has not yet been adequately addressed would be to conduct a large-scale pragmatic factorial trial where all patients are randomised to active monitoring: SSEP or MEP or SSEP+MEP or BAEP. This would allow for superiority among modalities to be explored. If no differences were found after testing in an adequately powered pragmatic factorial trial, then there may be room for discussion about whether the modalities are equally efficacious or equally non-efficacious, before subsequent randomised trials are conducted to evaluate remaining equipoise.

Conclusion

This extensive systematic review and meta-analysis explored the current literature pertaining to the use of EP monitoring to predict the risk of stroke during intracranial aneurysm surgery and introduced a new methodological approach to evaluate the impact of rescue intervention on DTA. Despite an extensive literature base, modest quality and high heterogeneity across studies hinder evidence-based decision making and present opportunities for further research in this area. Based on the limited evidence available, combined SSEP and MEP monitoring appears to provide the best DTA for predicting postoperative stroke compared with SSEP or MEP alone. Using a new best-case/worst-case scenario approach, we found that the modest sensitivity of SSEP monitoring alone is not explained by misclassification bias introduced by the use of rescue intervention during these procedures. We suggest that future studies might consider using this new methodology to address a prominent methodological controversy that continues to hinder the interpretation of DTA for intraoperative EP monitoring.

Acknowledgments

The authors would like to thank Brie McConnell, Medical Librarian, for her assistance in preparing the search strategy for this systematic review. The authors would also like to acknowledge Maurcico Giraldo and Rizq Alamri, neuroanaesthesia fellows, for their help in validating the study inclusion form and data collection form.

Footnotes

  • Patient consent for publication Not required.

  • FZ and JC contributed equally.

  • IH and JM contributed equally.

  • Contributors All authors have contributed significantly to the design, conduct, analysis and reporting of this systematic review. FZ and JC contributed to the data collection and screening of publications. All authors gave final approval for submission.

  • Funding The Evidence-Based Perioperative Clinical Outcomes Research (EPiCOR) centre, Western University, Canada and The Centre for Medical Evidence, Decision Integrity and Clinical Impact (MEDICI), Western University, provided statistical and technical support for this systematic review.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.