Introduction
Non-contrast CT (NCCT) is the most widely available imaging modality for assessing patients with acute stroke. Many centres now also perform CT angiography (CTA) as part of their stroke imaging protocol [1]. However, only limited observer reliability data exist for the reporting of CTA in acute stroke [2, 3]. A recent consensus statement on angiography grading standards for acute ischaemic stroke recommended that further reliability studies be performed [4].
The Third International Stroke Trial (IST-3) was a multicentre, randomised controlled trial in 3035 patients that tested whether intravenous recombinant tissue plasminogen activator (rt-PA), given within 6 h of ischaemic stroke, improved functional outcome at 6 months [5]. Standardised brain imaging (predominantly NCCT) was mandatory for all IST-3 patients prior to randomisation in the trial. In some centres, CTA was also routinely obtained.
Using CTA to assess intracerebral arterial patency is limited by the lack of a grading scale developed specifically for cross-sectional imaging [6]. To date, most trials incorporating CTA have used one of the catheter angiography scales, e.g. Thrombolysis in Cerebral Infarction (TICI) [7]. However, applying catheter angiography scales to CTA (or MR angiography) without modification is problematic for two reasons: 1) catheter angiography scales assess distal tissue perfusion, but perfusion is not appreciable on CTA (unless it is time resolved) [6]; and 2) catheter angiography scales conflate features of cerebral arterial patency, flow and perfusion into one scale, thereby potentially increasing sources of observer disagreement. Unlike catheter angiography, standard CTA provides only a snapshot in time and can be used to assess only arterial patency, not flow. A new IST-3 angiography score was developed in an attempt to overcome the limitations of applying catheter angiography scores to CTA. The IST-3 angiography score aims to assess only those characteristics of angiography that are identifiable on CTA, especially arterial luminal patency at the main point of occlusion [6].
Our primary aim was to investigate inter- and intra-observer reliability of expert readers assessing CTA in acute ischaemic stroke. We also sought to establish how less-experienced readers perform and to evaluate a new CTA grading scale, the IST-3 angiography score.
Discussion
In this study, in which 14 observers with differing levels of experience assessed a purposive sample of 15 examinations, we show that CTA features have slightly higher levels of agreement than non-contrast CT features. Imaging characteristics that are likely to have the greatest clinical impact (e.g. the presence and severity of arterial occlusion) are reported with the highest inter-observer agreement, both by experienced (K-alpha > 0.60) and inexperienced observers. There was less agreement over arterial collateral supply and use of CTA-SI to identify perfusion deficits, even among experienced observers (K-alpha 0.30–0.60). Despite being comparatively inexperienced, the participating radiology trainees who had undertaken additional neuroradiology training (neuroradiology fellows) performed as well as experts in the assessment of CTA. This implies that, with adequate training, CTA can be reliably assessed even by readers with less experience.
The IST-3 angiography score is an adaptation of earlier scores (TICI, Mori). It is designed to overcome the limitations of using a catheter angiography score for the assessment of CTA: it primarily assesses residual arterial calibre at the point of stenosis and contrast penetration into the major distal vessels only, and makes no attempt to assess distal tissue perfusion [6]. The present work represents the first external testing of observer reliability for the IST-3 angiography score, and the score compares favourably with TICI.
To the best of our knowledge, there are only a few previous studies of CTA reliability in stroke; all had fewer than seven observers, and none tested all the CTA signs assessed in our study. Knauth et al. reported an inter-reader kappa = 0.78 for two readers identifying the correct location of occlusion on CTA in acute ischaemic stroke [23]. Suh et al. compared TICI with a modified TICI score and found both scales were moderately repeatable (intra-class correlation coefficients (ICC), 0.67 and 0.73, respectively) across five readers [24]. We did not replicate the inter- and intra-observer reliability demonstrated by Puetz and colleagues in their original report of the Clot Burden Score (six readers, ICC = 0.87 and 0.96, respectively) despite similar reader numbers [16]. Similarly, in the original report defining their classification of collateral status, Miteff and colleagues demonstrated an inter-observer reliability of kappa = 0.93 for two observers [17]. We were unable to replicate those findings, but our results are more consistent with other methods of assessing leptomeningeal flow, as reported in a systematic review (0.49–0.87) [3]. Neither did we replicate the results from three recent articles, each with four readers, that demonstrated improved detection of infarct using CTA-SI over NCCT alone: Hopyan et al. improved reader agreement from kappa 0.28–0.44 to 0.34–0.57 [25], Finlayson et al. showed an increase in ICC from 0.83 to 0.88 [26], while van Seeters and colleagues improved their ICC range from 0.54–0.62 to 0.57–0.76 [27].
These previous studies represent a mixture of kappa statistics and ICC and are not directly comparable with our K-alpha results; any comparisons should be treated with caution. Nevertheless, kappa, ICC and K-alpha work on the same numerical scale and are therefore broadly similar. We opted to use K-alpha for several reasons. Kappa is only suitable for assessing two observers rating nominal data and even then may not be the most suitable test [28, 29]; we had up to seven observers per analysis and a mixture of nominal and ordinal data. K-alpha has been shown to provide a more robust measure of observer variance than kappa or ICC and offers several advantages to the user: it allows comparisons between any number of observers, it can handle both categorical and ordinal data, it is less prone to the influences of observer bias and result prevalence, and it can still be computed in the presence of missing data [20, 30].
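As an illustration of the properties listed above (any number of observers, nominal or ordinal data, tolerance of missing ratings), the following is a minimal sketch of Krippendorff's alpha computed from a coincidence matrix. It is not the software used in this study; the function name krippendorff_alpha, its arguments and the toy ratings are assumptions made purely for illustration, and the ratings are not IST-3 data.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha(units, level="nominal"):
    """Krippendorff's alpha for a list of units (e.g. scans).

    Each unit is a list of ratings, one per observer; None marks a
    missing rating. `level` is "nominal" or "ordinal" (hypothetical API).
    """
    if level not in ("nominal", "ordinal"):
        raise ValueError("level must be 'nominal' or 'ordinal'")

    # Coincidence matrix: every ordered pair of ratings within a unit,
    # weighted by 1 / (m - 1), where m is the number of ratings present.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two ratings are not pairable
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    categories = sorted({c for pair in coincidences for c in pair})
    totals = {c: sum(coincidences[(c, k)] for k in categories)
              for c in categories}
    n = sum(totals.values())
    if n == 0:
        return float("nan")  # no pairable ratings at all

    def delta_sq(c, k):
        """Squared difference between two categories."""
        if c == k:
            return 0.0
        if level == "nominal":
            return 1.0
        # Ordinal metric: based on how many ratings fall between c and k.
        lo, hi = min(c, k), max(c, k)
        span = sum(totals[g] for g in categories if lo <= g <= hi)
        return (span - (totals[lo] + totals[hi]) / 2.0) ** 2

    observed = sum(o * delta_sq(c, k)
                   for (c, k), o in coincidences.items()) / n
    expected = sum(totals[c] * totals[k] * delta_sq(c, k)
                   for c in categories for k in categories) / (n * (n - 1))
    if expected == 0:
        return float("nan")  # only one category was ever used
    return 1.0 - observed / expected


# Toy ratings (not study data): 15 units, 3 observers, None = missing.
toy = [
    [None, 1, None], [None, None, None], [None, 2, 2], [None, 1, 1],
    [None, 3, 3], [3, 3, 4], [4, 4, 4], [1, 3, None], [2, None, 2],
    [1, None, 1], [1, None, 1], [3, None, 3], [3, None, 3],
    [None, None, None], [3, None, 4],
]
print(round(krippendorff_alpha(toy, "nominal"), 3))  # 0.691
```

Note that units with a single rating, or none, simply drop out of the coincidence matrix, which is why the statistic remains computable when some observers did not rate some scans.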
Other strengths of our work include more readers than in previous studies; calculation of both inter- and intra-reader reliability; use of a robust, standardised image analysis platform, previously shown to provide consistent multiuser reporting [9, 10]; complete blinding of readers to all clinical information and to any other scan assessments; and use of representative cases from a multicentre trial, which increases the generalisability and real-world relevance of our results.
Our work also has some limitations. Firstly, in contrast to previous work [10], we did not formally produce a single reference standard for the ‘correct’ interpretation of the 15 scans to compare with other readers. Use of a reference standard would have allowed us to assess reader accuracy in addition to reader reliability. The results in Table 2 represent the consensus opinion of three senior neuroradiologists but are nevertheless still open to interpretation error. By confirming high observer agreement among a group of seven experienced readers, including several senior neuroradiologists, we believe that our results are as informative as reader comparisons set against any reference standard created from the same data. We do, however, acknowledge the possibility that the expert panel was reliable in making false diagnoses, but feel this is highly unlikely. Secondly, several of the characteristics we tested in our intra-observer analyses are probably underpowered.
Acknowledgments
We thank all IST-3 collaborators, including the National Co-ordinators and Participating Centres, Steering Committee, Data Monitoring Committee and Event Adjudication Committee. The IST-3 collaborative group wishes chiefly to acknowledge all patients who participated in the study, and the many individuals not specifically mentioned in the paper who have provided support.
IST-3 was funded from many sources; these are listed in online appendix 2. The views expressed in this work are those of the authors and not of the funding sources.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.