Background
Core outcome sets (COS) are recommended for use in clinical effectiveness trials to reduce heterogeneity of reported outcomes and aid data synthesis across similar trials, enhancing evidence-based medicine and reducing research waste [1‐4]. A COS is an agreed minimum set of outcomes to be measured and reported in all trials of a particular condition or intervention [4]. Their development requires consensus methodology to establish the outcomes considered most essential to patients and health professionals. One increasingly used approach is a Delphi survey [5‐7], in which participants anonymously rate the importance of a long list of potential outcomes in sequential (postal or electronic) surveys or ‘rounds’ [8]. Feedback from each round is presented in the subsequent round so that participants can consider the opinions of others before re-rating items. The results of the Delphi inform any further consensus methods (such as a consensus meeting [9‐11]) and the final COS. Guidelines exist for the Delphi process, both within the context of a COS [4, 5, 12] and more widely elsewhere [13‐17], with emphases on the selection of stakeholders, the number of rounds, the presentation of feedback and the criteria for consensus. Far less attention has been paid to the design of the Delphi survey itself, which has been criticised as often being poorly formulated [17, 18].
One issue that may be important within Delphi surveys is the ordering of questions and the potential for question order to affect both the overall survey response rate and individual responses to questions. Within the social and health sciences, there are numerous publications relating to the design of questionnaires or surveys, and question order is frequently discussed [19‐21]. The choice of initial items may influence a respondent’s willingness or motivation to complete a survey, since early items may shape a respondent’s understanding of what the survey is about [19]. Previous literature, including randomised studies, has demonstrated mixed effects on overall survey response rates [22‐24]. In terms of actual responses to questions, when items are not asked in isolation it is likely (at least for some individuals) that responses to earlier questions will be used as a comparative standard by which to respond; consequently, the order of questions (or the ‘context’ in which questions are asked) may influence responses [21, 25]. This phenomenon is often referred to as a ‘context effect’ [19, 20, 25]. Indeed, such effects have been observed in numerous randomised and non-randomised studies [19‐21, 25‐29]. While focus has commonly been on the ordering of general and specific questions (with the recommendation that the general question should precede the specific, since the specific are more likely to influence the general than vice versa) [20, 26, 28‐31], effects have also been observed with the ordering of two or more similarly specific items [21, 25]. To explore question order effects, Moore [25] suggests comparing responses to two questions in the non-comparative context (when a question is asked first) and the comparative context (when a question is asked after another). When responses to the two questions become more similar in the comparative context than in the non-comparative context, we observe what is termed a consistency effect [21, 25], where respondents attempt to be consistent with their earlier responses. When responses become more different in the comparative context, we observe a contrast effect [21, 25], with respondents emphasising differences between items rather than similarities.
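Moore’s comparison of the two contexts can be expressed as a simple rule: compare the gap between two items’ ratings when each is asked first with the gap when each follows the other. The following is a minimal illustrative sketch (not analysis code from this study; the function name and the example numbers are invented for illustration):

```python
# Hypothetical illustration of Moore's comparison of contexts:
# classify a question-order effect by comparing the difference between
# two items' mean ratings in the non-comparative context (each item
# asked first) with the difference in the comparative context (each
# item asked after the other).

def order_effect(gap_non_comparative: float, gap_comparative: float) -> str:
    """Label the effect implied by the change in the rating gap."""
    if abs(gap_comparative) < abs(gap_non_comparative):
        # Responses converge when items are compared: respondents try
        # to stay consistent with their earlier answers.
        return "consistency effect"
    if abs(gap_comparative) > abs(gap_non_comparative):
        # Responses diverge when items are compared: respondents
        # emphasise differences between the items.
        return "contrast effect"
    return "no order effect"

# Made-up means on a 1-9 importance scale: the items differ by 1.0
# point when asked first, but only 0.3 points when asked second.
print(order_effect(gap_non_comparative=1.0, gap_comparative=0.3))
# -> consistency effect
```

The classification depends only on whether the gap shrinks or widens in the comparative context, which mirrors how the consistency and contrast effects are defined above.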
In the context of Delphi surveys, we are aware of only one publication warning of such context effects [16]. Delphi surveys constructed for COS development generally include attitudinal questions, asking respondents to rate the importance of a succession of specific outcomes that may be valued differently. In such a setting it seems plausible that question order and context effects may introduce significant bias [16], which is likely to influence the resulting COS.
This study explored the impact of question order within a Delphi survey used in the development of a COS for oesophageal cancer surgery. The following hypotheses were considered:
1. The ordering of items impacts on Delphi survey response rates;
2. The ordering of items affects participants’ responses (context effects), and the effect differs between patients and health professionals;
3. The ordering of items influences the items retained at the end of the first Delphi round.
Discussion
This methodological work examined the impact of question order within the first round of a Delphi survey to inform a COS for oesophageal cancer resection surgery. Question order did not impact on response rates among patients; however, fewer health professionals responded to the survey when clinical items appeared first and PRO items last. While participants consistently rated clinical items as more essential than PROs (irrespective of question order or stakeholder group), context effects (where prior questions affect responses to later questions) were observed in both stakeholder groups, though the direction of these effects differed. Patients inflated the importance of PROs when rating them last in the survey, being more consistent with their earlier judgements regarding clinical items (a consistency effect), whereas professionals inflated the importance of clinical items when they appeared last, emphasising their greater importance compared with the PROs previously rated (a contrast effect). Moreover, this study observed that question order impacted on the items retained at the end of round 1 (based on pre-specified criteria), which will ultimately influence the final COS and is therefore of utmost importance. Given these findings, we would strongly recommend that potential question order effects are considered when designing and implementing a Delphi survey for the development of a COS.
The results of this study agree with previous literature within survey research (including both non-randomised and randomised studies) and extend it to Delphi surveys and COS development. The majority of research into question order effects has dealt with behavioural or factual items that are verifiable. In Delphi surveys for COS development, participants are asked attitudinal questions, being required to rate how important they feel different outcomes are relative to each other. In this situation it is implicit that participants consider items in comparison to previous items; hence, context effects are perhaps more likely than in other settings [16, 39].
Items presented at the beginning of a survey may motivate or demotivate an individual to respond [19]. In this study, health professionals appear to have been less motivated to respond if clinical items appeared first. One may hypothesise that if PROs appear first, a professional might feel strongly compelled to express their opinion that these are not the most important items, whereas if clinical items (such as survival) appear first that same professional might feel less driven (or less need) to respond. Within this study, opposite context effects were seen in patients and professionals. This agrees with Birckart [40], who argues that consistency effects (what he terms ‘carryover’) are more likely when respondents feel they are moderately knowledgeable (such as patients), whereas contrast effects (‘backfire’) are more likely when respondents are highly knowledgeable (such as health professionals in the relevant field).
Recent research has demonstrated that different types of health professionals value different outcomes and that each group should be adequately represented [41, 42]. Within this study, 76% of health professionals were surgeons (consultant and registrar) and only 24% specialist nurses. Additional post-hoc analyses demonstrated that surgeons and nurses prioritised different outcomes. Moreover, question order resulted in different context effects within these two groups of health professionals. While the number of nurses in this analysis was small, given the observed differences we would support recent recommendations that different health professionals should be considered as separate panels during the Delphi process [42].
In the current study, some degree of imbalance was observed between the randomisation groups in terms of the gender of patients and the age of health professionals. This may be due to chance, or it may (at least partially) be due to certain individuals being more or less likely to respond to the different versions of the survey (PRO first and PRO last). For example, women may be more likely (or men less likely), and younger professionals more likely (or older professionals less likely), to respond when clinical items are first (PRO last). Previous authors have suggested that the magnitude of order effects may depend on participant demographics [26]; however, few studies have provided empirical evidence. McFarland found no evidence of question order effects varying with sex or education [29], but a later study observed order effects among less-educated respondents only [30, 40]. We are not aware of any studies that have specifically considered age. Further exploration within this current study examined male and female patients and younger and older professionals separately (Additional files 3, 4, 5 and 6: Tables S3-S6). Patterns were largely consistent, with perhaps a greater consistency effect within women than men and a greater contrast effect within younger rather than older professionals; however, the numbers of participants within individual groups were small.
Patients and health professionals were the only stakeholder groups included in this study, and it is possible that different question order effects may occur in other groups such as methodologists or regulators. However, patients and health professionals are considered the most essential stakeholders to include in the development of a COS [5] and are likely to make up a large majority, if not all, of the Delphi participants. This study included participants only from the UK and within a single disease setting; it is important, therefore, to repeat this study in other countries and settings. In addition, not all Delphi surveys drop items (deemed less essential) at the end of each round, instead retaining all items until the end of the final round. However, in such a scenario it is highly likely that if context effects are present, due to the design of the survey, they will impact on responses in all rounds and on the subsequent final COS.
This is the first study we are aware of to investigate question order within a Delphi for COS development and, while exploratory in nature, it provides the best evidence at present that such effects should be considered in this setting. Initial piloting of the Delphi survey may be valuable in identifying potential question order effects, and we would recommend that this is always done. Cognitive interviews, such as ‘Think Aloud’ [43], carried out while individuals complete the survey with different orderings of items, may help identify if and how responses are influenced by earlier items. Previous survey research offers recommendations to reduce potential question order effects. Question order effects are assumed to arise because items similar in content influence one another [26]; this has led to the suggestion that such items could be separated with ‘buffer’ questions [27, 39, 44]. One possibility within a Delphi survey for a COS, such as that described in this current paper, might be to alternate clinical and PRO items. However, this may interrupt the flow of the survey, making it less coherent [26], and guidelines suggest that items within the same theme should be grouped together [21]. Future research should explore this approach further.
An alternative approach for COS development is to randomise participants to receive surveys with different question orders and then combine the responses across the different surveys. Indeed, within the field of survey research this approach was recommended as long ago as 40 years [45] and has been reiterated since [16, 19, 20, 28]. The idea here is that when the data are combined across all randomised participants (as in the development of the oesophageal COS), question order effects will be ‘cancelled out’, or at least diminished. This current paper has considered only the ordering of two ‘blocks’ of items (PRO and clinical), which produces only two different randomised versions. We have not considered potential order effects within those ‘blocks’, which may also exist. Again, initial piloting with cognitive interviews may help identify the extent of randomisation required. While it would be plausible to randomise items within ‘blocks’, it may be more logistically challenging, although this is likely to be easier for an electronic Delphi survey than for a postal one. This should be explored further.
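The randomise-and-combine approach can be sketched in a few lines. This is a hypothetical illustration, not the study’s actual allocation code; the function names, the two version labels and the example ratings are invented for the sketch:

```python
# Hypothetical sketch: randomise each participant to one of two survey
# versions ('PRO first' or 'PRO last'), then pool an item's ratings
# across both versions so that question-order effects tend to cancel
# out (or at least diminish) in the combined data.
import random
from statistics import mean

def allocate(participants, seed=0):
    """Randomly assign each participant one of the two survey versions."""
    rng = random.Random(seed)  # seeded for a reproducible allocation
    return {p: rng.choice(["PRO first", "PRO last"]) for p in participants}

def pooled_mean(ratings_by_version):
    """Combine one item's ratings across all randomised versions."""
    combined = [r for ratings in ratings_by_version.values() for r in ratings]
    return mean(combined)

# Made-up importance ratings for a single outcome under each ordering:
ratings = {"PRO first": [6, 7, 5], "PRO last": [8, 9, 7]}
print(pooled_mean(ratings))  # mean of [6, 7, 5, 8, 9, 7] -> 7.0
```

Pooling works on the assumption that roughly equal numbers of participants respond to each version; a marked imbalance in response rates between versions, as observed for health professionals in this study, would weaken the cancellation.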
Within the context of crossover trials, when strong period-treatment interactions are observed, one recommendation is to use data from the first period only from each of the randomisation groups [38]. This has also been recommended within survey research where question order has been randomised, in the belief that responses to questions asked in the non-comparative context are a better representation of an individual’s true feelings [29]. However, in the context of prioritising potential outcomes for a COS, it could be argued that an outcome cannot be rated without consideration of other outcomes, and so the comparative context may be more appropriate.
While context effects were observed in this exploratory study, further work is needed to replicate and confirm our findings within the development of other core sets. It is, however, plausible that question order may, to some extent, have impacted on previously developed COSs which have employed Delphi surveys. A crucial part of the development of a COS is its subsequent periodic review in order to validate the COS and ensure outcomes are still important [4]. For COSs initially developed without consideration of question order, such a review would afford the opportunity to consider such potential effects. This research does not invalidate previously developed COSs but offers a potential enhancement to the review and updating of COSs and the development of future COSs.
In addition to initial piloting of the Delphi survey, in the absence of further research we would recommend that question order within a Delphi survey is randomised, at least in terms of the presentation of clinical and patient-reported outcomes, and that the responses are then combined across randomisation groups to inform the final COS.
Finally, while this study has considered the use of a Delphi survey to inform a COS, question order is also likely to have an impact in other forms of consensus methodology, such as the Nominal Group Technique or less-structured consensus meetings. While these approaches do not generally incorporate a formal questionnaire, items for discussion are still presented to participants in some order. Without running multiple meetings, it is difficult to envisage how randomisation could be utilised in this scenario. The Delphi method, by contrast, enables question order to be randomised and its impact to be examined empirically afterwards.
Acknowledgements
The authors are grateful to all the patients and health professionals who gave up their time to participate in the Delphi survey, including the CONSENSUS (Core Outcomes and iNformation SEts iN SUrgical Studies) Esophageal Cancer working group which comprises health professionals who participated in at least one round of the Delphi survey: Derek Alderson (University Hospitals Birmingham NHS Foundation Trust, UK), Bilal Alkhaffaf (Central Manchester University Hospitals NHS Foundation Trust, UK), William Allum (The Royal Marsden NHS Foundation Trust, UK), Stephen Attwood (Northumbria Healthcare NHS Foundation Trust, UK), Hugh Barr (Gloucestershire Hospitals NHS Foundation Trust, UK), Issy Batiwalla (North Bristol NHS Trust, UK), Guy Blackshaw (University Hospital of Wales, UK), Marilyn Bolter (Plymouth Hospitals NHS Trust, UK), Abrie Botha (Guy and St Thomas’ NHS Foundation Trust, UK), Jim Byrne (University Hospitals Southampton NHS Foundation Trust, UK), Joanne Callan (Heart of England NHS Foundation Trust, UK), Graeme Couper (NHS Lothian, UK), Khaled Dawas (University College London Hospitals, UK), Chris Deans (NHS Lothian, UK), Claire Goulding (Plymouth Hospitals NHS Trust, UK), Simon Galloway (South Manchester University Hospitals NHS Trust, UK), Michelle George (Maidstone and Tunbridge Wells NHS Trust, UK), Jay Gokhale (Bradford Teaching Hospitals NHS Foundation Trust, UK), Mike Goodman (The Royal Bournemouth and Christchurch Hospitals NHS Foundation Trust, UK), Richard Hardwick (Cambridge University Hospitals NHS Foundation Trust, UK), Ahmed Hassn (Princess of Wales Hospital, UK), Mark Henwood (Glangwili General Hospital, UK), David Hewin (Gloucestershire Hospitals NHS Foundation Trust, UK), Simon Higgs (Gloucestershire Hospitals NHS Foundation Trust, UK), Jamie Kelly (University Hospitals Southampton NHS Foundation Trust, UK), Richard Kryzstopik (Royal United Hospitals Bath NHS Trust, UK), Michael Lewis (Norfolk and Norwich University Hospitals NHS Foundation Trust, UK), Colin MacKay (NHS Greater Glasgow and Clyde, UK), James Manson (Singleton Hospital, UK), Robert Mason (Guy and St Thomas’ NHS Foundation Trust, UK), Ruth Moxon (Royal Berkshire NHS Foundation Trust, UK), Muntzer Mughal (University College London Hospitals, UK), Sue Osborne (Yeovil District Hospital NHS Foundation Trust, UK), Richard Page (Liverpool Heart and Chest Hospital NHS Foundation Trust, UK), Raj Parameswaran (Leeds Teaching Hospitals NHS Trust, UK), Simon Parsons (Nottingham University Hospitals NHS Trust, UK), Simon Paterson-Brown (NHS Lothian, UK), Anne Phillips (Oxford University Hospitals NHS Foundation Trust, UK), Shaun Preston (Royal Surrey County Hospital NHS Foundation Trust, UK), Kishore Pursnani (Lancashire Teaching Hospitals NHS Foundation Trust, UK), John Reynolds (St James’ Hospital, Dublin, Ireland), Bruno Sgromo (Oxford University Hospitals NHS Foundation Trust, UK), Mike Shackcloth (Liverpool Heart and Chest Hospital NHS Foundation Trust, UK), Jane Tallett (Norfolk and Norwich University Hospitals NHS Foundation Trust, UK), Dan Titcomb (University Hospitals Bristol NHS Foundation Trust, UK), Olga Tucker (Heart of England Birmingham NHS Foundation Trust, UK), Tim Underwood (University of Southampton, UK), Jon Vickers (Salford Royal NHS Foundation Trust, UK), Mark Vipond (Gloucestershire Hospitals NHS Foundation Trust, UK), Lyn Walker (University Hospitals of North Midlands NHS Trust, UK), Neil Welch (Nottingham University Hospitals NHS Trust, UK), John Whiting (University Hospitals Birmingham NHS Foundation Trust, UK), Jo Price (Royal United Hospitals Bath NHS Foundation Trust, UK), Peter Sedman (Hull and East Yorkshire Hospitals NHS Trust, UK), Thomas Walsh (Connolly Hospital, Dublin, Ireland), Jeremy Ward (Lancashire Teaching Hospitals NHS Foundation Trust, UK).
The ROMIO study group comprises co-applicants on the ROMIO feasibility study (listed in alphabetical order): C Paul Barham (University Hospitals Bristol NHS Foundation Trust, UK), Richard Berrisford (Plymouth Hospitals NHS Trust, UK), Jenny Donovan (University of Bristol, UK), Jackie Elliott (Bristol Gastro-Oesophageal Support and Help Group, UK), Stephen Falk (University Hospitals Bristol NHS Foundation Trust, UK), Robert Goldin (Imperial College London, UK), George Hanna (Imperial College London, UK), Andrew Hollowood (University Hospitals Bristol NHS Foundation Trust, UK), Sian Noble (University of Bristol, UK), Grant Sanders (Plymouth Hospitals NHS Trust, UK), Tim Wheatley (Plymouth Hospitals NHS Trust, UK).