Background
Worldwide, tobacco smoking remains the leading preventable cause of illness (e.g. lung cancer and respiratory and cardiovascular diseases) and premature death. Smoking-related diseases account for more than six million deaths worldwide every year [1]. More effort needs to be directed towards reducing smoking prevalence, and evidence-based, comprehensive tobacco control measures should be implemented [2, 3]. Hence, a wide array of effective interventions for smoking cessation has been developed [4–6].
Most interventions rely on randomized controlled trials (RCTs) for proof of efficacy. These trials use a variety of smoking cessation outcome measures to evaluate the efficacy of interventions, which may limit the comparability of study results. Beyond the variety of measures in use, previous literature has reported arguments for, and empirical evaluations of, specific measures [7, 8]. Measures can be broadly classified as self-report measures and biochemical validation measures [8]. Examples of self-report measures are point-prevalence abstinence (the percentage of former smokers who, at the point of assessment, have not smoked for a specific period of time such as 24 h or 7 days) and continuous abstinence (the percentage of former smokers who have remained abstinent since the introduction of an intervention or event) [8]. Examples of biochemical validation measures are carbon monoxide, which can be measured in expired air and blood, and cotinine, which is the major proximate metabolite of nicotine and can be measured in various biological specimens (e.g. saliva and urine) [9].
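To make the distinction between the two self-report measures concrete, the following minimal sketch derives both from a hypothetical daily smoking diary. The diary format, the function names, and the example data are assumptions made for illustration only; they are not taken from any of the cited studies.

```python
def point_prevalence_abstinent(daily_cigarettes, window_days=7):
    """True if no cigarettes were smoked in the last `window_days` days.

    `daily_cigarettes` is a hypothetical diary: cigarettes smoked per day,
    ordered from the quit date (first entry) to the assessment (last entry).
    """
    recent = daily_cigarettes[-window_days:]
    # Require a full window of diary entries, all of them zero.
    return len(recent) == window_days and sum(recent) == 0


def continuously_abstinent(daily_cigarettes):
    """True if no cigarettes were smoked on any day since the quit date."""
    return len(daily_cigarettes) > 0 and sum(daily_cigarettes) == 0


# Made-up 30-day diary with a single lapse (2 cigarettes) on day 21.
diary = [0] * 20 + [2] + [0] * 9

is_point_prevalent = point_prevalence_abstinent(diary, window_days=7)
is_continuous = continuously_abstinent(diary)
print(is_point_prevalent, is_continuous)
```

In this made-up diary, the single lapse leaves the participant abstinent on the 7-day point-prevalence measure but not on the continuous measure, which is one way in which the choice of measure can change reported abstinence rates.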
Velicer, Prochaska, Rossi, and Snow [7] reviewed outcome measures for smoking cessation and evaluated several self-report measures and biochemical validation measures. In the literature, self-report measures have been empirically compared and discussed [8]; several types of biochemical verification methods have been discussed as well [9]. Reported abstinence rates may differ more than twofold depending on the outcome measure used (self-report or biochemical validation; e.g. Hurt et al. [10]). Selection of outcome measures should of course reflect the chosen study goals, which may lead to differences in outcomes between studies. However, when possible, common outcome measures should be utilized in order to increase the comparability of effective interventions. According to West, Hajek, Stead, and Stapleton [11], depending on the criteria adopted, the success rates of trials can differ dramatically. Nevertheless, studies with differing measures, such as using or not using biochemical validation, or studies with different follow-up durations, have been combined in overviews [12–14]. Some studies report similarities in smoking cessation outcomes, like point-prevalence abstinence vs. prolonged abstinence, producing similar relative effect sizes [15]. Other studies, however, show differences in results, with point-prevalence abstinence producing smaller effect sizes than prolonged abstinence [16, 17]. Given that smoking cessation studies use different outcome measures, which limits the comparability and interpretability of their findings, a standard set of criteria for outcome measures in tobacco smoking research is needed to enable researchers to uniformly express their results [11].
Attempts have been made to develop a standard set of criteria for outcome measures that would be utilized by all investigators. In 2003, a workgroup examined outcome measures used in clinical trials by means of a literature search [16], resulting in an overview of abstinence measures and recommendations. Later, West et al. [11] set out criteria and proposed the Russell standard. It specifies a period of prolonged/continuous abstinence (six months or 12 months) after the quit date, during which a participant is allowed up to 5 cigarettes, combined with a biochemical test using expired air carbon monoxide. However, one may argue that the use of carbon monoxide also has limitations and that self-report is highly accurate except for high-risk groups and medical patients [7]. Despite the proposed Russell standard, studies still use different outcome measures, resulting in Cochrane reviews using different outcomes of abstinence, such as 7-day point-prevalence abstinence after six months, three-month prolonged abstinence, and 12-month continuous abstinence [14, 18]. This clearly illustrates the lack of consensus about an optimal strategy and the need for a study that assesses various views on outcome criteria and identifies where there is consensus.
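As a rough illustration of how the two components of the Russell standard described above combine, the sketch below checks a single follow-up record against a self-report limit and a carbon monoxide test. The record layout and all names are hypothetical. The carbon monoxide cut-off is not given in this text, so it is a required argument here, and the value used in the example is a placeholder assumption, not the standard's official threshold.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """Hypothetical follow-up record for one participant."""
    cigarettes_since_quit: int  # self-reported total since the quit date
    expired_co_ppm: float       # expired air carbon monoxide reading


def meets_russell_style_criterion(record, co_cutoff_ppm, max_cigarettes=5):
    """Combine self-report (up to `max_cigarettes`) with a CO check.

    `co_cutoff_ppm` must be supplied by the caller; the source text does
    not state a value, so none is assumed as a default.
    """
    self_report_ok = record.cigarettes_since_quit <= max_cigarettes
    co_ok = record.expired_co_ppm < co_cutoff_ppm
    return self_report_ok and co_ok


PLACEHOLDER_CUTOFF = 10.0  # assumption for this example only

within_limit = meets_russell_style_criterion(FollowUp(3, 4.0), PLACEHOLDER_CUTOFF)
too_many_cigarettes = meets_russell_style_criterion(FollowUp(8, 4.0), PLACEHOLDER_CUTOFF)
failed_co_test = meets_russell_style_criterion(FollowUp(0, 15.0), PLACEHOLDER_CUTOFF)
print(within_limit, too_many_cigarettes, failed_co_test)
```

The sketch shows that a participant must pass both checks: a self-report within the limit is not sufficient if the biochemical test fails, and vice versa.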
Moreover, economic evaluations identify which cessation interventions are cost-effective, and to what extent. This information may be valuable for decision-makers to prioritize the reimbursement of interventions. However, the effectiveness of interventions used in these analyses is based on different outcomes of abstinence as well [5, 19]. Hence, the existence of the Russell standard is not sufficient on its own: for a standard to be used, there must also be consensus on the usefulness of its recommendations, and such consensus may only partly exist for the Russell standard. Researchers need more standardisation regarding the measurement of smoking cessation to enhance interpretations of effectiveness and studies of cost-effectiveness. Exploring where there is consensus and where there is lack of consensus is thus important to enhance uniformity by adjusting or creating a standard set of criteria.
To date, no study has investigated the preferences of smoking cessation researchers and the extent of consensus regarding outcome criteria in randomized controlled smoking cessation trials. This study explored to what extent smoking cessation experts agree on the most important outcome criteria in RCTs. Consensus on outcome criteria, and the resulting decision to use them, may enhance the comparability of future studies. Hence, the aim of the study was (1) to provide an overview of researchers’ opinions regarding the preferred outcome criteria (i.e. outcome measure, duration of abstinence or assessment method, and ideal follow-up) to be considered in smoking cessation RCTs, and (2) to identify the extent to which researchers agree on the importance of these outcome criteria. The results will reveal to what extent expert opinions are in agreement with the current Russell standard, and may indicate other potentially important outcome measures that should be considered.
Method
A three-round online Delphi study was conducted among smoking cessation researchers using Formdesk® and Qualtrics® between July 2015 and April 2016. Three iterations are often sufficient to collect the needed information and to elicit consensus [20, 21]. The Delphi technique is a widely used and accepted method for achieving convergence of expert opinions [22]. It builds consensus by using a series of questionnaires to collect data from experts [22–26]. In contrast to other data gathering and analysis techniques, the Delphi technique employs multiple iterations [27] in which the feedback process allows and encourages the selected experts to reassess their initial judgments in light of the information from previous iterations. Moreover, the Delphi technique is characterized by its ability to provide anonymity to respondents, a controlled feedback process, and its suitability for a variety of statistical analysis techniques to interpret the data [23, 26]. These characteristics reduce the effects of dominant individuals and certain downsides of group dynamics, such as manipulation or coercion to adopt or conform to a certain viewpoint [23, 26]. Furthermore, the Delphi methodology is practical, as experts from different parts of the world can be included due to its online character, and experts can complete each questionnaire at their own convenience, mitigating difficulties from non-matching schedules [26].
Smoking cessation researchers of RCTs were selected as experts for this study (described in the Delphi rounds). Experts were recruited via an e-mail inviting them to participate in an online Delphi study. Additionally, participants in the first round were asked to suggest relevant researchers for the second and third rounds. For non-responders, an e-mail reminder was sent after two weeks, followed by another reminder after approximately four weeks. During each round, the researchers were invited to respond to specific questions in an online survey. Each survey took about 10–20 min to complete, and rounds were iterative in nature. In this study, for each outcome criterion, abstinence from smoking is defined as having smoked no cigarettes at all during the specified period of time.
First round
Smoking cessation researchers of RCTs were selected as experts for this study. In our systematic search for experts, we used the PubMed database for authors of relevant papers. We filtered for English language papers from the last 10 years and relied on the following keywords: ‘smoking cessation AND RCT’, ‘smoking cessation AND randomized controlled trial’ and ‘smoking cessation AND randomized controlled trial’. Titles were screened for relevance, and when author information was available, all authors of these studies were selected for recruitment. In addition, experts were identified via a Google Scholar search and the international network of the authors. The Google Scholar search was a scoping search using a search strategy similar to that of the PubMed database search, to make sure key smoking cessation researchers were included. This led to a list of 250 experts, which was randomly ordered using the RAND() function in Microsoft Excel® and from which we invited a random sample of experts to participate in all three rounds of the Delphi study. We initially invited 30 experts to participate. Two weeks later, we invited 14 more experts (thus inviting 44 in total) to reach a sufficient number of participants, as we wanted to include at least 15 experts in the first round. This resulted in 17 participating experts (38.6% response rate) from seven countries (i.e. United States (US), Hong Kong, United Kingdom (UK), Germany, Sweden, Australia, and the Netherlands) for the first-round questionnaire. This number was deemed sufficient, as 10 to 15 experts are regarded as sufficient if the experts’ backgrounds are rather homogeneous [28]. As described by Ludwig [29], 15 to 20 participants have been used in the majority of Delphi studies.
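The sampling step described above can be sketched as follows. The study used the RAND() function in Microsoft Excel®; this sketch swaps in Python's standard `random` module as an analogue, and the expert identifiers and seed are hypothetical placeholders rather than study data.

```python
import random


def randomize_experts(experts, seed=None):
    """Return the expert list in random order (analogue of sorting by RAND())."""
    rng = random.Random(seed)
    shuffled = list(experts)  # copy so the original list is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical placeholders standing in for the 250 identified experts.
experts = [f"expert_{i:03d}" for i in range(1, 251)]
order = randomize_experts(experts, seed=2015)  # seed chosen arbitrarily

first_wave = order[:30]      # initial invitation
second_wave = order[30:44]   # 14 additional invitations two weeks later
invited = first_wave + second_wave

# 17 of the 44 invited experts participated in the first round.
response_rate = 100 * 17 / len(invited)
print(f"{len(invited)} invited, response rate {response_rate:.1f}%")
```

Drawing both invitation waves from the same random ordering means that no expert can be invited twice, and the computed rate matches the 38.6% reported for the first round.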
In the first round, we collected the most important smoking cessation measures by means of open-ended questions. The survey consisted of two parts. First, experts were asked to answer a few items regarding demographic characteristics: gender (male, female), age, and current profession (post-doc researcher, assistant professor, associate professor, professor, senior researcher, and other). Second, open-ended questions were categorized around two themes: self-reported outcome measures and biochemical validation methods. For self-reported outcome measures we asked the following question: “What are according to you the most important self-reported outcome measures to assess smoking cessation in randomized controlled smoking cessation trials?” In the survey, answers were limited to six to encourage reporting of only the most important criteria. Experts were asked to name the outcome measure (e.g. prolonged abstinence), its duration of abstinence (e.g. six months), and the ideal follow-up period (e.g. 6 and 12 months). For biochemical validation methods, the following question was asked: “What are according to you the most important biochemical validation methods to assess smoking cessation in randomized controlled smoking cessation trials?” Experts were asked to name the outcome measure of the validation method (e.g. cotinine) and its assessment method (e.g. saliva samples). Additionally, they were asked to indicate, for each outcome measure that they reported (i.e. both self-report and biochemical validation measures), whether there is a specific research population for which this measure would be inappropriate. Finally, experts were asked whether they had other comments and suggestions for smoking cessation experts for the following rounds of the present Delphi study.
The collected responses resulted in a list of smoking cessation measures that were indicated to be most important in randomized smoking cessation trials. Two researchers analysed this list of measures and where possible, merged measures that were semantically similar. After discussion with one more researcher, all three researchers fully agreed about the measures that were included in the second-round questionnaire.
Second round
All 250 identified experts, plus researchers suggested by first round participants, were invited to participate in the second round. Of the 256 invited experts, 48 from 16 countries (i.e. Australia, France, UK, Canada, China, Germany, Greece, India, Israel, Italy, Malaysia, New Zealand, the Netherlands, Turkey, UK, and the US) completed the questionnaire, resulting in a 19% response rate. Experts were presented with a list of smoking cessation measures that were identified as most important in randomized smoking cessation trials during the first round. Experts were then asked to rate the importance of each factor on a 7-point Likert scale ranging from 1 (not at all important) to 7 (extremely important). Factors such as outcome measures, abstinence duration/assessment method, and ideal follow-up period were evaluated. The survey assessed the ratings for self-report outcome measures, biochemical outcome measures, and ideal follow-up separately. For self-report and biochemical outcome measures, the outcome measures were rated first, followed by the ideal abstinence duration per measure. Because the survey used forced response, experts rated all factors. To analyse the importance of each factor, the median score (Mdn) was calculated and a score of ≥6 was considered important (i.e. agreement with the factor being important) [30]. To gain an indication of the degree of consensus between experts on the factors, interquartile ranges (IQR) were calculated [30]. Using a 7-point Likert scale, IQRs with a value of ≤1 (i.e. more than half of the opinions fall within one point of the scale) indicate good consensus among the experts [24].
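The importance and consensus rules described above can be expressed as a short calculation. The sketch below is an illustration under stated assumptions: the ratings are made up, the function and constant names are hypothetical, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not say how quartiles were computed.

```python
from statistics import median, quantiles

IMPORTANCE_THRESHOLD = 6  # Mdn >= 6 is considered important
CONSENSUS_IQR = 1         # IQR <= 1 indicates good consensus


def summarize_factor(ratings):
    """Summarize 7-point Likert ratings for one factor."""
    mdn = median(ratings)
    # Quartile convention is an assumption; other conventions can give
    # slightly different IQRs for the same ratings.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": mdn,
        "iqr": iqr,
        "important": mdn >= IMPORTANCE_THRESHOLD,
        "consensus": iqr <= CONSENSUS_IQR,
    }


# Made-up ratings from eight hypothetical experts for two factors.
agreed = summarize_factor([6, 6, 7, 7, 6, 5, 7, 6])
disputed = summarize_factor([2, 4, 7, 5, 3, 6, 7, 1])
print(agreed)
print(disputed)
```

With these made-up ratings, the first factor is both important and agreed upon, whereas the second is neither, so only the second would need to be re-rated in a later round.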
Third round
Factors with IQRs of ≤1 were removed from the questionnaire for the third round. All experts from the second round were invited to re-rate the remaining factors for which there was no consensus. Again, because the survey used forced response, experts rated all factors. Of 48 invited experts, 37 experts from 12 countries (i.e. Australia, UK, Canada, China, Germany, Greece, India, Israel, Malaysia, New Zealand, the Netherlands, and the US) completed the questionnaire in this final round (77% response rate). For each factor in the third-round questionnaire, the Mdn and IQR of the second round were presented alongside the request to re-rate the importance of the remaining factors/outcome measures.
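The selection of factors for the third round, and the response rates reported for the second and third rounds, can be sketched as follows. The factor names and IQR values are hypothetical and serve only to illustrate the rule; the participant counts are those reported in the text.

```python
def factors_for_next_round(iqr_by_factor, consensus_iqr=1.0):
    """Keep only factors without consensus (IQR above the threshold)."""
    return [name for name, iqr in iqr_by_factor.items() if iqr > consensus_iqr]


def response_rate(completed, invited):
    """Response rate as a percentage of invited experts."""
    return 100.0 * completed / invited


# Hypothetical second-round IQRs for three illustrative factors.
second_round_iqrs = {
    "factor_a": 1.0,  # consensus reached, dropped from round three
    "factor_b": 2.0,  # no consensus, re-rated in round three
    "factor_c": 1.5,  # no consensus, re-rated in round three
}

carried_forward = factors_for_next_round(second_round_iqrs)
round_two_rate = response_rate(48, 256)
round_three_rate = response_rate(37, 48)
print(carried_forward, round(round_two_rate), round(round_three_rate))
```

Rounded to whole percentages, the two rates agree with the 19% and 77% reported for the second and third rounds.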
Conclusions
The findings suggest that experts report only partial compliance with the Russell standard, which is congruent with the reports of efficacy studies. Experts seem to deem more outcome criteria important for consideration in randomized controlled smoking cessation trials. Consequently, the findings suggest the need to develop an adapted version, a Russell 2.0 standard, that includes more outcome measures, such as: (1) six-month prolonged abstinence (or continuous abstinence); (2) seven-day point-prevalence abstinence with the number of cigarettes smoked in these seven days; (3) biochemical validation, at least in a sample of the population, with a preference for cotinine assessments over carbon monoxide because of their greater sensitivity and specificity; (4) follow-ups after 6 months, and preferably also after 12 months.