Discrete choice experiments (DCEs) are increasingly advocated as a way to quantify preferences for health. However, increasing support does not necessarily result in increasing quality. Although specific reviews have been conducted in certain contexts, there exists no recent description of the general state of the science of health-related DCEs. The aim of this paper was to update prior reviews (1990–2012), to identify all health-related DCEs and to provide a description of trends, current practice and future challenges.
Methods
A systematic literature review was conducted to identify health-related empirical DCEs published between 2013 and 2017. The search strategy and data extraction replicated prior reviews to allow the reporting of trends, although additional extraction fields were incorporated.
Results
Of the 7877 abstracts generated, 301 studies met the inclusion criteria and underwent data extraction. In general, the total number of DCEs per year continued to increase, with broader areas of application and increased geographic scope. Studies reported using more sophisticated designs (e.g. D-efficient) with associated software (e.g. Ngene). The trend towards using more sophisticated econometric models also continued. However, many studies presented sophisticated methods with insufficient detail. Qualitative research methods continued to be a popular approach for identifying attributes and levels.
Conclusions
The use of empirical DCEs in health economics continues to grow. However, inadequate reporting of methodological details inhibits quality assessment. This may reduce decision-makers’ confidence in results and their ability to act on the findings. How and when to integrate health-related DCE outcomes into decision-making remains an important area for future research.
Quantifying preferences for healthcare is becoming increasingly popular; however, there exists no recent description of how health-related discrete choice experiments (DCEs) are being employed.
This study identified changes in experimental design, analytical methods, validity tests, qualitative methods and outcome measures over the last 5 years.
To facilitate quality assessment and better integration into health decision-making, future DCE reports should include more complete information, which might be achieved by developing reporting guidelines specifically for DCEs.
1 Introduction
In recent years, there have been increased calls for patient and public involvement in healthcare decision-making [1, 2]. Patient or public involvement can support decision-making at multiple levels: individual (shared decision-making), policy (patient experts on panels) and commissioning (incorporating patient preferences in technology evaluations or health state valuation). Views can be elicited qualitatively, quantitatively or using mixed-methods approaches [3]. Example methods include interviews, focus groups and stated preference techniques such as the standard gamble or time trade-off. Studies by the Medical Device Innovation Consortium (MDIC) [4] and Mahieu et al. [5] highlighted a wide variety of methods to measure both stated and revealed preferences in healthcare.
Among the quantitative methods for eliciting stated health preferences, discrete choice experiments (DCEs) are increasingly advocated [6]. In a DCE individuals are asked to select their preferred (and/or least preferred) alternative from a set of alternatives. DCEs are grounded in theories which assume that (1) alternatives can be described by their attributes, (2) an individual’s valuation depends upon the levels of these attributes, and (3) choices are based on a latent utility function [7‐10]. The theoretical foundations have implications for the experimental design (principles to construct alternatives and choice sets) and the probabilistic models used to analyse the choice data [7].
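As an illustration of these three assumptions, the following minimal sketch computes choice probabilities for one choice set under a multinomial logit model. The attribute names, levels and preference weights are hypothetical and are not taken from any reviewed study.

```python
import math

# Hypothetical preference weights (assumed for illustration only).
BETA = {"effectiveness": 0.04, "side_effect_risk": -0.08, "cost": -0.01}

def utility(alternative, beta=BETA):
    """Systematic utility: a weighted sum of the attribute levels."""
    return sum(beta[name] * level for name, level in alternative.items())

def choice_probabilities(choice_set, beta=BETA):
    """Multinomial logit probabilities for the alternatives in one choice set."""
    utilities = [utility(alt, beta) for alt in choice_set]
    max_u = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

# Two hypothetical treatments described by the same three attributes.
treatment_a = {"effectiveness": 80, "side_effect_risk": 10, "cost": 100}
treatment_b = {"effectiveness": 70, "side_effect_risk": 5, "cost": 50}
probs = choice_probabilities([treatment_a, treatment_b])
```

With these assumed weights, treatment B has the higher systematic utility and therefore the higher predicted choice probability, although A is still chosen with non-zero probability because of the random component of utility.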
Previously conducted broad reviews by Ryan and Gerard (1990–2000) [11], de Bekker-Grob et al. (2001–2008) [7] and Clark et al. (2009–2012) [6] identified a number of methodological challenges of DCEs (e.g. how to choose among orthogonal, D-efficient and other designs or how to account for preference heterogeneity when analysing choice data). These reviews, as well as published checklists [12] and best-practice guidelines [13‐17], have been developed to provide specific guidance and potentially improve quality [12, 18]. However, it is unknown whether the challenges identified in prior reviews are still relevant or whether there has been a response to the published suggestions and guidelines. Furthermore, although health-related DCEs are increasingly advocated by organisations such as the MDIC [4], their use for actual decision-making in health remains limited [7, 13]. Key barriers to their wider use in policy include concerns about the robustness and validity of the method and the quality of applied studies [19, 20].
This paper seeks to provide a current overview of the applications and methods used by DCEs in health economics. This overview will be created by systematically reviewing DCE literature and extracting data from the period 2013–2017. In addition, historical trends in experimental design, analytical methods, validation procedures and outcome measures will be described by comparing the results to those of prior reviews. For the sake of generality and to allow examination of trends based on consistent data extraction methods, this comparison will focus on the broad reviews cited above, rather than on narrower reviews of DCEs covering specific study designs or disease areas [21‐41]. Recent developments in DCE methods will be incorporated by including new data elements not reported in previous reviews. Potential challenges and recommendations for future research will also be identified.
2 Methods
The current systematic review continued the work conducted in the prior broad DCE reviews [6, 7, 11] by focusing on DCE applications published between 2013 and 2017. The methodology for this systematic review built on that of the prior reviews to allow comparison of results across review periods and identification of trends. The search was initiated in May 2015 and updated in February 2016 and January 2018. We used the same search engine (PubMed) that was used in the latest review by Clark et al. [6] and generally used the same search terms. We decided to exclude the search terms ‘conjoint’ and ‘dce’, since these yielded too many irrelevant results (particularly due to the rise of dynamic contrast-enhanced imaging in gene expression profiling) and would have substantially increased the number of abstracts to be reviewed. The final search terms included ‘discrete choice experiment’, ‘discrete choice experiments’, ‘discrete choice modeling’, ‘discrete choice modelling’, ‘discrete choice conjoint experiment’, ‘stated preference’, ‘part-worth utilities’, ‘functional measurement’, ‘paired comparisons’, ‘pairwise choices’, ‘conjoint analysis’, ‘conjoint measurement’, ‘conjoint studies’, ‘conjoint choice experiment’ and ‘conjoint choice experiments’. A study was included if it was applied to health, included a discrete choice exercise (rather than rating or ranking), focused on human beings and was published as a full-text article in English between January 2013 and December 2017. Consistent with prior reviews, DCEs without empirical data (e.g. methodological studies) and studies of samples already included in our review were excluded.
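The search and screening logic described above can be sketched as follows. This is only an illustration: the way the terms are combined into a single query string, the `Record` fields and the function names are assumptions and do not reproduce the exact query or screening tool used in the review.

```python
from dataclasses import dataclass

# The 15 final search terms listed in the text.
SEARCH_TERMS = [
    "discrete choice experiment", "discrete choice experiments",
    "discrete choice modeling", "discrete choice modelling",
    "discrete choice conjoint experiment", "stated preference",
    "part-worth utilities", "functional measurement",
    "paired comparisons", "pairwise choices", "conjoint analysis",
    "conjoint measurement", "conjoint studies",
    "conjoint choice experiment", "conjoint choice experiments",
]

def build_query(terms):
    """Join quoted phrases with OR (an assumed way of combining the terms)."""
    return " OR ".join(f'"{term}"' for term in terms)

@dataclass
class Record:
    """Hypothetical screening record for one retrieved article."""
    health_related: bool
    discrete_choice_task: bool   # a choice exercise, not rating or ranking
    human_subjects: bool
    full_text_english: bool
    year: int
    empirical: bool              # excludes purely methodological studies

def meets_inclusion_criteria(record):
    """Apply the inclusion and exclusion criteria stated in the text."""
    return (record.health_related and record.discrete_choice_task
            and record.human_subjects and record.full_text_english
            and 2013 <= record.year <= 2017 and record.empirical)

query = build_query(SEARCH_TERMS)
```

The exclusion of studies whose samples were already included in the review is not modelled here, because it depends on comparing records with each other rather than on the properties of a single record.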
To ensure consistency of data extraction and assist with synthesis of results, the authors used an extraction tool, available in Appendix A of the Electronic Supplementary Material, initially developed using the criteria of Clark et al. [6]. We first considered areas of application (e.g. patient consumer experience, valuing health outcomes) and background information (country of origin, number and type of attributes, number of choice sets, survey administration method), followed by more detailed information about the experimental design (type, plan, use of blocking, design software, design source, method used to create choice sets, number of alternatives, presence of an opt-out or status quo option, sample size and type), data analysis (model, analysis software, model details), validity checks (external and internal), use of qualitative methods (type and rationale) and presented outcome measures. The authors tested the extraction tool and discussed initial results. To fully capture current DCE design methods, the following data elements were added to the original data extraction tool: number of alternatives, presence of an opt-out or status quo, sample size, use of blocking, use of a Bayesian design approach, software for econometric analyses and the type of qualitative research methods reported. With regard to analysis methods, this review also extracted additional information on the use of scale-adjusted latent class, heteroskedastic conditional logit and generalised multinomial models. Studies were also categorised by journal type.
Each author extracted data from a group of articles, checking online appendices and supplementary materials where relevant. A subsample of studies (20%) was double-checked by V.S. for quality control. We categorised the extracted data and reported the results as percentages. Results for the econometric analysis models were categorised based on the three key characteristics of the multinomial logit model (Fig. 1): (1) the assumption that error terms are independent and identically distributed (IID) according to the extreme value type I distribution, (2) independence of irrelevant alternatives (IIA) (resulting from the first characteristic) and (3) the presence or absence of preference heterogeneity [7]. The IID characteristic limits flexibility in estimating the error variance, whereas IIA is about the flexibility of the substitution pattern (how flexible respondents are to substitute between choices), and assumptions about preference heterogeneity determine whether preferences are allowed to vary across respondents.
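The IIA property can be made concrete with a small numerical sketch. The utilities below are hypothetical; the point is only that, under the multinomial logit model, the ratio of the choice probabilities of two alternatives does not change when a third alternative is added to the choice set.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit probabilities from a dict of systematic utilities."""
    max_u = max(utilities.values())  # subtract the maximum for stability
    exp_u = {name: math.exp(u - max_u) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Hypothetical utilities for two treatments, then with a third one added.
two = mnl_probabilities({"A": 1.0, "B": 0.5})
three = mnl_probabilities({"A": 1.0, "B": 0.5, "C": 0.8})

ratio_two = two["A"] / two["B"]        # equals exp(1.0 - 0.5)
ratio_three = three["A"] / three["B"]  # unchanged after adding C
```

Models that relax IIA, such as the nested logit, allow this ratio to change when the added alternative is a closer substitute for one option than for the other.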
3 Results
3.1 Search Results
A total of 7877 abstracts were identified from the beginning of 2013 until the end of 2017. After abstract and full-text review, 301 DCEs (including six case 3 best–worst scaling [BWS] studies) met the inclusion criteria and were selected for data extraction (see Fig. 2) [43‐343]. Figure 3 depicts the total number of DCE applications in health across the different review periods: 1990–2000, 2001–2008, 2009–2012 and 2013–2017. The 2009–2012 review reported that the number of studies had increased to 45 per year on average [6]. The current review period found 60 studies per year on average, with a high of 98 studies in 2015 and a low of 32 studies in 2017 (Fig. 3). Figure 3 also shows that the increase in DCE applications between the prior review periods and the current review period was less consistent than the increases observed in prior periods.
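The per-year averages quoted above follow directly from the study counts and the length of each review period. A minimal check, using the counts reported in this review (179 studies for 2009–2012 and 301 for 2013–2017):

```python
# Study counts and review periods as reported in the text and table headers.
periods = {
    "2009-2012": {"studies": 179, "first_year": 2009, "last_year": 2012},
    "2013-2017": {"studies": 301, "first_year": 2013, "last_year": 2017},
}

def studies_per_year(period):
    """Average number of studies per calendar year in a review period."""
    years = period["last_year"] - period["first_year"] + 1  # inclusive range
    return period["studies"] / years

averages = {name: studies_per_year(p) for name, p in periods.items()}
```

This gives 44.75 and 60.2 studies per year, which round to the reported 45 and 60.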
3.2 Areas of Application
Prior reviews mentioned that although DCEs were originally introduced in health economics to value patient or consumer experience, the use of DCEs has broadened considerably [6, 344]. Table 1 summarises information about the different areas of application of DCEs for each review period (Appendix B of the Electronic Supplementary Material contains figures based on the tables in this review). Compared to the latest review period, the largest overall shifts occurred in the areas of patient consumer experience (category A), trade-offs between health outcomes and patient or consumer experience factors (category C), and health professionals’ preferences for treatment or screening (category G). In the current review period, 8% of studies valued health outcomes such as ‘heart attacks avoided’ (category B, 23 studies, e.g. studies [148, 152, 153, 162, 170]), 4% estimated utility weights within the quality-adjusted life year (QALY) framework (category D, 13 studies, e.g. [218, 226‐228, 230]), 6% focused on job choices (category E, 17 studies, e.g. [231, 236, 238, 242, 247]), and 9% developed priority-setting frameworks (category F, 27 studies, e.g. [248, 253, 270, 272, 274]).
N/C not collected (data were not collected for this specific category), QALY quality-adjusted life year
ᵃ Numbers of individual studies might not add up to total Ns as some studies addressed multiple topics
ᵇ Percentages might not add up to 100% because some studies addressed multiple topics and because of rounding error
Among the DCEs reviewed, the most common journal focus was health services research (n = 139; 46%). About a third (n = 102; 34%) of articles were published in specialty-focused medical journals such as Vaccine (five studies [66, 131, 146, 311, 313]) or the British Journal of Cancer (three studies [47, 70, 171]). Fifty-one (17%) were published in general medical journals such as PLoS One (20 studies, e.g. [44, 64, 81, 91, 99]) and BMJ Open (five studies [100, 102, 109, 169, 264]). More details can be found in Appendix C of the Electronic Supplementary Material.
3.3 Background Information About DCEs
The reviews from Ryan and Gerard [11], de Bekker-Grob et al. [7] and Clark et al. [6] provided detailed information about study characteristics. Information for the current review period is described in the sections below. Table 2 parts (a) and (b) report the current information alongside data from the prior reviews.
Table 2
DCE background information

(a)
| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
|---|---|---|---|---|---|
| Country of origin | Australia | 6 (18) | 13 (11) | 14 (8) | 30 (10) |
| | Canada | 1 (3) | 6 (5) | 23 (13) | 25 (8) |
| | Germany | 0 (0) | 3 (3) | 18 (10) | 28 (9) |
| | Netherlands | 0 (0) | 5 (4) | 27 (15) | 44 (15) |
| | UK | 20 (59) | 55 (48) | 39 (22) | 50 (17) |
| | US | 7 (21) | 14 (12) | 28 (16) | 50 (17) |
| | Other | 0 (0) | 13 (11) | 45 (25) | 102 (34) |
| Number of attributes | 2–3 | 5 (15) | 15 (13) | 14 (8) | 30 (10) |
| | 4–5 | 10 (29) | 50 (44) | 57 (32) | 117 (39) |
| | 6 | 9 (26) | 30 (26) | 61 (34) | 67 (22) |
| | 7–9 | 4 (12) | 15 (13) | 41 (23) | 63 (21) |
| | 10 | 2 (6) | 2 (2) | 5 (3) | 4 (1) |
| | > 10 | 4 (12) | 2 (2) | 5 (3) | 12 (4) |
| | Not clearly reported | N/C | N/C | N/C | 8 (3) |
| Attributes covered | Monetary measure | 19 (56) | 61 (54) | 102 (57) | 150 (50) |
| | Time | 25 (74) | 58 (51) | 118 (66) | 117 (39) |
| | Risk | 12 (35) | 35 (31) | 106 (59) | 133 (44) |
| | Health status | 19 (56) | 62 (54) | 109 (61) | 71 (24) |
| | Health care | 28 (82) | 79 (69) | 129 (72) | 104 (35) |
| | Other | 3 (9) | 17 (15) | 88 (49) | 144 (48) |
| Number of choices per individual | 8 or less | 13 (38) | 45 (39) | 36 (20) | 86 (29) |
| | 9–16 choices | 18 (53) | 43 (38) | 113 (63) | 162 (54) |
| | > 16 choices | 2 (6) | 21 (18) | 30 (17) | 44 (15) |
| | Not clearly reported | 1 (3) | 5 (4) | 5 (3) | 9 (3) |
| Administration of survey | Self-completed questionnaire (paper) | 27 (79) | 76 (67) | 86 (48) | 69 (23) |
| | Self-completed questionnaire (online) | 3 (9) | 13 (11) | 75 (42) | 172 (57) |
| | Interviewer administered | 3 (9) | 22 (19) | 34 (19) | 44 (15) |
| | Other | N/C | N/C | N/C | 5 (2) |
| | Not clearly reported | 1 (3) | 9 (8) | 7 (4) | 11 (4) |

(b)
| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
|---|---|---|---|---|---|
| Number of alternatives (not including opt-out/status quo) | 2 | N/C | N/C | N/C | 251 (83) |
| | 3 | N/C | N/C | N/C | 20 (7) |
| | 4 | N/C | N/C | N/C | 5 (2) |
| | 5 | N/C | N/C | N/C | 2 (1) |
| | Not clearly reported | N/C | N/C | N/C | 23 (8) |
| Number of studies with opt-out/status quo | Yes | N/C | N/C | N/C | 98 (33) |
| | No | N/C | N/C | N/C | 194 (64) |
| | Not clearly reported | N/C | N/C | N/C | 9 (3) |
| Sample size | Mean | N/C | N/C | N/C | 728 (N/A) |
| | Median | N/C | N/C | N/C | 401 (N/A) |
| Type of sample | Patients | 15 (44) | N/C | N/C | 110 (37) |
| | Healthcare workers | N/C | N/C | N/C | 39 (13) |
| | General public | 11 (32) | N/C | N/C | 81 (27) |
| | Other | 8 (24) | N/C | N/C | 93 (31) |
| | Not clearly reported | N/C | N/C | N/C | 5 (2) |

DCE discrete choice experiment, N/A not applicable, N/C not collected (data were not collected for this specific category)
ᵃ Numbers of individual studies might not add up to total Ns as some studies addressed multiple topics
ᵇ Percentages might not add up to 100% because some studies addressed multiple topics and because of rounding error
3.3.1 Country of Origin
Table 2a shows that UK-based studies made up a relatively high proportion of published DCEs (17%, 50 studies), as did studies from the US (17%, 50 studies), Australia (10%, 30 studies), the Netherlands (15%, 44 studies), Germany (9%, 28 studies) and Canada (8%, 25 studies). DCEs were also popular in other European countries, for example, Italy (3%, eight studies) and Sweden (2%, six studies) (not shown). We also observed an increase in studies coming from ‘other’ countries, from 0% to 34% across the four review periods, which reflects an upwards trend towards applying DCEs in middle- and low-income countries (e.g. Cameroon [239], Ghana [244], Laos [232], Malawi [254] and Vietnam [122]).
3.3.2 Attributes, Choices and Survey
In the current review period, the number of attributes per alternative in DCEs ranged from two to 21, with a median of five. We observed a slight decrease in number of attributes; the modal category was 4–5 (39%, 117 studies). In line with prior reviews, most studies (82%, 247 studies) included four to nine attributes. For the period 2013–2017, most studies included a monetary (50%, 150 studies), time-related (39%, 117 studies), or risk-related (44%, 133 studies) attribute. The proportion of studies including time-related and health status (24%, 71 studies) attributes decreased.
Most DCEs in the current period included nine to 16 choices per individual (54%, 162 studies), with a median of 12 (minimum 1, maximum 32). Prior reviews mentioned increases in online administration of DCEs. This trend continued in the current review period, with 57% of the DCEs conducted online (172 studies), whereas the proportion of DCEs administered with pencil and paper dropped to 23% (69 studies). Self-completed questionnaires (paper or online) remained the main mode of survey administration.
3.3.3 Alternatives and Sample
Prior reviews did not collect data about the number of alternatives included in each DCE or whether an opt-out or status quo option was included. For the current period, most of the studies (83%, 251 studies) included two alternatives (not including any opt-out or status quo option), with 8% (23 studies) not clearly reporting the number of included alternatives (Table 2b). The majority of the studies (64%, 194 studies) did not include an opt-out or status quo option.
The prior reviews covering the period 1990–2012 did not extract data about the sample size. In the current period, the mean and median sample size were 728 and 401, respectively. Sample size ranged from a minimum of 35 [116] to a maximum of 30,600 respondents [148]. Most of the samples included patients (37%, 110 studies) or the general public (27%, 81 studies). A large number of DCEs sampled ‘other’ populations (31%, 93 studies) such as healthcare workers, healthcare students or a mixture of these.
3.4 Experimental Design
Experimental design (planning of the alternatives and choice sets) is crucial to the conduct of a DCE. The review from de Bekker-Grob et al. [7] describes DCE design in detail. For more information about the choices researchers have to make when designing the experimental part of a DCE, we also refer to a key checklist and best practice example [14, 15].
3.4.1 Design Type, Design Plan and Blocking
As in prior review periods, most DCEs made use of a fractional design (89%, 269 studies) (Table 3). Additionally, we observed that for the current review period, the design plan of DCEs most frequently focused on main effects only (29%, 86 studies). This is a decrease compared to the periods 1990–2000, 2001–2008 and 2009–2012, with 74%, 89% and 55%, respectively. The percentage of DCEs not clearly reporting design plan information increased to 49% (147 studies) for 2013–2017. When generating the experimental design, blocking (creating different versions of the experiment for different respondent groups) can be used to reduce the cognitive burden on respondents by reducing the number of choices each respondent has to complete [345]. Reviews for the period 1990–2012 did not collect information about blocking. Data for the current period showed that 50% (150 studies) reported using blocking when generating the experimental design. On average, studies with blocking had 709 participants, each of whom completed 11 choice sets, whereas studies with unblocked designs had 439 participants, each of whom completed 13 choice sets.
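The idea behind blocking can be sketched in a few lines. The example below randomly partitions a hypothetical 24-set design into two versions and rotates respondents through them; the function names and the random allocation are assumptions for illustration. Design software typically allocates choice sets to blocks so that attribute levels stay balanced within each block, which this simplified sketch does not attempt.

```python
import random

def block_design(choice_sets, n_blocks, seed=0):
    """Randomly partition choice sets into n_blocks of near-equal size."""
    shuffled = list(choice_sets)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled sets round-robin so block sizes differ by at most one.
    return [shuffled[i::n_blocks] for i in range(n_blocks)]

def assign_block(respondent_index, n_blocks):
    """Rotate respondents through the blocks so each version is used equally."""
    return respondent_index % n_blocks

design = [f"choice_set_{i}" for i in range(1, 25)]  # hypothetical 24-set design
blocks = block_design(design, n_blocks=2)
```

Each respondent then completes 12 rather than 24 choice sets, at the cost of needing more respondents to obtain the same number of observations per choice set.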
Table 3
Experimental design information DCEs

| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
|---|---|---|---|---|---|
| Design type | Full factorial | 4 (12) | 0 (0) | 9 (5) | 13 (4) |
| | Fractional | 25 (74) | 114 (100) | 158 (88) | 269 (89) |
| | Not clearly reported | 5 (15) | 0 (0) | 12 (7) | 19 (6) |
| Design plan | Main effects only | 25 (74) | 100 (89) | 98 (55) | 86 (29) |
| | Main effects and two-way interactions | 2 (6) | 6 (5) | 23 (13) | 52 (17) |
| | Not applicable | 4 (12) | 0 (0) | 5 (3) | 5 (2) |
| | Other | N/C | N/C | N/C | 11 (4) |
| | Not clearly reported | 3 (9) | 8 (7) | 52 (29) | 147 (49) |
| Blocking | Yes | N/C | N/C | N/C | 150 (50) |
| | No | N/C | N/C | N/C | 60 (20) |
| | Not clearly reported | N/C | N/C | N/C | 91 (30) |
| Design software | Ngene | N/C | N/C | N/C | 62 (21) |
| | SAS | 0 (0) | 14 (12) | 41 (23) | 54 (18) |
| | Sawtooth | 2 (6) | 5 (4) | 30 (17) | 47 (16) |
| | SPEED | 13 (38) | 22 (19) | 9 (5) | 1 (0) |
| | SPSS | 2 (6) | 14 (12) | 13 (7) | 20 (7) |
| | Not applicable | N/C | N/C | N/C | 11 (3) |
| | Other | 2 (6) | N/C | 27 (15) | 7 (2) |
| | Not clearly reported | N/C | 4 (4) | 9 (5) | 99 (33) |
| Design source | Website | 0 (0) | 3 (3) | 9 (5) | 4 (1) |
| | Expert | 4 (12) | 4 (4) | 11 (6) | 5 (2) |
| | Not clearly reported | 9 (26) | 42 (37) | 30 (17) | 215 (71) |
| Methods to create choice sets | Orthogonal: single profiles (binary choices) | 3 (9) | 12 (11) | 2 (1) | 7 (2) |
| | Orthogonal: random pairing | 18 (53) | 19 (17) | 18 (10) | 12 (4) |
| | Orthogonal: pairing with constant comparator | 6 (18) | 23 (20) | 5 (3) | 0 (0) |
| | Orthogonal: foldover-random pairing | 0 (0) | 1 (1) | 4 (2) | 2 (1) |
| | Orthogonal: foldover | 0 (0) | 11 (10) | 34 (19) | 26 (9) |
| | D-efficiency | 0 (0) | 14 (12) | 54 (30) | 105 (35) |
| | Bayesian D-efficiency | N/C | N/C | N/C | 23 (8) |
| | Other | 4 (12) | 2 (2) | 27 (15) | 26 (9) |
| | Not clearly reported | 3 (9) | 32 (28) | 39 (22) | 100 (33) |

DCE discrete choice experiment, N/C not collected (data were not collected for this specific category)
ᵃ Numbers of individual studies might not add up to total Ns as some studies addressed multiple topics
ᵇ Percentages might not add up to 100% because some studies addressed multiple topics and because of rounding error
3.4.2 Design Software
Ngene became the most popular software tool in the current period for generating experimental designs (21%, 62 studies, e.g. [53, 63, 139, 268, 319]). SAS (18%, 54 studies, e.g. [262, 290, 296, 300, 316]) and Sawtooth (16%, 47 studies, e.g. [46, 141, 207, 276, 323]) remained popular tools. Compared to prior review periods, we observed an increase in the percentage of studies not clearly indicating what software was used to generate the experimental design (33%, 99 studies, e.g. [44, 144, 177, 204, 299]).
3.4.3 Methods to Create Choice Sets
The upwards trend in the use of D-efficient (35%, 105 studies) experimental designs continued in the current review period. Correspondingly, fewer DCEs used orthogonal arrays through methods such as single profiles, random pairing or the foldover technique (Table 3). As with the experimental design characteristics mentioned in the previous sections, we observed that an increasing number of studies (33%, 100 studies in 2013–2017) did not clearly report the methods used to create choice sets.
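To indicate what D-efficiency measures, the sketch below computes the D-error of a small design under a multinomial logit model: the determinant of the Fisher information matrix raised to the power −1/K, where K is the number of parameters. Lower values indicate a more statistically efficient design. The two designs, the effects coding (−1/+1) and the zero prior coefficients are assumptions chosen for illustration; dedicated software such as Ngene searches over many candidate designs rather than evaluating just one.

```python
import math

def mnl_information(design, beta):
    """Fisher information of an MNL model, summed over choice sets.

    design: list of choice sets; each choice set is a list of coded
    attribute vectors, one per alternative.
    """
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for choice_set in design:
        utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in choice_set]
        max_u = max(utilities)
        exp_u = [math.exp(u - max_u) for u in utilities]
        total = sum(exp_u)
        probs = [e / total for e in exp_u]
        # Probability-weighted mean attribute vector within the choice set.
        mean = [sum(p * alt[i] for p, alt in zip(probs, choice_set)) for i in range(k)]
        for p, alt in zip(probs, choice_set):
            dev = [alt[i] - mean[i] for i in range(k)]
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * dev[i] * dev[j]
    return info

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def d_error(design, beta):
    """D-error: det(information)^(-1/K); infinite if the design is not estimable."""
    det = determinant(mnl_information(design, beta))
    return float("inf") if det <= 0 else det ** (-1.0 / len(beta))

# Two hypothetical designs: two attributes, two alternatives, two choice sets.
balanced = [[(1, 1), (-1, -1)], [(1, -1), (-1, 1)]]
overlapping = [[(1, 1), (-1, -1)], [(1, -1), (1, 1)]]  # attribute 1 does not vary in set 2
prior = [0.0, 0.0]
```

Under these assumptions the balanced design has the lower D-error, because the second choice set of the overlapping design provides no information about the first attribute.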
3.5 Econometric Analysis Methods
Information about the different econometric analysis methods and the appropriateness of these methods for different DCE applications is described in great detail in the prior reviews [6, 7, 11]. More information can be found in papers by Louviere and Lancsar [12], Bridges et al. [14] and Hauber et al. [17]. Table 4 parts (a) and (b) summarise information about econometric analyses from the current and prior review periods.
Table 4
Econometric analysis details DCEs

(a)
| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
|---|---|---|---|---|---|
| Econometric analysis model | Random effects probit (random intercept) | 18 (53) | 47 (41) | 18 (10) | 17 (6) |
| | Logit | 1 (3) | 13 (11) | 18 (10) | 0 (0) |
| | Multinomial logit | 6 (18) | 25 (22) | 86 (45) | 116 (39) |
| | Random effects logit (random intercept) | 1 (3) | 6 (5) | 14 (8) | 15 (5) |
| | Mixed logit (random parameter) | 1 (3) | 6 (5) | 45 (25) | 118 (39) |
| | Latent class | 0 (0) | 1 (1) | 7 (4) | 36 (12) |
| | Nested logit | 0 (0) | 5 (4) | 4 (2) | 6 (2) |
| | Scale-adjusted latent class | N/C | N/C | N/C | 2 (1) |
| | Heteroskedastic multinomial logit | N/C | N/C | N/C | 11 (4) |
| | Generalised multinomial logit | N/C | N/C | N/C | 12 (4) |
| | Probit | 6 (18) | 8 (7) | 4 (2) | 7 (2) |
| | Other | 1 (3) | 4 (4) | 32 (18) | 25 (8) |
| | Not clearly reported | 2 (6) | 4 (4) | 2 (1) | 7 (2) |

(b)
| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
|---|---|---|---|---|---|
| Software for econometric analysis | Nlogit | N/C | N/C | N/C | 65 (22) |
| | Biogeme | N/C | N/C | N/C | 5 (2) |
| | Sawtooth | N/C | N/C | N/C | 16 (5) |
| | R | N/C | N/C | N/C | 10 (3) |
| | Stata | N/C | N/C | N/C | 94 (31) |
| | SAS | N/C | N/C | N/C | 17 (6) |
| | Other | N/C | N/C | N/C | 15 (5) |
| | Not clearly reported | N/C | N/C | N/C | 79 (26) |
| Mixed logit/random parameter logit: additional information | Number of studies with additional information | N/C | N/C | 38 (21) | 65 (22) |
| | Mean number of draws | N/C | N/C | N/C | 1354 (N/A) |
| | Median number of draws | N/C | N/C | N/C | 1000 (N/A) |
| | Distributional assumption: normal distribution | N/C | N/C | 20 (52) | 53 (18) |
| | Distributional assumption: other distribution/unclear | N/C | N/C | 19 (50) | 12 (4) |

DCE discrete choice experiment, N/A not applicable, N/C not collected (data were not collected for this specific category)
ᵃ Numbers of individual studies might not add up to total Ns as some studies addressed multiple topics
ᵇ Percentages might not add up to 100% because some studies addressed multiple topics and because of rounding error
3.5.1 Econometric Analysis Model, Software and Preference Heterogeneity
We present information about econometric analysis models according to the taxonomy described in the “Methods” section and visualised in Fig. 1. Reviews for the periods 1990–2000 and 2001–2008 reported that most DCEs used random-effects (random-intercept) probit models to analyse preference data (53% and 41%, respectively). The review for the period 2009–2012 showed a shift to the use of other methods like multinomial logit models (45%) and mixed (random-parameter) logit models (25%). For the current review period, this trend continued (see Table 4a). Most DCEs in 2013–2017 reported the use of mixed logit models (39%, 118 studies, e.g. [47, 271, 301, 314, 318]) or multinomial logit models (39%, 116 studies, e.g. [92, 110, 166, 294, 339]) to analyse preference data. The current review period also showed an increase in the use of latent class models (12%, 36 studies, e.g. [38, 91, 139, 165, 269]) and other econometric analysis models. Examples include generalised multinomial logit (4%, 12 studies, e.g. [97, 124, 157, 174, 240]) and heteroskedastic multinomial logit (4%, 11 studies, e.g. [134, 139, 184, 256, 309]).
Prior reviews did not collect data about the software used for econometric analysis. For the current review period, Table 4b shows that most DCEs made use of Stata (31%, 94 studies, e.g. [91, 110, 138, 149, 213]) or Nlogit (22%, 65 studies, e.g. [94, 171, 204, 282, 346]) to conduct econometric analysis. However, 26% (79 studies, e.g. [101, 184, 211, 231, 330]) did not clearly report information about the software used.
Among the studies that used mixed logit models to account for preference heterogeneity in the period 2013–2017, 22% (65 studies) included additional information about the distributional assumptions used to conduct the mixed logit analysis and the number of distributional draws (e.g. Halton draws) used to simulate preference heterogeneity. This percentage is similar to the percentage for the period 2009–2012, which was 21%. The mean number of draws for the current review period was 1354 (median 1000, minimum 50, maximum 10,000), and 18% of the DCEs (53 studies) assumed that parameters followed the normal distribution.
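The role of draws and distributional assumptions can be illustrated with a minimal sketch of a simulated mixed logit probability. It assumes a single attribute whose coefficient is normally distributed across respondents and uses a base-2 Halton sequence to generate the draws; all numerical values are hypothetical, and the estimation routines in packages such as Stata or Nlogit are considerably more elaborate.

```python
import math
from statistics import NormalDist

def halton(index, base=2):
    """Radical-inverse (Halton) value in (0, 1) for a 1-based index."""
    value, fraction = 0.0, 1.0 / base
    while index > 0:
        value += (index % base) * fraction
        index //= base
        fraction /= base
    return value

def simulated_choice_probability(x_a, x_b, mean, sd, n_draws=1000):
    """Mixed logit probability of choosing A over B for one attribute whose
    coefficient is assumed to follow a normal distribution."""
    normal = NormalDist(mean, sd)
    total = 0.0
    for r in range(1, n_draws + 1):
        beta = normal.inv_cdf(halton(r))  # one quasi-random coefficient draw
        total += 1.0 / (1.0 + math.exp(-beta * (x_a - x_b)))  # binary logit
    return total / n_draws  # average over the draws

# Hypothetical values: mean coefficient 0.5, with and without heterogeneity.
p_hetero = simulated_choice_probability(x_a=1.0, x_b=0.0, mean=0.5, sd=1.0)
p_homog = simulated_choice_probability(x_a=1.0, x_b=0.0, mean=0.5, sd=1e-9)
```

Because the simulated probability depends on both the assumed distribution and the number of draws, results are difficult to reproduce or assess when studies do not report these details.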
3.6 Validity Checks and Qualitative Methods
DCEs are based on responses to hypothetical choices (stated preferences), so internal and external validity checks provide a crucial opportunity to assess data quality or to compare stated preferences from DCEs with revealed preferences. As Clark et al. [6] observed in their review, there is often little reported about the tests for external validity, possibly because validating hypothetical choice scenarios is difficult [347]. Perhaps for this reason, the review covering the period 1990–2000 did not extract specific information about external validity tests. In the reviews from 2001–2012, only a very small proportion (1%) of the DCEs reported any details about their investigations into external validity. The current review period showed that 2% (seven studies [55, 93, 147, 184, 185, 195, 248]) reported using external validity tests (Table 5).
Table 5
Details of validity checks and qualitative methods

| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
|---|---|---|---|---|---|
| External validity tested | Yes | 0 (0) | 1 (1) | 2 (1) | 7 (2) |
| | No | 34 (100) | 113 (99) | 177 (99) | 294 (98) |
| Internal validity tested | Non-satiation (dominated questions) | 15 (44) | 56 (49) | 36 (20) | 50 (17) |
| | Transitivity (a > b, b > c then a > c) | 3 (9) | 5 (4) | 2 (1) | 2 (1) |
| | Sen’s expansion and contraction | 0 (0) | 2 (2) | 2 (1) | 2 (1) |
| | Internal compensatory (1 attribute) | 12 (35) | 36 (32) | 30 (17) | 18 (6) |
| | Other | N/C | N/C | N/C | 102 (34) |
| | Not clearly reported/not tested | N/C | N/C | N/C | 189 (63) |
| Type of qualitative method used | Interviews | N/C | N/C | N/C | 151 (50) |
| | Focus groups | N/C | N/C | N/C | 54 (18) |
| | Other | N/C | N/C | N/C | 53 (18) |
| | No qualitative method used | N/C | N/C | N/C | 43 (14) |
| Rationale using qualitative methods | Attribute selection | 6 (18) | 79 (69) | 90 (50) | 160 (53) |
| | Level selection | 6 (18) | 38 (33) | 73 (41) | 134 (44) |
| | Pre-testing questionnaire | 16 (47) | 36 (32) | 73 (41) | 113 (38) |
| | Understanding results/responses | 0 (0) | 5 (4) | 14 (8) | 12 (4) |
| | Not clearly reported/other | N/C | N/C | N/C | 5 (2) |

N/C not collected (data were not collected for this specific category)
ᵃ Numbers of individual studies might not add up to total Ns as some studies addressed multiple topics
ᵇ Percentages might not add up to 100% because some studies addressed multiple topics and because of rounding error
For detailed information about the different internal validity tests, we refer to the prior review papers [6, 7, 11]. In the current review period, the most frequently reported internal validity checks were non-satiation checks (17%, 50 studies) and internal compensatory checks (6%, 18 studies); transitivity tests and Sen’s expansion and contraction tests were each reported by 1% (two studies) (Table 5). Internal compensatory checks were reported less frequently than in earlier review periods. For the current review period, ‘other’ validity checks such as tests for theoretical and face validity and consistency were used frequently (34%, 102 studies).
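Two of these internal validity checks are simple enough to sketch. The example below flags a respondent who fails a dominated-choice (non-satiation) question and tests a set of pairwise choices for transitivity. The data structures and function names are hypothetical and are not taken from the reviewed studies.

```python
from itertools import permutations

def fails_dominance_test(chosen, dominant):
    """Non-satiation check: True if the respondent did not choose the
    alternative that is at least as good on every attribute."""
    return chosen != dominant

def is_transitive(pairwise_choices):
    """pairwise_choices maps (x, y) to the alternative chosen from that pair.
    Returns False if any triple shows a > b and b > c and yet c > a."""
    prefers = set()
    for (x, y), winner in pairwise_choices.items():
        loser = y if winner == x else x
        prefers.add((winner, loser))
    items = {alt for pair in pairwise_choices for alt in pair}
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            return False
    return True

# Hypothetical responses from two respondents to three pairwise questions.
consistent = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "A"}
cyclic = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}
```

The second respondent prefers A to B and B to C but chooses C over A, which violates transitivity.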
Another way to enhance quality in a DCE is to complement the quantitative study with qualitative methods [35]. For the current review period, 86% (258) of the DCEs used qualitative methods to enhance the process and/or results. Most DCEs used interviews (50%, 151 studies) or focus group techniques (18%, 54 studies). Qualitative methods were usually used to inform attribute (53%, 160 studies) and/or level (44%, 134 studies) selection, which follows the overall upwards trend reported in prior reviews. The proportion of DCEs using qualitative methods for questionnaire pre-testing (38%, 113 studies) was similar to the level in the previous review period. Overall, just as in the previous review periods, few studies in the current review period (4%, 12 studies) used qualitative methods to improve the understanding of results/responses.
3.7 Outcome Measures
Table 6 summarises trends in the outcome measures presented by DCEs.
Table 6 Presented outcome measures of DCEs

| Item | Category | 1990–2000, N = 34ᵃ (%)ᵇ | 2001–2008, N = 114ᵃ (%)ᵇ | 2009–2012, N = 179ᵃ (%)ᵇ | Current: 2013–2017, N = 301ᵃ (%)ᵇ |
| --- | --- | --- | --- | --- | --- |
| Presented outcome measure | Per WTP unit | 10 (29) | 44 (39) | 54 (30) | 80 (27) |
| | Per WTA unit | N/C | N/C | N/C | 13 (4) |
| | Per risk unit | 3 (9) | 2 (2) | 4 (2) | 9 (3) |
| | Monetary welfare measure | 5 (15) | 14 (12) | 4 (2) | 8 (3) |
| | Utility score | 8 (24) | 18 (16) | 14 (8) | 50 (17) |
| | Odds ratio | 1 (3) | 9 (8) | 14 (8) | 30 (10) |
| | Probability score | 1 (3) | 15 (13) | 14 (8) | 38 (13) |
| | Coefficients | N/C | N/C | N/C | 169 (56) |
| | Other | N/C | N/C | 90 (50) | 147 (49) |

DCE discrete choice experiment, N/C not collected (data were not collected for this specific category), WTA willingness to accept, WTP willingness to pay
ᵃ Numbers of individual studies might not add up to total Ns as some studies addressed multiple topics
ᵇ Percentages might not add up to 100% because some studies addressed multiple topics and because of rounding error
As mentioned in prior reviews, DCEs often presented their outcomes in terms of willingness to pay (WTP), a monetary welfare measure or a utility score [6, 7, 11]. Use of these methods has declined over the past two review periods (2001–2012), and use of utility scores decreased from 24% to 8% over the past three periods (1990–2012). Relative to the previous period, we observed increases in the use of utility scores (17%, 50 studies, e.g. [61, 128, 141, 164, 317]), odds ratios (10%, 30 studies, e.g. [80, 146, 200, 234, 280]) and probability scores (13%, 38 studies, e.g. [122, 154, 198, 272, 277]). We also collected information about willingness-to-accept (WTA) measures (4%, 13 studies, e.g. [53, 94, 250, 322, 338]) and regression coefficients (56%, 169 studies, e.g. [44, 57, 231, 244, 276]), which were not collected in previous reviews. The proportion of studies with ‘other’ outcome measures remained near one half (49%, 147 studies, e.g. [48, 87, 114, 207, 273]). Examples from this category include (predicted) choice shares, maximum acceptable risk, relative importance and ranking.
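To make the relationship between these outcome measures concrete, the following minimal sketch derives a marginal WTP, an odds ratio and predicted choice probabilities from a set of multinomial logit coefficients. The coefficient values, attribute names and function names are hypothetical and chosen purely for illustration; they are not taken from any of the reviewed studies.

```python
import math

# Hypothetical multinomial logit coefficients (illustration only).
coefs = {
    "effectiveness": 0.80,   # utility per unit gain in effectiveness
    "side_effects": -0.50,   # utility per unit increase in side effects
    "cost": -0.02,           # utility per currency unit
}

def marginal_wtp(coefs, attribute, cost_name="cost"):
    """Marginal WTP: minus the attribute coefficient over the cost coefficient."""
    return -coefs[attribute] / coefs[cost_name]

def odds_ratio(coefs, attribute):
    """Odds ratio for a one-unit increase in the attribute, others held fixed."""
    return math.exp(coefs[attribute])

def choice_probabilities(alternatives, coefs):
    """Predicted choice probabilities under the logit formula (softmax of utilities)."""
    utilities = [sum(coefs[name] * level for name, level in alt.items())
                 for alt in alternatives]
    top = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - top) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

wtp_effectiveness = marginal_wtp(coefs, "effectiveness")
or_side_effects = odds_ratio(coefs, "side_effects")

# Two hypothetical alternatives: a new treatment and a no-treatment reference.
alternatives = [
    {"effectiveness": 1, "side_effects": 1, "cost": 30},
    {"effectiveness": 0, "side_effects": 0, "cost": 0},
]
shares = choice_probabilities(alternatives, coefs)

print("Marginal WTP for effectiveness:", round(wtp_effectiveness, 2))
print("Odds ratio for side effects:", round(or_side_effects, 3))
print("Predicted choice probabilities:", [round(s, 3) for s in shares])
```

The sketch shows why reporting the underlying coefficients is useful: the other measures can be recomputed from them, whereas the reverse is generally not possible.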
4 Discussion
In this study, we reviewed DCEs published between 2013 and 2017. We followed the methods of prior reviews and compared our extraction results to those reviews to identify trends. We found that DCEs have continued to increase in number and have been undertaken in a growing number of countries. Studies reported using more sophisticated designs with associated software, for example, D-efficient designs generated using Ngene. The trend towards the use of more sophisticated econometric models has also continued. However, many studies presented sophisticated methods with insufficient detail. For example, we were not able to check whether the results had been interpreted correctly or whether the authors had conducted the appropriate diagnostics (e.g. checked that the data satisfied the independence of irrelevant alternatives [IIA] property). Qualitative methods have continued to be popular as an approach to select attributes and levels, which might improve validity. In this study, we also extracted data in several new categories, for example, sample size and type, the use of blocking, software used for econometric analysis and type of qualitative method used. We observed that the mean and median sample sizes were 728 and 401, respectively, with most samples including patients. We also observed that half of the studies used blocking and most studies used Stata for econometric analysis. Interviewing was the most popular qualitative research method used alongside DCEs.
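The IIA property mentioned above can be illustrated with a short sketch. Under the multinomial logit model, the ratio of the choice probabilities of two alternatives does not depend on which other alternatives are available. The utilities and alternative labels below are hypothetical. Note that this only demonstrates the property the model imposes; it is not a statistical test of whether observed choice data are consistent with it.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for a dict of alternative utilities."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {name: math.exp(u - top) for name, u in utilities.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Hypothetical utilities for alternatives A and B, with and without a third option C.
two_options = mnl_probabilities({"A": 1.0, "B": 0.5})
three_options = mnl_probabilities({"A": 1.0, "B": 0.5, "C": 0.8})

ratio_without_c = two_options["A"] / two_options["B"]
ratio_with_c = three_options["A"] / three_options["B"]

# Adding C lowers both probabilities but leaves their ratio unchanged.
print("P(A)/P(B) without C:", round(ratio_without_c, 4))
print("P(A)/P(B) with C:   ", round(ratio_with_c, 4))
```

If real choices violate this pattern (for example, because C is a close substitute for A only), a model that relaxes IIA may be more appropriate, which is one reason reporting such diagnostics matters.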
The observed increase in the total number of DCEs in health economics was similar to the trend reported in prior reviews [6, 7, 11], but less consistent from year to year (Fig. 3). This less consistent increase might be explained by the presence of many competing stated preference methods [4, 5, 347]. We hypothesise that other methods may be increasing in popularity or becoming more useful in health settings [348]. Examples of such methods may include best–worst scaling (BWS) case 1 and case 2 [349–351], which were not included in this review. Additionally, in this review, we excluded a substantial number of studies (n = 31) that addressed methodological considerations about DCEs rather than conducting empirical research. The presence of such studies may indicate that knowledge about DCEs in health has increased and that there is more focus on studies to develop the method. Examples include simulation studies about experimental design, studies comparing the outcomes of a DCE to the outcomes of other stated preference methods and studies examining different model specifications [352–354]. This might be another explanation for the less consistent increase in DCE application studies.
The common use of fractional designs, as described in prior reviews [6, 7], has continued. This review also found that main effects DCEs continue to dominate; however, there is a downwards trend as DCE designs incorporate two-way interactions more often. This is in line with the recommendations of Louviere and Lancsar [12], who suggest inclusion of interaction terms should be explored in the experimental design stage. Ngene became the most popular software tool in the current review period for generating experimental designs, while D-efficient designs became the most popular method to create choice sets. Perhaps as a consequence of the rise in software-generated designs, this review also showed that an increasing percentage of articles did not include information about experimental design features such as the design plan. Omitting this type of information might inhibit quality assessment and reduce confidence in the results. Future research might focus on the specific reasons why such information is missing and the impact of the missing information on quality assessment of DCEs. One potential reason for omitting methodological details is the journal word limit. When confronted with a low word limit, authors should consider using online space to report additional design and analysis details.
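As an illustration of what a D-efficient design optimises, the sketch below computes the D-error of a design under a multinomial logit model: the determinant of the asymptotic variance-covariance matrix of the parameter estimates, raised to the power 1/K, where K is the number of parameters. Lower values indicate a more statistically efficient design. The two small designs, the effects coding and the zero priors are assumptions made for illustration (with zero priors this is sometimes called the D0-error); this is not the algorithm used by Ngene or any other specific package.

```python
import numpy as np

def mnl_information(design, beta):
    """Fisher information of a multinomial logit model for one respondent.

    design: list of choice sets, each a (J alternatives x K attributes) array.
    beta:   assumed prior parameter values (length K).
    """
    beta = np.asarray(beta, dtype=float)
    info = np.zeros((beta.size, beta.size))
    for choice_set in design:
        X = np.asarray(choice_set, dtype=float)
        v = X @ beta
        p = np.exp(v - v.max())
        p /= p.sum()
        info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
    return info

def d_error(design, beta):
    """D-error: det of the asymptotic covariance matrix, to the power 1/K."""
    covariance = np.linalg.inv(mnl_information(design, beta))
    return np.linalg.det(covariance) ** (1.0 / len(beta))

# Two hypothetical designs: 4 choice sets, 2 alternatives, 2 effects-coded attributes.
design_a = [
    [[-1, -1], [1, 1]],
    [[-1, 1], [1, -1]],
    [[1, -1], [-1, 1]],
    [[1, 1], [-1, -1]],
]
# In design_b the second attribute does not vary within the first two choice sets.
design_b = [
    [[-1, -1], [1, -1]],
    [[-1, 1], [1, 1]],
    [[-1, -1], [1, 1]],
    [[1, 1], [-1, -1]],
]

priors = [0.0, 0.0]  # assumed priors; non-zero priors change the result
d_error_a = d_error(design_a, priors)
d_error_b = d_error(design_b, priors)
print("D-error, design A:", round(d_error_a, 4))
print("D-error, design B:", round(d_error_b, 4))
```

Because the D-error depends on the assumed priors and the model specification, reporting those inputs alongside the software used allows readers to assess how a design was generated.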
In addition to these observations about the generation of experimental designs, we identified design information that would be helpful to report in DCEs and future systematic reviews. For example, prior reviews did not include information about blocking, and although at least half of the DCEs we reviewed used blocking, 30% of the studies we reviewed did not include information about blocking. Blocking could be an important technique in light of the growing literature about the cognitive burden of DCEs and the impact of this cognitive burden on respondent outcomes [345]. However, blocking also has the disadvantage of requiring a larger sample size [345]. The approach described by Sándor and Wedel [355] might offer an alternative way to increase the validity of DCE outcomes in the case of relatively small sample sizes or the investigation of preference heterogeneity.
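The trade-off between respondent burden and sample size can be sketched as follows. The example splits a design into blocks by random allocation and computes how many respondents are needed to keep the number of observations per choice set constant. The function names, the design size of 16 choice sets and the target of 100 observations per choice set are hypothetical; random allocation is a simplification, since design software typically also aims to balance attribute levels within blocks.

```python
import random

def block_design(n_choice_sets, n_blocks, seed=0):
    """Randomly allocate choice-set numbers 1..n to blocks of (near-)equal size."""
    choice_sets = list(range(1, n_choice_sets + 1))
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    rng.shuffle(choice_sets)
    return [sorted(choice_sets[b::n_blocks]) for b in range(n_blocks)]

def respondents_needed(observations_per_choice_set, n_blocks):
    """Each respondent sees one block, so every choice set is seen by 1/n_blocks of the sample."""
    return observations_per_choice_set * n_blocks

blocks = block_design(n_choice_sets=16, n_blocks=2)
for number, block in enumerate(blocks, start=1):
    print(f"Block {number}: choice sets {block}")

# Halving the tasks per respondent doubles the sample needed for the same coverage.
print("Respondents needed without blocking:", respondents_needed(100, 1))
print("Respondents needed with 2 blocks:   ", respondents_needed(100, 2))
```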
Prior reviews identified a shift to more flexible econometric analysis models [6, 7], which is not necessarily positive. This trend has continued in this review. Most studies included multinomial logit or mixed logit models. Although we did not formally extract information about variance estimation, we noted that among the DCEs using multinomial logit models to analyse choice data, few reported robust (Huber-White) standard errors; most reported ‘regular’ standard errors. Because robust standard errors rely on weaker assumptions about the error variance, it is common in economics and econometrics to report them instead of ‘regular’ standard errors [356]. In addition, in the presence of repeated observations from the same individuals, conventional standard errors are biased downward [357]. Thus, future DCEs in health economics could benefit from more appropriate treatment of clustered data (e.g. use of cluster-robust standard errors) and more complete reporting of econometric output.
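The downward bias of conventional standard errors under repeated observations can be shown with a deliberately extreme sketch. A one-parameter binary logit is fitted to hypothetical data in which each respondent answers the same choice task four times identically. The repeated answers add no new information, yet the conventional standard error halves, whereas a cluster-robust (sandwich) standard error, clustered on respondent and computed here without a small-sample correction, is unchanged. All data and function names are illustrative assumptions.

```python
import math

def fit_logit(x, y, iterations=50):
    """Newton-Raphson fit of P(y = 1) = 1 / (1 + exp(-beta * x)), one parameter."""
    beta = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
        score = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        info = sum(pi * (1.0 - pi) * xi * xi for xi, pi in zip(x, p))
        beta += score / info
    return beta

def standard_errors(x, y, groups, beta):
    """Return (conventional SE, cluster-robust SE) for the one-parameter logit."""
    p = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
    info = sum(pi * (1.0 - pi) * xi * xi for xi, pi in zip(x, p))
    conventional = math.sqrt(1.0 / info)
    # Sum the score contributions within each respondent before squaring.
    cluster_scores = {}
    for xi, yi, pi, g in zip(x, y, p, groups):
        cluster_scores[g] = cluster_scores.get(g, 0.0) + (yi - pi) * xi
    meat = sum(s * s for s in cluster_scores.values())
    cluster_robust = math.sqrt(meat) / info
    return conventional, cluster_robust

# Hypothetical data: x is the attribute difference between options A and B,
# y = 1 if the respondent chose A. One task per respondent.
x = [1, 1, -1, -1, 2, -2, 1, -1]
y = [1, 0, 0, 1, 1, 0, 1, 0]
ids = list(range(len(x)))

# Same respondents, each answering the task four times identically.
repeats = 4
x_rep = [xi for xi in x for _ in range(repeats)]
y_rep = [yi for yi in y for _ in range(repeats)]
ids_rep = [i for i in ids for _ in range(repeats)]

beta = fit_logit(x, y)
beta_rep = fit_logit(x_rep, y_rep)
conv, robust = standard_errors(x, y, ids, beta)
conv_rep, cluster_rep = standard_errors(x_rep, y_rep, ids_rep, beta_rep)

print("Conventional SE, one task each:      ", round(conv, 4))
print("Conventional SE, four repeated tasks:", round(conv_rep, 4))
print("Cluster-robust SE, four repeated:    ", round(cluster_rep, 4))
```

Real DCE data are less extreme, since respondents face different choice sets, but responses from the same individual are usually correlated, so the direction of the bias is the same.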
In terms of analytical methods, we also observed some patterns in the exploration of preference and scale heterogeneity. We noted that, among the 39% of studies that used a mixed logit model, many treated heterogeneity as a nuisance, i.e. they used the mixed model to accommodate repeated measures but did not report additional information about the ‘mixed’ aspect of the data (e.g. standard deviation estimates). Since preference heterogeneity is regarded as an important aspect within choice modelling, taking full advantage of the modelling results might help us understand preference heterogeneity better [358]. With regard to scale heterogeneity, work by Fiebig et al. [346] indicated that other models such as the generalised multinomial logit and heteroskedastic multinomial logit models could be considered when analysing DCE data, to identify differences in scale when comparing preferences between groups of respondents [359]. Data from this review identified a small number of DCEs using such methods; for a more detailed breakdown, we refer readers to another review focussing on scale heterogeneity specifically [30]. However, it is important to mention that the generalised multinomial logit model should be used with caution since the ability of this model to capture scale heterogeneity has been questioned in the literature [360].
Articles by Vass and Payne [19] and Mott [20] describe issues influencing the degree to which DCE findings are used in healthcare decision-making (e.g. health-state valuation and health technology assessment). These articles, the rising popularity of the method, and interest from regulators and funders suggest that DCEs could play an important role in real-world decision-making [361, 362]. However, concerns have been expressed about the validity, reliability, robustness and generalisability of DCEs [11, 363]. A key stage in understanding the robustness of DCEs is understanding whether stated preferences reflect ‘true’ preferences as revealed in the market [10]. In this study, we observed that the number of studies testing external validity remained small. Future research should focus on identifying and resolving the methodological and practical challenges involved in validity testing, and on guiding the incorporation of DCEs into actual decision-making in healthcare. Another practice that may improve the robustness of DCEs and facilitate their use in healthcare decision-making is the increased use of qualitative methods to complement quantitative DCE analysis [363]. Prior reviews and additional literature suggest that qualitative research methods can strengthen DCEs and other quantitative methods by facilitating numerous investigations such as (1) identification of relevant attributes and levels, (2) verification that respondents understand the presented information, and (3) learning about respondents’ decision strategies [6, 7, 11, 364]. These investigations can help determine whether respondents are making choices in line with the underpinning utility theories, thereby supporting the legitimacy of the underlying assumptions. This review showed an overall upwards trend in the number of DCEs using qualitative methods to select attributes and levels. This move towards a more mixed-methods approach has been observed by others, for example, in the study by Ikenwilo et al. [365].
4.1 Strengths and Limitations
The current study has several strengths. First, the detailed data extraction was completed by each author individually, with the total number of articles divided approximately equally among authors because of the relatively short timeframe and the need to balance author burden with study quality. Additionally, a subsample of studies (20%) was double-checked by one author (V.S.) for quality control, which enhanced reliability. Second, this study identified trends in empirical DCEs by comparing outcomes from all prior reviews. Additionally, this study included aspects of empirical DCEs not investigated before, although these aspects were recognised in the literature as becoming more important in DCE research (e.g. blocking in experimental design and the type of qualitative methods used in a DCE). Third, our observation of less rapid growth in the number of empirical DCEs (compared to the growth observed in previous reviews) matches the trend in preference research to focus on the broad range of stated preference methods available (rather than DCEs exclusively) [4, 5, 347].
A potential weakness of this study was the use of multiple reviewers with potentially different interpretations of DCE reports, which might have affected the data extraction and, as a consequence, the results presented. To limit inconsistency between reviewers, all co-authors discussed the data extraction frequently and results were cross-validated by a single author (V.S.). Similar inconsistency in interpretation may also have occurred between the different review periods. Procedural information from the two most recent reviews was used to ensure consistency, and we are therefore confident that the general trends reported, and the conclusion that more detailed methods reporting is called for, hold. Another potential weakness is the use of only one database (PubMed). However, like the authors of the prior reviews [6, 7], we do not expect the review findings to be significantly different when performing searches on other databases. Also, since we were interested in identifying trends and therefore maximising comparability between the different reviews, we preferred to restrict our searches to this single database. As with many systematic reviews, data were extracted from published manuscripts and online appendices. The results are therefore reliant on what was reported in the final article and do not necessarily reflect all activities of the authors. Trends presented could therefore reflect factors such as publication bias, journal scope, editor preferences and word limits rather than actual practice. Additionally, although we did update the data extraction tool based on changes in the field, future research might benefit from updating other aspects of the systematic review protocol such as search terms and inclusion and exclusion criteria (e.g. inclusion of best–best scaling).
Finally, although we believe that DCEs are both useful and common enough to deserve focused attention in this review, DCEs represent one method among many for examining health preferences, and other methods may be preferable depending on the circumstances [4].
5 Conclusion
This study provides an overview of the applications and methods used by DCEs in health. The use of empirical DCEs in health economics has continued to grow, as have the areas of application and the geographic scope. This study identified changes in the experimental design (e.g. more frequent use of D-efficient designs), analysis methods (e.g. mixed logit models most frequently used), validity enhancement (e.g. more diverse use of internal validity checks), qualitative methods (e.g. upwards trend of qualitative methods used for attribute and level selection) and outcome measures (e.g. coefficients most frequently used). However, a large number of studies not reporting methodological details were also identified. DCEs should include more complete information, for example, information about design generation, blocking, model specification, random-parameter estimation and model results. Developing reporting guidelines specifically for DCEs might positively impact quality assessment, increase confidence in the results and improve the ability of decision-makers to act on the results. How and when to integrate health-related DCE outcomes into decision-making remains an important area for future research.
Acknowledgements
The authors would like to thank Dr. Ewan Gray, Prof. Katherine Payne and Dr. Logan Trenaman for their help with reference screening. The authors would also like to thank Mr. Martin Eden for his helpful comments on a draft of the manuscript.
Compliance with Ethical Standards
Funding
Vikas Soekhai was funded through the Research Excellence Initiative-Erasmus Choice Modelling Centre grant from the Erasmus University Rotterdam. Esther de Bekker-Grob was funded through a personal grant from The Netherlands Organisation for Scientific Research (NWO-Talent-Scheme-Veni-Grant No. 451-15-039). Alan Ellis was funded through a Patient-Centered Outcomes Research Institute® (PCORI®) Award (ME-1602-34572). Caroline Vass prepared this manuscript with support from ‘Mind the Risk’, a project funded by Riksbankens Jubileumsfond, part of the Swedish Foundation for Humanities and Social Sciences. The statements presented in this article are solely the responsibility of the authors and do not necessarily represent the views of the funders.
Data Availability
A simplified dataset generated during the current review is available upon request. The full detailed dataset generated during the current study will not be publicly available due to ongoing analysis, but more details regarding the simplified dataset are available from the corresponding author on reasonable request.
Open Access This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
In this review, best–worst scaling (BWS) case 1 and 2 are distinguished from case 3. Since case 1 and 2 BWS do not involve attribute-based comparisons between two or more alternatives, they were excluded from this review [42], consistent with the previous review [6]. Case 3 BWS, however, involves an attribute-based comparison between two or more alternatives and is considered an extension of DCEs in the literature [367, 42]. Therefore, case 3 BWS applications were included in this review.