Doctors and medical students were the focus of two thirds of DCE studies (66.7%, 18/27) [5, 23, 45‐60]. Two studies [51, 58] were from a large longitudinal study of the employment preferences of Australian doctors known as MABEL (“Medicine in Australia: Balancing Employment and Life”). In contrast, mid-level cadres such as clinical officers [6] and medical assistants [59] were the focus of one study each, even though these cadres may present a more cost-effective response to health worker shortages, particularly in rural or remote areas. Moreover, no study has yet focused on community health workers, who as mostly volunteer workers may have very different preferences to salaried health professionals.
Students training to be health workers were included as participants in nearly half of all studies (44.4%, 12/27). No study set in a HIC contained only students as participants, compared to seven in LMIC. Undoubtedly, students offer more convenient survey administration, with relatively large populations in a limited number of locations that are far easier to convene than practising health workers. Yet with most studies aiming to inform policy for practising health workers, the extrapolation of utility values from students is concerning. Students nearing the end of their course were often targeted, with the justification that they would soon graduate and select jobs based on their current preferences. Even students nearing the end of their training, however, are likely to hold different preferences to qualified workers who have managed a job and salary under prevailing working conditions. For example, Vujicic et al. [61] found that workplace location (rural/urban) was the most important attribute for doctors in a DCE undertaken in Vietnam, whereas it was long-term education for medical students. Moreover, there were fivefold differences between doctors and medical students in willingness-to-pay estimates for some job attributes. Rockers et al. found similar differences in preferences for attributes of rural jobs between practising nurses and nursing students in Laos [62]. And whilst the target population is often students nearing graduation, shortfalls in recruitment can lead to students from earlier years being included, increasing the disparity in experiences [59]. Finally, two studies pooled results for students and graduates from the same cadre for at least part of the analysis [53, 59]. This is likely to lead to less valid results and overestimation of the willingness of qualified health workers to accept certain conditions.
Choice task design
A third of studies (33.3%, 9/27) identified attributes and levels through a combination of literature/policy reviews and qualitative work with target participants and policymakers, which is best practice to obtain valid and policy-relevant attributes [63, 64] (Table 1). The vast majority (85.2%, 23/27), however, conducted some qualitative work (focus groups or interviews) with representatives of the target population. This is important to ensure the attributes and levels chosen are salient to the target population, encouraging engagement with the choice task presented [29].
Table 1
Choice task design of included studies

| Design feature | n (%) |
| --- | --- |
| **Preparatory work** | |
| Literature review | 20 (74.1) |
| Participant qualitative work | 23 (85.2) |
| Policymaker qualitative work | 16 (59.3) |
| All three methods | 10 (37.0) |
| **Type of choice** | |
| Binary | 21 (77.8) |
| Ternary | 1 (3.7) |
| Quaternary | 2 (7.4) |
| Mixed binary/ternary | 3 (11.1) |
| **Attributes** | |
| 5 | 3 (11.1) |
| 6 | 8 (29.6) |
| 7 | 12 (44.4) |
| 8 | 4 (14.8) |
| **Labelling** | |
| Generic | 20 (74.1) |
| Labelled | 7 (25.9) |
| **Opt-out option** | |
| Yes | 8 (29.6) |
| No | 19 (70.4) |
Three out of four studies (77.8%, 21/27) presented a binary choice task to participants, with only three studies using higher-order ternary [53] or quaternary [57, 65] choices. Yet labour markets for health workers are complex [66]. Along with the option to remain in their current job, health workers can migrate internally between locations or sectors, or emigrate overseas, the latter of particular concern in LMIC. In a novel approach, Lagarde et al. [65] presented four labelled profiles in different sectors and locations to South African nurses: overseas, public rural, public urban, and private urban. Although there is evidence that increasing task complexity (such as adding more alternatives) can decrease the quality of choice responses [29, 67], the cognitive dissonance created by a less realistic representation of the job market available to participants may in itself produce less valid choices.
Choice tasks can also include an opt-out, in the form of a “choose none” or a status quo (“choose my current job”) option [29]. Nearly one in three studies in this review (8/27, 29.6%) included such an option, compared to just one in the Lagarde-Blaauw review. Three studies presented a two-stage choice to participants: a forced binary choice between two presented profiles, followed by a ternary choice containing an opt-out [68‐70]. The inclusion of an opt-out option can avoid a “forced choice”, which assumes that one of the alternatives offered must be taken up and may falsely inflate the strength of preference associated with alternatives, distorting related welfare estimates [29, 31, 71‐74]. Indeed, the instruction to “assume these are the only options available to you” is a common way of framing a choice task. In real life, however, health workers always have many options in the labour market, including the status quo of staying in their current job or withdrawing from the health labour market altogether; this holds true even for students or new graduates. Although consumption of the good or service on offer can rarely be assumed in DCE applications in health (except perhaps when comparing new treatments with current treatments), the issue is arguably more pertinent here. After all, labour market decisions are complex, carry significant consequences, frequently disrupt an individual’s status quo, and are made relatively few times over a lifetime compared with other types of decisions. Maintaining the status quo by opting out of a choice between job profiles may therefore seem very attractive, and its inclusion more closely reflects the real-world market. This is especially important for measures of relative attribute impact such as willingness to pay for desirable job characteristics (see below). The disadvantage is that the researcher risks not obtaining sufficient information on preferences to estimate the analytical model if an opt-out option is chosen by the majority of participants. The use of a two-stage choice, with both a forced choice and a choice with an opt-out option, seems pragmatic until sufficient information is gleaned on the likely distribution of responses. Scott et al. used this approach for a DCE on Australian GPs embedded within the MABEL survey [70], but went on to construct the status quo for each participant from responses to other questions gathered in the larger survey. This innovative use of accompanying survey data meant that no information was lost when participants chose the status quo option, as attributes and levels for this alternative could be defined at an individual level. If the status quo varies within the target population, then participants should be asked to identify their status quo through survey questions in order to model these alternatives [29]. Researchers should also be careful to frame the choice task in a way that does not downplay the opt-out option, in order to increase the accuracy of welfare estimates.
Choice task profiles can be generic, e.g. “Job A” versus “Job B”, or labelled, e.g. “Rural clinic” versus “Urban hospital” (Figure 1). Generic designs were used by the majority of studies (74.1%, 20/27), although seven studies have featured a labelled design in the last three years [4, 52‐54, 57, 65, 69]. All of these studies presented rural versus urban alternatives, except the above study by Lagarde et al. that also included jobs overseas and in private facilities [65]. Labelled designs used in this way can enhance realism for participants by allowing alternative-specific attributes to be defined, avoiding unrealistic combinations that might lead to participant confusion and/or disengagement with the questionnaire (for example, the availability of private practice in rural posts) [4, 54, 56, 75]. Labelled designs can also capture choices driven by additional qualities that participants associate with the labels but that are not covered by the limited number of attributes [75]. The drawback is that these qualities are not delineated, so researchers cannot be certain that their interpretation of the label matches that of the participants. In addition, label-specific attributes/levels are correlated with the label, and therefore their utilities cannot be distinguished in the analysis [75]. This may not be a disadvantage, however, if the policy aim is to investigate preferences for specific job types in a given market (e.g. rural/urban/overseas) or how individuals value the same attribute in different posts. In contrast, a generic choice is more appropriate where the research interest is the trade-off between different attributes for one particular type of job.
Experimental design
The assessment of experimental design was hampered by poor reporting (Table 2). All studies used a fractional factorial design to decrease the total number of possible attribute and level combinations to a more manageable number, with SAS software (http://www.sas.com; 40.7%, 11/27) the most popular design source. Only one study reported using interaction terms within its fractional factorial design, allowing it to identify whether the preference for one attribute is modified by the level of another [6]; the vast majority (88.9%, 24/27) were assessed as including main effects only (the primary effect of each attribute). The inclusion of interaction terms increases the number of choice tasks required to make accurate estimates [28, 29], and it is not common practice in health economics DCEs, with only 5% of studies including two-way interactions between attributes in the Bekker-Grob review [30]. Yet preferences for attributes of health workers’ jobs may well depend on the levels of other attributes. For example, free transport may be more highly valued in a rural post than an urban one. Thus it is likely to be inaccurate, albeit pragmatic, to assume that the main effects of attributes are not confounded by each other. The inclusion of selected interaction terms in design plans should be encouraged, based on those that are most likely to be conceptually valid.
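To illustrate the scale problem that fractional factorial designs address, the following sketch enumerates a full factorial and counts the effects-coded parameters needed for main effects versus one added two-way interaction. The attribute and level counts are hypothetical, not drawn from any reviewed study:

```python
from itertools import product

# Hypothetical DCE set-up: 6 job attributes, each with 3 levels
# (illustrative numbers only, not from any study in the review).
n_attributes, n_levels = 6, 3

# Full factorial: every possible combination of attribute levels.
full_factorial = list(product(range(n_levels), repeat=n_attributes))
print(len(full_factorial))  # 3^6 = 729 candidate profiles

# Degrees of freedom (effects-coded) the design must identify:
main_effects = n_attributes * (n_levels - 1)         # 12 parameters
one_interaction = (n_levels - 1) ** 2                # +4 for one two-way interaction
print(main_effects, main_effects + one_interaction)  # 12 16
```

A fractional factorial keeps only a subset of the 729 profiles that still identifies the chosen parameters; each extra interaction enlarges that required subset, which is why interactions inflate the number of choice tasks.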
Table 2
Experimental design of included studies

| Design feature | n (%) |
| --- | --- |
| **Design plan** | |
| Main effects only | 4 (14.8) |
| Main effects + interactions | 1 (3.7) |
| Not clearly reported in text but main effects only in primary analysis | 20 (74.1) |
| Not reported and unclear from analysis | 2 (7.4) |
| **Design source** | |
| SAS | 11 (40.7) |
| Sawtooth Software | 5 (18.5) |
| SPEED | 3 (11.1) |
| IBM SPSS Statistics | 2 (7.4) |
| Sloane’s orthogonal array | 1 (3.7) |
| Not reported | 5 (18.5) |
| **Design of choice tasks** | |
| Orthogonal array (all using one constant comparator) | 8 (29.6) |
| Efficient design | 15 (55.6) |
| Not clearly reported | 4 (14.8) |
| **Number of choice tasks** | |
| <10 | 8 (29.6) |
| 10–15 | 6 (22.2) |
| 16–20 | 13 (48.1) |
The majority of studies (55.6%, 15/27) used an efficient design to construct their choice tasks, including every study from 2010 onwards that reported design type bar one [60]. This approach uses an algorithm to maximise the statistical efficiency of the design, and corroborates the growth in efficient designs identified by de Bekker-Grob et al. Eight studies (29.6%) employed an orthogonal design, which uses an orthogonal array to generate choice profiles and then one of several methods to allocate profiles to choice tasks [10]. In all of these studies, a constant comparator approach was used to construct choice tasks, whereby one profile is selected to be paired in each choice task against each of the remaining choice profiles. This is in contrast to de Bekker-Grob et al., who found that just one in three studies using orthogonal arrays employed this approach. Its popularity here may reflect an attempt by researchers to represent a de facto status quo option, with one choice profile used to correspond to the prevailing or baseline job conditions. Compared to using a constant “neutral” opt-out alternative, however, this approach is inefficient and discards much information on choices between attributes [22].
Efficient designs also have the advantage of being able to incorporate prior estimates of parameter values rather than setting these at zero. This increases the efficiency of the design through a Bayesian approach, with estimates usually obtained through pilot studies [30, 51]. In contrast to de Bekker-Grob et al., who found no studies employing this feature, two health workforce DCEs incorporated priors from a pilot survey, both from the MABEL survey [51, 58]. Given the limited number of health workers in LMIC and the logistical difficulty of administering surveys to practising health workers, practitioners should consider the use of priors in order to increase the precision of value estimates for small sample sizes [30].
Nearly half the studies (48.1%, 13/27) presented between 16 and 20 choice tasks to participants, with a mean of 12. Blocking was employed by ten studies, usually to decrease the number of choice tasks to fewer than ten. The number of choice tasks presented to participants is usually restricted due to fears over choice complexity and cognitive burden that may reduce the quality of responses [29]. Amongst a target population that has uniformly completed tertiary education courses characterised by frequent testing, however, higher numbers of choice tasks may be handled without any ensuing loss of engagement. It would be interesting to compare the responses of the same group of health workers to varying numbers of choice tasks.
Conduct
Three quarters of studies (20/27, 74.1%) reported piloting their surveys before full rollout. There was great variation in piloting, however, with pilots ranging from a small focus group of one subgroup within the target population [59] to a four-stage procedure with a final random sample of 1091 participants [70]. Piloting is an important part of DCEs, allowing verification of presentation, comprehension, coverage of attributes and levels, complexity, and the likelihood of selection of an opt-out option, as well as data collection for priors as discussed above [29]. The development of a standard checklist for piloting DCEs, allowing for contextual differences, would be worthwhile. In particular, pilots should attempt to include representatives from all subgroups of health workers to be analysed in the final sample (e.g. by gender, location, seniority) to ensure that differences in understanding are not driving the variation in preferences associated with these subgroups.
The mode of administration of DCEs is likely to be important both for the response rate and for understanding of the task (Additional file 2). Seven studies used postal surveys to contact large numbers of health workers, all in HIC [5, 23, 47, 48, 51, 70, 76]. Two of these studies also included online questionnaires [51, 70], while three studies used computer-assisted surveys on student populations in LMIC [45, 56, 77]. In LMIC, response rates were generally very high, with a mean of 83.2% (range 65.2% to 100%, the latter from a study set in China as reported by the authors [60]), compared to 49.3% (range 16.8% to 65.0%) in HIC. Unsurprisingly, response rates were significantly lower for graduates (mean 62.7%, range 16.8% to 100%) than for students (mean 84.1%, range 62.7% to 100%), underscoring the potential for distortion if results from these two subgroups are combined. Surveys were most commonly self-administered with supervision by researchers (10/27, 37.0%), a format that allows participants to ask questions for clarification but complete the survey in their own time.
Total sample sizes (Additional file 2) ranged from 102 doctors in Peru [57] to 3727 general practitioners in Australia [58]. Whilst sampling follows the same principles as for other primary data collection, i.e. ensuring the sampling frame and sampling strategy are representative of the target population(s), sample size calculation is an ill-defined area within discrete choice experiments. Although various rules of thumb were formed from modelling experience [8, 29], these have become less relevant with the advent of efficient designs that can take limited sample sizes into account [63]. Indeed, a very large sample encompassing wide variability in preferences may lead to less precise results than a small, more homogeneous sample [63]. For health workers, more attention should be placed on the representativeness of the sampling frame, in order to extrapolate results to the general population, and on the sampling strategy, to ensure adequate size of subgroups if significant post hoc analysis by different characteristics is planned [29, 63].
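As an illustration of such a rule of thumb, the sketch below implements the heuristic commonly attributed to Orme; the function name and example values are our own assumptions, and this is not necessarily the specific rule cited in [8, 29]:

```python
import math

def orme_minimum_n(levels_max: int, tasks: int, alternatives: int) -> int:
    """Orme's rule of thumb for minimum DCE sample size:
    N >= 500 * c / (t * a), where c is the largest number of levels of any
    attribute, t the choice tasks per respondent, and a the alternatives
    per task. A heuristic only, largely superseded by efficient designs."""
    return math.ceil(500 * levels_max / (tasks * alternatives))

# Illustrative values (not from any reviewed study): one attribute with
# 4 levels, 12 binary choice tasks per respondent.
print(orme_minimum_n(levels_max=4, tasks=12, alternatives=2))  # 84
```

Such formulas ignore preference heterogeneity and subgroup analyses, which is one reason the representativeness points above matter more than the headline N.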
Analysis
For a succinct summary of modelling approaches to health DCEs, see de Bekker-Grob et al. [30] and Amaya-Amaya et al. [63]. While most studies pre-2010 relied on random effects probit or logit models [63], mixed logit has been the most common econometric model more recently, used in 11 studies (39.3%) after 2010 (Table 3). Mixed logit relaxes the restrictive assumptions of the commonly used multinomial logit model by allowing for heterogeneity of preferences for attributes between participants, which is likely to be high in the fairly diverse health worker populations covered by many of these studies. It does this by introducing an individual-level utility estimate for each attribute, calculated from the mean utility estimate for that attribute and an individual-specific deviation from the mean [29, 70]. Although flexible, the mixed logit model presents a number of challenges, such as the choice of parameters to define as random. Moreover, the size of these individual-specific variances is likely to vary within and between participants, reducing the precision of utility estimates rather than increasing it. The latent class model has the same advantage over the multinomial logit as mixed logit, but assumes that there are two or more classes (or groups) of participants underlying the data with more homogeneous tastes. The distribution of participants across these classes is not known to the researcher, but is assumed to be related to observed variables such as attitudes and/or socio-demographic characteristics [63]. Latent class models have been used only rarely in health DCEs, with none in this review and just one in de Bekker-Grob et al. [30]; however, this model offers much to health workforce DCEs. As described earlier, quite heterogeneous populations are typically included, which latent class models may be able to separate into subgroups with more similar (and more accurately estimated) preferences depending on characteristics such as years of work experience or growing up in a rural area. Four studies (14.8%) used an extension of mixed logit, the generalised multinomial logit model, with three of these finding a better fit to the data than comparator mixed logit or logit models [51, 54, 58, 62]. Generalised multinomial logit models are able to account for scale heterogeneity of preferences as well as taste heterogeneity, i.e. utility estimates might vary between individuals not only because of differences in preferences, but also due to differences in variance. Some individuals may be much more certain of their choices than others or use decision heuristics that reduce variance, whilst other participants may not understand the task well or make mistakes that increase variance [70]. Fiebig et al. [78] assert that this model can better account for responses from these “extreme” participants, providing an improved fit to the data. This is undoubtedly an attractive feature for DCEs examining labour market decisions (where participants may be more uncertain) in populations of workers that are typically time-poor and highly pressurised (and thus perhaps more likely to employ decision heuristics or make mistakes). This may explain its popularity here, with four studies employing it compared to none in de Bekker-Grob et al. [30].
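A minimal sketch of the mixed logit idea described above, assuming illustrative part-worth means and standard deviations (none taken from the reviewed studies): each simulated respondent's coefficient is the population mean plus an individual-specific normal deviation, and population-level uptake is the choice probability averaged over individuals.

```python
import math
import random

random.seed(0)

# Illustrative part-worths for two job attributes (assumed values):
beta_mean = {"salary_10pct": 1.0, "rural": -1.5}  # population mean utilities
beta_sd   = {"salary_10pct": 0.3, "rural": 1.2}   # taste heterogeneity (sd)

def draw_individual_betas():
    """One simulated respondent: beta_i = mean + sd * z, with z ~ N(0, 1)."""
    return {k: beta_mean[k] + beta_sd[k] * random.gauss(0, 1) for k in beta_mean}

def choice_prob(betas, job_a, job_b):
    """Logit probability that this individual chooses job A over job B."""
    v_a = sum(betas[k] * job_a[k] for k in betas)
    v_b = sum(betas[k] * job_b[k] for k in betas)
    return math.exp(v_a) / (math.exp(v_a) + math.exp(v_b))

# Rural job with a 10% salary top-up versus an urban job at base salary.
job_a = {"salary_10pct": 1, "rural": 1}
job_b = {"salary_10pct": 0, "rural": 0}

# Average the probability over many simulated individuals.
probs = [choice_prob(draw_individual_betas(), job_a, job_b) for _ in range(10_000)]
print(round(sum(probs) / len(probs), 2))
```

Estimating the means and standard deviations from observed choices (rather than assuming them, as here) is what mixed logit estimation routines do by simulated maximum likelihood.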
Table 3
Analysis of included studies

| Analysis feature | n (%) |
| --- | --- |
| **Econometric model** | |
| Probit | 1 (3.7) |
| Logit | 2 (7.4) |
| Random effects probit | 7 (25.9) |
| Multinomial logit | 1 (3.7) |
| Conditional logit | 3 (11.1) |
| Mixed logit | 11 (40.7) |
| Generalised multinomial logit | 4 (14.8) |
| Error components mixed logit | 1 (3.7) |
| **Analysis software** | |
| Stata | 16 (59.3) |
| NLogit/LIMDEP | 5 (18.5) |
| SPSS | 2 (7.4) |
| Not reported | 4 (14.8) |
| **Relative attribute impact analysis** | |
| Probability analysis | 16 (59.3) |
| Welfare measures | 12 (44.4) |
| Marginal rates of substitution | 5 (18.5) |
| Partial log-likelihood analysis | 1 (3.7) |
| Compensating differentials | 1 (3.7) |
| Wage equivalents | 1 (3.7) |
| None | 2 (7.4) |
As the importance of different attributes cannot be compared directly using parameter estimates, due to confounding with the underlying utility scales, the relative impact of attributes is usually examined by converting estimates to a common scale [79]. There are a number of methods for doing so, including probability analysis, welfare measures and marginal rates of substitution. Probability analysis and welfare measures were the most popular methods in this review, employed by 16 (59.3%) and 12 (44.4%) studies respectively. It is surprising that more studies did not calculate welfare measures, given that all studies included a monetary variable. Ten of these 12 studies (83.3%) did not include an opt-out/status quo option, however, which as discussed above is likely to distort welfare measures through the overestimation of preferences resulting from a forced choice [29]. Despite over half of studies including a time variable, no study presented a marginal rate of substitution for time, in the form of willingness to commit to a post for a defined period. This is an important metric for policymakers, allowing pragmatic retention policies and incentive packages to be designed in the knowledge that filling unattractive posts may be necessary for a limited period only.
Nearly all studies using welfare measure(s) framed these as willingness to pay, either marginal (for changes in attributes) or total (for certain alternatives or scenarios). Willingness to pay in health workforce DCEs is rooted in the labour economic theory of compensating wage differentials, which holds that differences in wages arise to compensate workers for non-wage characteristics of jobs, for example risk or a lack of social amenities [47, 80]. In health workforce DCEs, negative willingness to pay represents the additional income required to compensate a health worker for a job with negative characteristics. For example, Scott et al. [70] modelled a range of unattractive job postings with accompanying negative total willingness to pay values. Conversely, positive willingness to pay is the income that a health worker would forego in order to take up a job with desirable characteristics. For example, Vujicic et al. [50] estimated the marginal willingness to pay of doctors in Vietnam for various desirable job characteristics, such as urban location and adequate equipment.
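In estimation terms, marginal willingness to pay for an attribute is typically the ratio of its utility coefficient to the income coefficient. The sketch below uses invented coefficients purely for illustration, not results from any reviewed study:

```python
# Minimal sketch of marginal willingness to pay (WTP) from DCE coefficients.
# All coefficient values are hypothetical assumptions. With income entering
# utility positively, WTP for attribute k is beta_k / beta_income:
# positive WTP is income a worker would forego for the attribute;
# negative WTP is the compensation required to accept it.
betas = {
    "income_per_100usd": 0.5,  # utility per extra 100 USD of monthly salary
    "rural_location": -1.5,    # disutility of a rural posting
    "good_equipment": 0.75,    # utility of adequate equipment and supplies
}

def marginal_wtp(attribute: str) -> float:
    """Marginal WTP, in units of the monetary attribute (100 USD/month here)."""
    return betas[attribute] / betas["income_per_100usd"]

print(marginal_wtp("rural_location"))  # -3.0: needs +300 USD/month to accept a rural post
print(marginal_wtp("good_equipment"))  # 1.5: would forego 150 USD/month for good equipment
```

The same ratio with a time coefficient in the denominator would give the marginal rate of substitution for time noted above.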
However, two thirds of these studies (66.7%, 8/12) used a current income level accompanied by either absolute or percentage increases on this baseline. The negative willingness to pay values obtained in these studies may be overestimates due to the endowment effect, whereby desirable goods are more valuable when they are part of one’s endowment, i.e. individuals place more value on the loss of something they own or have experienced than on its acquisition when they have not experienced it [81]. In this situation, health workers may more readily give up hypothetical additional compensation than accept a decrease in their actual salaries. Compensating wage differentials may therefore be more accurate when the monetary attribute includes a level representing a decrease in current income, as seen in four studies for at least some participants [5, 47, 70, 82].
More recent studies tended to extend the probability analysis by simulating different policy scenarios, particularly predicting the uptake of jobs in rural areas under different incentive packages. Lagarde et al. [54] went further by examining the uptake of rural jobs by Thai doctors under different incentive policies for i) the original population; ii) three hypothetical populations with differing proportions of doctors with rural/urban backgrounds; and iii) populations with undergraduate training in Bangkok as opposed to outside the capital. Sivey et al. [51] investigated specialty choice for junior doctors in Australia with an unlabelled design consisting of attributes describing various job aspects, but then used data from the accompanying survey sent to all Australian doctors to set typical levels for the same attributes for specialist doctors versus general practitioners (e.g. regular continuity of care for general practitioners). The researchers went on to predict the uptake of general practitioner training under different changes to three policy-amenable attributes: procedural work, academic opportunities, and salary. This study is also the first, to our knowledge, to use revealed preference data from the survey on the proportion of junior doctors actually choosing general practice to calibrate the model, so that the predicted choice probabilities matched the actual choices before the policy simulations were run. Such comparison with revealed preference data is to be welcomed [30], although it is rare for DCE practitioners (particularly in LMIC) to have access to such comprehensive data.
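A toy version of such a scenario simulation, assuming hypothetical conditional logit coefficients and incentive packages (none estimated from the reviewed studies), can be sketched as:

```python
import math

# Predicted uptake of a rural post versus an urban post under different
# incentive packages, via logit choice probabilities. All coefficients
# and package definitions below are illustrative assumptions.
betas = {"rural": -1.5, "salary_bonus_10pct": 0.4, "housing": 0.8}

def utility(job):
    return sum(betas[k] * v for k, v in job.items())

def p_rural(rural_job, urban_job):
    """Logit probability of choosing the rural job over the urban job."""
    v_r, v_u = utility(rural_job), utility(urban_job)
    return math.exp(v_r) / (math.exp(v_r) + math.exp(v_u))

urban = {"rural": 0, "salary_bonus_10pct": 0, "housing": 0}
scenarios = {
    "no incentives":        {"rural": 1, "salary_bonus_10pct": 0, "housing": 0},
    "20% salary bonus":     {"rural": 1, "salary_bonus_10pct": 2, "housing": 0},
    "bonus + free housing": {"rural": 1, "salary_bonus_10pct": 2, "housing": 1},
}
for name, job in scenarios.items():
    print(f"{name}: predicted rural uptake {p_rural(job, urban):.2f}")
```

In published analyses the coefficients come from the fitted model and, where available, baseline probabilities are calibrated against revealed preference data as in Sivey et al.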
Five studies combined predictions from a probability analysis with cost data in order to assess the cost impact of favoured policy options [46, 49, 55, 65, 82]. Chomitz et al. compared a small number of policy options to address the maldistribution of doctors in Indonesia, with little detail on costings, and reported that bonuses for working in remote or very remote posts would be cheaper to provide than specialist training. In a more detailed analysis, Vujicic et al. [82] found that rural allowances would be more cost-effective for attracting nurses to rural posts in Liberia than providing housing or improving equipment. Rao et al. [55] showed that reserving postgraduate training places was the most cost-effective policy to encourage both doctors and nurses to take up rural jobs in India, with a higher predicted uptake at a lower cost than salary increases. Lagarde et al. [65] combined predicted probabilities from two DCEs: one simulating the current labour market in South Africa, and the South African component of the multi-country analysis of policy tools to attract nurses to rural areas [4]. These were used in a Markov model to simulate the distribution of nurses in the labour market over time under different policy scenarios, using rural nurse-years as the effectiveness measure. The results showed that salary increases were dominated by non-wage interventions, and that “upstream” measures (i.e. recruiting individuals more likely to choose rural posts willingly, such as those with rural upbringings) were more cost-effective than “downstream” interventions, with the most cost-effective policy being the recruitment of students with rural backgrounds.
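The Markov approach can be sketched as follows; the states, annual transition probabilities, and cohort sizes are invented for illustration and are not taken from [65]:

```python
# Toy Markov cohort model: where a cohort of nurses works over time,
# accumulating rural nurse-years as the effectiveness measure.
# All transition probabilities below are hypothetical assumptions.
states = ["rural", "urban", "left"]  # "left" = private sector/overseas/exit
transition = {
    "rural": {"rural": 0.80, "urban": 0.15, "left": 0.05},
    "urban": {"rural": 0.05, "urban": 0.85, "left": 0.10},
    "left":  {"rural": 0.00, "urban": 0.00, "left": 1.00},  # absorbing state
}

def rural_nurse_years(initial, years):
    """Expected rural nurse-years accumulated by the cohort over `years`."""
    dist, total = dict(initial), 0.0
    for _ in range(years):
        total += dist["rural"]  # nurse-years served rurally this year
        dist = {s: sum(dist[p] * transition[p][s] for p in states) for s in states}
    return total

# Policy comparison for a cohort of 100 recruits: an "upstream" policy that
# raises the share starting rural from 30 to 60 (e.g. rural-background intake).
print(round(rural_nurse_years({"rural": 30, "urban": 70, "left": 0}, 10), 1))
print(round(rural_nurse_years({"rural": 60, "urban": 40, "left": 0}, 10), 1))
```

Dividing each policy's cost by its incremental rural nurse-years then gives the cost-effectiveness ratios compared in the analysis; here, the initial rural share would itself be driven by the DCE-predicted choice probabilities.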