Introduction
Many decisions about health are made in deliberation with others, e.g. children, spouses or medical professionals. This collective feature of decisions about health is, however, not typically reflected in health outcomes research focused on Quality-Adjusted Life-Years (QALYs). The quality-of-life weights required to calculate QALYs (i.e. QALY weights) are typically determined through choice-based methodologies [1], such as standard gamble (SG) or time trade-off (TTO). Both methods are applied to the individual case, through decisions about one’s own (hypothetical) health outcomes [2, 3], i.e. no deliberation with others is allowed. As is well documented in the health economic literature, QALY weights usually differ between SG and TTO [4–6]. SG weights are typically higher than TTO weights, and this difference has conventionally been explained as resulting from deviations from the linear QALY model and expected utility (EU) theory, both of which have been found to be descriptively inaccurate [7–9]. Although it may be possible to measure these deviations and correct for their influences in SG and TTO [10], currently no consensus exists on how these biases are best measured or corrected for. Hence, the main motivation of this paper is to explore whether the quality and outcomes of SG and TTO are affected by asking individuals to complete these tasks in groups, and whether the difference between SG and TTO weights is reduced as a result.
The extant literature on monetary outcomes provides some indication that allowing individuals to discuss these complex decisions about health with others may be helpful. For example, collective decision making has been associated with less discounting and fewer time inconsistencies [11]. Other existing work on the effects of collective decision making gives less firm results, with mixed evidence being reported for risk aversion [12–16], ambiguity aversion [13, 17, 18] and the violation rate of EU [19–21]. When effects of collective decision making occur, they are hypothesized to result from the deliberation, bargaining and exchange of information that takes place when deciding collectively (e.g. [14, 19]). Taken together, these studies suggest that risk preferences, which are relevant for SG, and time preferences, which are relevant for TTO, might be affected by collective decision making. For example, discounting of future life years leads to a downward bias in TTO [22–24], and if such discounting is lower when individuals decide in a group [11], this could lead to higher TTO weights. Similarly, if groups are more willing to take risks [13], perhaps due to reduced overweighting of small probabilities of dying, this could yield lower SG weights. If such effects occur simultaneously, the difference between SG and TTO might be reduced.
Only a few studies have documented the effects of deliberation in groups or deciding collectively on SG and TTO weights. McIntosh et al. [25] found that completing SG in a panel and deliberating about responses decreased subsequent SG weights, and Karimi et al. [26] found that deliberation in a panel had an effect on individual TTO weights. Just a single study explored collective valuation for both SG and TTO and found only small effects [27]; however, this study used an anonymous voting system to obtain collective SG and TTO responses, i.e. deliberation between respondents was not allowed. Hence, those few studies on the effects of deliberation or collective decisions on QALY weights differ in several respects from the economic literature, in which typically smaller groups actually decide together (i.e. bargaining is included).
As such, we believe the evidence base on collective decision making precludes the formation of clear hypotheses, for three reasons. First, next to the mixed evidence on risk preferences, an extensive psychological literature suggests that group decision making can in some cases have detrimental effects. This literature suggests that groups can engage in ‘groupthink’, which fosters limited information search and enhances confirmation bias [28, 29]. Second, the extant literature on collective decisions mostly studies monetary decision making, while SG and TTO involve health-related decision making, and these differ in many ways [30]. Third, the few available investigations of the effects of collective decisions for health [25, 26, 31, 32] did not use an experimental design, i.e. often no control condition or comparator was in place. This complicates the interpretation of these studies’ findings, as these may be caused by learning instead (i.e. as a result of repeated measurement after deliberation). Indeed, it is well known that such effects may occur in health state valuation (e.g. [33]). Hence, in our work we explore the effects of collective decisions for SG and TTO, whilst controlling for learning effects.
Our study adds to the earlier literature on collective decisions and health state valuation in several respects. We report the first experimental test of the effects of collective decision making on QALY weights, using a control condition constructed to control for learning. More specifically, we obtained a baseline measurement for SG and TTO for each subject, after which we distinguished between groups and individuals for repeated decisions. By using such a control condition (similar to that of [17]), we are able to isolate the effect of deciding collectively (i.e. the part related only to deliberation, bargaining and information exchange) on multiple facets of SG and TTO decisions. We explore whether collective decisions affect internal consistency criteria, and whether SG and TTO weights change when deciding collectively. Importantly, we test whether the difference between SG and TTO is reduced, as this could indicate that the different biases that are suggested to produce this difference are reduced [22]. If that is the case, the use of collective decisions could provide an answer to the open questions surrounding the validity of QALY weights elicited with SG and TTO [34]. Finally, we test whether any effects of collective decision making carry over onto subsequent individual SG and TTO exercises for groups.
Preliminaries
In this paper, we only consider chronic health profiles described as \(\left( {Q, T} \right)\), with \(Q\) denoting health status and \(T\) denoting its duration in years. For brevity, we denote immediate death as \(D\), and if health status is equal to full health (\({\text{FH}}\)) we write \(Q = {\text{FH}}\). Under the assumption of completeness, decision makers are able to form preferences over health profiles, denoted using the conventional notation \(\succ\), \({ \succcurlyeq }\), and \({\sim }\) to represent strict preference, weak preference, and indifference, respectively. Most studies applying SG or TTO assume that decision makers form these preferences as modeled within the linear QALY model [35], i.e.:
$$V\left( {Q,T} \right) = U\left( Q \right) \times T. \tag{1}$$
Decision makers decide about health profiles, either under certainty (in the case of TTO) or under risk (in the case of SG). Risk is operationalized by presenting decision makers with lotteries of the following form: \(\left( {Q, T} \right)_{p} \left( {Q^{\prime}, T^{\prime}} \right)\), which signifies that health profile \(\left( {Q,T} \right)\) will be realized with probability \(p\), and health profile \(\left( {Q^{\prime}, T^{\prime}} \right)\) with probability \(1 - p\).
The SG method involves determining the probability \(p\) at which decision makers are indifferent between a sure outcome \(\left( {Q, T} \right)\) and a risky prospect \(\left( {{\text{FH}}, T} \right)_{p} \left( D \right)\). Probability \(p\) is varied until the respondent is indifferent between a number of years (\(T\)) in health state \(Q\) for certain and a gamble with two outcomes, which are \({\text{FH}}\) during the same time period (\(T\)), and \(D\). These SG indifferences are typically evaluated under expected utility (EU) theory [36]. The TTO method, on the other hand, asks for a time equivalent in full health which yields indifference between \(\left( {Q,T} \right)\) and \(\left( {{\text{FH}}, T^{\prime}} \right)\), with \(T > T^{\prime}\). The number of years \(T^{\prime}\) is varied until the respondent is indifferent between \(T\) years in health state \(Q\) and \(T^{\prime}\) years in \({\text{FH}}\). Given the assumptions listed above, and setting \(U\left( {\text{FH}} \right) = 1\) and \(U\left( D \right) = 0\), the SG indifference \(\left( {Q, T} \right)\sim \left( {{\text{FH}}, T} \right)_{p} \left( D \right)\) is evaluated by \(U\left( Q \right) \times T = p \times \left( {1 \times T} \right) + \left( {1 - p} \right) \times 0\), and thus \(U\left( Q \right) = p\). The TTO indifference \(\left( {Q, T} \right)\sim \left( {{\text{FH}}, T^{\prime}} \right)\) is evaluated by \(U\left( Q \right) \times T = 1 \times T^{\prime}\), and thus we obtain \(U\left( Q \right) = T^{\prime}/T\).
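As a minimal sketch of the two evaluations above, the following Python functions compute \(U\left( Q \right)\) from an SG indifference probability and from a TTO time equivalent, assuming the linear QALY model and EU hold. The function names and the example responses are hypothetical illustrations and are not taken from the experiment.

```python
def sg_weight(p: float) -> float:
    """QALY weight from the SG indifference (Q, T) ~ (FH, T)_p (D): U(Q) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p


def tto_weight(t_prime: float, t: float) -> float:
    """QALY weight from the TTO indifference (Q, T) ~ (FH, T'): U(Q) = T'/T."""
    if t <= 0 or not 0.0 <= t_prime <= t:
        raise ValueError("need T > 0 and 0 <= T' <= T")
    return t_prime / t


# Hypothetical responses for one health state with T = 10 years.
u_sg = sg_weight(0.85)         # indifferent at an 85% chance of full health
u_tto = tto_weight(7.0, 10.0)  # indifferent at 7 years in full health
```

With these made-up responses the SG weight exceeds the TTO weight, which is the direction of the difference typically reported in the literature.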
Bleichrodt [22] proposed that the typical differences between SG and TTO weights for the same health states may result from deviations from EU theory or the linear QALY framework, such as discounting, loss aversion and probability weighting. Thus, when SG and TTO are evaluated without acknowledging these deviations, we should observe a gap between SG and TTO. Formally, we define the SG–TTO gap as the difference between \(U\left( Q \right)\) as derived from SG and from TTO: \(\Delta \left( {{\text{SG}} - {\text{TTO}}} \right) = p - \left( {\frac{{T^{\prime}}}{T}} \right)\). We expect \(\Delta \left( {{\text{SG}} - {\text{TTO}}} \right) > 0\), and explore whether collective decision making affects this gap. If collective decision making reduces the underlying deviations, the gap should decrease, which we test empirically in an experiment.
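The gap defined above can be computed directly from a pair of responses. The sketch below is illustrative only: the function name and the baseline and follow-up numbers are hypothetical and do not reproduce the analysis or data of this study.

```python
def sg_tto_gap(p: float, t_prime: float, t: float) -> float:
    """SG-TTO gap for one health state: p - T'/T."""
    return p - t_prime / t


# Hypothetical (p, T', T) responses for the same health state at two moments.
baseline = sg_tto_gap(p=0.85, t_prime=7.0, t=10.0)
followup = sg_tto_gap(p=0.80, t_prime=7.5, t=10.0)

# The gap shrinks if the SG weight falls and/or the TTO weight rises.
gap_reduced = followup < baseline
```

In this made-up example the gap shrinks because the SG weight decreases while the TTO weight increases, which is the pattern that would be expected if deciding collectively reduced the deviations discussed above.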
Discussion
In this study, we report the first experimental test of the effects of collective decision making on QALY weights. Collective decision making did not appear to have a systematic effect on the quality of decisions for SG and TTO: no effects were found for consistency, and the initially low monotonicity came up to par with that of individual decisions. With regard to outcomes, QALY weights for SG and TTO increased, both for collective decisions and for individual decisions. More sophisticated analyses indicated that this increase was related only to learning (and not to facets of collective decisions such as deliberation or bargaining), i.e. repetition increased SG and TTO weights (both in groups and individually). This trend of increased QALY weights with repetition is in accordance with earlier work by Augestad et al. [33]. Given that student samples in some cases have been found to yield low SG and TTO weights (e.g. [10]), this learning effect could be seen as beneficial, as it moved weights towards QALY weights representative of the general population, which were obtained with a more comprehensive elicitation procedure, shorter SG and TTO durations, and a general public sample [48]. As expected, we replicated the typical SG–TTO gap at baseline [4–6], although this gap was less apparent for the least severe health state. Perhaps this lack of an SG–TTO gap for the mildest health state results from a ceiling effect (as QALY weights were close to 1.00). Importantly, we found that the SG–TTO gap was unaffected by collective decision making or learning, and no carryover effects were observed.
This study adds to the evidence base on collective decision making, which mostly consists of studies using monetary outcomes. In agreement with the mixed findings of those studies, we do not find a substantial beneficial effect of collective decisions for SG and TTO. However, earlier work on collective decisions for monetary choice suggested that groups discount the future less [11]. Because discounting has a negative effect on TTO values [22], less discounting in the group treatment would cause higher TTO values. As we observed no increase in TTO values beyond that attributable to learning, our results could suggest that discounting of health outcomes is not affected by collective decision making; an alternative explanation would be that both discounting and loss aversion decrease in group tasks, and that their effects neutralize each other [22]. Our results also indicate that collective decision making does not alleviate the typical gap between SG and TTO, which is also partially explained as a result of discounting [22, 49]. Future research could therefore obtain separate measurements of discounting and loss aversion (and possibly also other traits such as scale compatibility and probability weighting) for health outcomes to test these possibilities.
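To illustrate why discounting would depress TTO values, the sketch below simulates a respondent whose true utility for a health state is known and who discounts future life years at a constant continuous rate. Constant-rate (exponential) discounting, the function names and all numbers are assumptions made for this illustration; they are not the model or data used in this study.

```python
import math


def discounted_duration(years: float, r: float) -> float:
    """Present value of `years` life years under continuous discounting at rate r."""
    if r == 0:
        return years
    return (1.0 - math.exp(-r * years)) / r


def tto_time_equivalent(u_q: float, t: float, r: float) -> float:
    """T' solving u_q * D(T) = D(T') for a respondent who discounts at rate r."""
    target = u_q * discounted_duration(t, r)
    if r == 0:
        return target
    return -math.log(1.0 - r * target) / r


true_u = 0.8  # hypothetical true utility of health state Q
t = 10.0      # duration T in years
for r in (0.0, 0.03, 0.06):
    t_prime = tto_time_equivalent(true_u, t, r)
    print(f"r = {r:.2f}: raw TTO weight T'/T = {t_prime / t:.3f}")
```

Under these assumptions the raw weight \(T^{\prime}/T\) equals the true utility only when there is no discounting and falls further below it as the discount rate rises, so a group that discounts less would be expected to report higher TTO values.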
Our results confirm the findings of Krabbe et al. [27], who found only small differences between collective and individual valuation for SG and TTO using an anonymous voting procedure, and that SG–TTO gaps are unaffected by collective decision making. Furthermore, our findings are in accordance with earlier non-experimental studies on deliberation in TTO or SG, which generally find that deliberation has little to no effect on QALY weights [25, 26, 50]. Hence, it appears that deciding collectively has no added benefit beyond providing respondents with opportunities for learning. Our findings thus suggest that such a collective procedure could be relevant for obtaining nationally representative value sets in settings where providing ample opportunity for learning is too costly or otherwise infeasible.
As in most experiments on the effects of collective decision making in the economic literature, the use of a student sample can be considered a drawback of this study. Obviously, students differ from the general population in a number of ways (as does any particular sub-sample used in empirical work). This is a common criticism of laboratory experiments, as any experiment using a non-representative sample will generate questions of external validity. In this regard, experimental economics usually distinguishes between experiments aimed at measurement and experiments aimed at documenting treatment effects [51]. One can question the representativeness of our measurements, as individual QALY weights are typically lower than those in the general population [48]. However, we believe it is not as straightforward to question the external validity of the treatment effects (i.e. within-subject learning effects and between-subjects group effects), unless one explicitly assumes that these causal processes occur differently for our subject pool than for the general population. For one thing, students are likely to be younger, healthier and more highly educated than the general population, and hence the finding of a substantial learning effect in our student sample may suggest that the inclusion of a sufficient number of practice rounds in health state valuation will be necessary for a sample representative of the general public. We have no reason to expect that deliberation and bargaining occur differently for students than for the general population. Nonetheless, a potential problem is that our sample consisted exclusively of students, who are likely to have had similar views on length and quality of life (and thus relatively few opportunities to influence each other). Hence, we believe future work should study whether the effects of collective decision making on QALY weights are more prominent in less homogeneous dyads (e.g. doctor/patient or student/retiree).
To conclude, our work suggests that collective decision making does not appear to yield an effect for health state valuation. As in earlier work [27], the difference between SG and TTO does not disappear when moving from an individual to a collective task, which suggests that collective decision making does not help to reduce bias in SG and TTO [22]. Therefore, other solutions for alleviating these confounding effects, such as more elaborate instructions, practice rounds and correction mechanisms [10], should be considered if one aims to correct for these biases.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.