Introduction
Health economic evaluations to a large extent rely on the time tradeoff (TTO) method to measure health state utilities. This method measures utilities by letting individuals trade off lifetime against health status. A major problem in employing the TTO method for health state valuations is the finding of violations of procedural invariance, that is, the scores attached to a health state depend heavily on the procedure used to obtain that score [1-3]. In general terms, the TTO method elicits the point of indifference between two streams of health, typically a shorter time span in full health and a longer period in an impaired health state. When a respondent is indifferent between n years in full health and x years in health state β, the value of β is obtained by dividing n by x. Obviously, for health states better than death, one may obtain the value of β either through fixing the period in β (conventional procedure) or that in full health (alternative procedure). In theory, both should yield the same valuation, but several studies found the conventional procedure to result in significantly higher scores than the alternative procedure [1, 2]. Since it is unclear which of these procedures, if any, captures underlying preferences, these findings are problematic.
One potential determinant of violations of procedural invariance is the elicitation design. In particular, one can use a matching design or a choice-based design. Many TTO elicitations employ a matching design to determine indifferences, in which a respondent has to indicate the number of years in full health that renders him indifferent to a given number of years in an impaired health state [4-8]. Choice-based designs [9], though, are better embedded in economic theory [10] and may lead to fewer inconsistencies [11]. Furthermore, choice-based designs naturally provide a more neutral situation, suggesting a smaller impact of loss aversion and other biases [2] and, hence, procedural invariance may be more likely to hold in that case.
This paper addresses the problem of procedural invariance by performing a more rigorous test than previous studies, using a fully choice-based, computerized design and a broad range of time horizons (denoted “gauge durations” henceforth). The advantage of a computerized questionnaire is that it facilitates chaining of the answers, enabling fast elicitation of choice-based indifferences and efficient testing of procedural invariance. The advantage of using a broad range of gauge durations is that we can test whether the earlier observed behavioral discrepancies between short and long time horizons [4, 12, 13] can be extended to the domain of procedural invariance. In addition, we investigate the role of discounting when the results are not in accordance with theory, i.e., when the two procedures do not result in similar TTO scores.
Another interesting question concerns the validity of the constant proportional tradeoffs (CPTO) property using the alternative elicitation procedure. There is a considerable amount of evidence about CPTO for the conventional procedure [14, 15], but not so much for the alternative procedure. The two studies on this topic rejected CPTO [2, 14], but both used a more limited range of gauge durations than the current study. Using a broad range of gauge durations is especially important in this context, since the available evidence seems to indicate that findings of violations of CPTO may be related to the gauge durations used [14]. The use of multiple durations allows us to perform a new and more elaborate test of CPTO for the alternative procedure in this paper, covering gauge durations between 3 and 46 years. Furthermore, we investigate the role of discounting by adjusting for utility of life duration curvature.
Terminology
Let us start by introducing the terminology used throughout this paper. \( h = (h_{j}, \ldots, h_{n}) \) denotes a health profile, where \( h_{t} \) is the health state in period \( t = j, \ldots, n \), with \( n \) denoting the final period under consideration. A constant health profile \( h = (h_{j} = \alpha, \ldots, h_{n} = \alpha) \) is indicated as \( (\alpha, n) \). Further, \( v(h_{t}) \) is a value function that represents the individual’s preferences over health quality and \( \delta(t) \) denotes the corresponding weight attached to the value in this period.
The Quality-Adjusted Life Year (QALY) model is a widely used model in health economic evaluations. A general version of this model can be written as:
$$ U\left( {t,h_{t} } \right) = \sum\limits_{t = j}^{n} {\delta (t)*v(h_{t} )} $$
(1)
The TTO method infers health state utilities by asking subjects to consider two constant health profiles \( (\beta, n_{\beta}) \) and \( (\gamma, n_{\gamma}) \), with, in general, γ a better health state than β and \( n_{\beta} > n_{\gamma} \). When an individual is indifferent between these two profiles, according to (1), we obtain the following equality:
$$ \sum\limits_{t = 1}^{n_{\beta}} {\delta \left( t \right)*v\left( \beta \right)} = \sum\limits_{t = 1}^{n_{\gamma}} {\delta \left( t \right)*v\left( \gamma \right)} $$
(2)
Often no discounting is assumed, so that Eq. 2 can be simplified to \( n_{\beta} * v(\beta) = n_{\gamma} * v(\gamma) \), which means, after normalizing \( v(\gamma) \) to 1, that the utility of the health state is simply \( v(\beta) = n_{\gamma} / n_{\beta} \). In this case, it does not matter whether preferences are elicited by fixing the duration in β and asking for \( n_{\gamma} \), or by fixing the duration in γ and asking for \( n_{\beta} \), as the ratio in theory will be the same for both. Although most TTO studies perform the conventional procedure to measure health state utilities [9, 16], the alternative procedure should, in principle, give the same results according to this model.
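As a minimal illustration of this equivalence, the following Python sketch computes the undiscounted TTO score from an indifference pair and shows that a respondent whose preferences satisfy the model yields the same score under both procedures. The function name and the durations (10 and 13 years) are illustrative assumptions, not data from this study.

```python
def tto_score(n_gamma: float, n_beta: float) -> float:
    """Undiscounted TTO score v(beta) = n_gamma / n_beta, with v(gamma) normalized to 1."""
    if n_gamma <= 0 or n_beta <= 0:
        raise ValueError("durations must be positive")
    return n_gamma / n_beta


# Conventional procedure: n_beta is fixed (here 13 years, hypothetical) and the
# respondent states the equivalent number of years in full health (here 10).
conventional = tto_score(n_gamma=10.0, n_beta=13.0)

# Alternative procedure: n_gamma is fixed at 10 years. A respondent whose
# preferences follow the undiscounted model must state n_beta = n_gamma / v(beta).
implied_n_beta = 10.0 / conventional
alternative = tto_score(n_gamma=10.0, n_beta=implied_n_beta)
```

Under the model, `conventional` and `alternative` coincide up to floating-point rounding; any systematic gap between the two in observed data is what a violation of procedural invariance looks like.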
Unfortunately, the QALY model may not be a good descriptive model, owing to phenomena such as loss aversion, scale compatibility, and maximum endurable time [17-19]. If these or other distorting factors differ between the two procedures, so that the procedures yield different scores, discounting may become relevant in explaining the differences [1]. Therefore, we first have to measure δ(t), since a test of these differences depends on the real size of the utility differences, which is only available if one adjusts for discounting.
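To make the role of δ(t) concrete, the sketch below solves Eq. 2 for v(β) with v(γ) normalized to 1, so that the score is the ratio of the summed weights over the two durations. It assumes constant-rate discounting, δ(t) = 1/(1 + r)^t, purely for illustration; this parametric form, the rate, and the function names are assumptions and not the elicited weights used in this paper.

```python
def discount_weight(t: int, rate: float) -> float:
    """Illustrative constant-rate weight delta(t) = 1 / (1 + rate)**t (an assumption)."""
    return 1.0 / (1.0 + rate) ** t


def discounted_tto_score(n_gamma: int, n_beta: int, rate: float) -> float:
    """Solve Eq. 2 for v(beta) with v(gamma) = 1: the ratio of summed discount weights."""
    if n_gamma <= 0 or n_beta <= 0:
        raise ValueError("durations must be positive whole years")
    weight_full_health = sum(discount_weight(t, rate) for t in range(1, n_gamma + 1))
    weight_impaired = sum(discount_weight(t, rate) for t in range(1, n_beta + 1))
    return weight_full_health / weight_impaired


# With a zero rate the score reduces to the simple ratio n_gamma / n_beta.
undiscounted = discounted_tto_score(n_gamma=10, n_beta=13, rate=0.0)

# With a positive rate the later years in the impaired state weigh less,
# so the same indifference pair implies a higher utility for beta.
discounted = discounted_tto_score(n_gamma=10, n_beta=13, rate=0.05)
```

The comparison of `undiscounted` and `discounted` shows why the size of any difference between procedures, expressed in utility terms, depends on the weights applied.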
Background
All studies that thus far have tested procedural invariance between the conventional and alternative TTO procedures indeed rejected it or found mixed evidence. In particular, they found systematically higher values for the conventional procedure for at least some of the included comparisons [1-3]. Discounting was not found to explain this discrepancy [1, 2], pointing toward other biases that should be taken into account. Adjusting for discounting did, however, reduce the differences between the elicited utilities.
Three published papers have studied procedural invariance so far. First, Bleichrodt et al. [2] asked five conventional TTO questions, with gauge durations ranging between 13 and 38 years, and back pain as the impaired health state. The answers (i.e., the number of years in full health that made respondents indifferent to the specified number of years with back pain) were used as gauge durations in the alternative procedure 2 weeks later, when the subjects had to come back. For example, if a subject in the first session had expressed indifference between 13 years with back pain and 10 years in full health, in the second session he had to indicate how many years with back pain were of equal value to 10 years in full health. Procedural invariance would then require the subject to state 13 years again. Bleichrodt et al. [2] showed that discounting should not distort the results in this case, since the periods involved are identical when procedural invariance holds.
Bleichrodt et al. [2] used a choice-based design to elicit TTO scores. They used a questionnaire in which the respondents were confronted with a list of choices between a number of years with back pain and a number of years in full health (see Appendix 1). For each choice, the subjects indicated whether they preferred the number of years with back pain or the number of years in full health. The conventional procedure resulted in higher TTO scores for the three shortest gauge durations, whereas no significant differences were present for the two longest gauge durations. Bleichrodt et al. [2] attributed the difference for the shorter durations to loss aversion, which tends to be more influential for shorter durations [20]. This finding is consistent with lexicographic preferences (or short-term indifferent subjects) for short durations that were found in earlier studies [12, 21].
Second, Spencer [3] used four conventional and two alternative questions. The questions for the conventional procedure all had a gauge duration of 10 years, but in different health states, which were described in terms of the five EuroQol dimensions. The questions for the alternative procedure had a gauge duration of 2 years in two different imperfect health states. Indifferences were reached by letting subjects choose and varying the response mode until the subject was indifferent between the two alternatives. Spencer did not use answers to one procedure as input in the other procedure, however, and, hence, the results were distorted by discounting, for which no adjustment was made. She found mixed results. For one health state, the conventional procedure yielded a higher TTO score than the alternative procedure, but for the other, there was no significant difference. Maximum endurable time may, however, have played a role in the latter finding, as this question used a very poor health state.
Third, Attema and Brouwer [1] performed two conventional and two alternative TTO exercises, with back pain and full health as the health states. They used fixed gauge durations for all questions, but they did adjust for discounting by means of a discounting elicitation method [22]. The gauge durations were 14 and 27 years for the conventional procedure, and 10 and 22 years for the alternative procedure. An open-ended matching design was used to elicit indifferences. Higher TTO scores for the conventional procedure were found for both questions, also after adjusting for discounting.
The present study was designed to address the issue of procedural invariance, attempting to combine the best elements of previous studies and to add an improved elicitation method. That is, we used multiple gauge durations and a choice-based design, and our experiment was in principle designed in such a way that the equality of both procedures was not distorted by discounting. In that sense, it was comparable to the study by Bleichrodt et al. [2] in a number of respects, but also attempted to refine it further. In accordance with their study, we took the answers of the subjects to the questions in the conventional procedure and used them as gauge durations in the alternative procedure. Their study also used a choice-based design to elicit TTO scores, although it differed from ours in the following way. In contrast to [2], our design presented choices one by one on a computer screen, so that subjects faced only one choice at a time and could not see the other choices while making a particular choice. Another difference with the preceding studies is that we considered a broader range of time horizons, between 3 and 46 years, allowing us to test procedural invariance for short, intermediate, and long durations.
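A one-choice-at-a-time elicitation of this kind can be illustrated with a small simulation. The sketch below assumes a bisection rule for narrowing down the indifference point in the conventional procedure; the iteration rule, the function names, and the simulated respondent are hypothetical and are not taken from the experiment described here.

```python
from typing import Callable


def elicit_indifference(
    prefers_full_health: Callable[[float], bool],
    low: float,
    high: float,
    tol: float = 0.01,
) -> float:
    """Bisect on the offered years in full health until the bracket is narrower than tol.

    prefers_full_health(x) is True if the respondent prefers x years in full
    health to the fixed number of years in the impaired state.
    """
    while high - low > tol:
        offer = (low + high) / 2.0
        if prefers_full_health(offer):
            high = offer  # the offer is more than enough: try fewer years
        else:
            low = offer  # the offer is not enough: try more years
    return (low + high) / 2.0


# Simulated respondent (assumption): follows the undiscounted model with
# v(beta) = 0.8 and faces a fixed 20 years in the impaired health state.
TRUE_VALUE = 0.8
N_BETA = 20.0


def simulated_respondent(offer: float) -> bool:
    return offer > TRUE_VALUE * N_BETA


n_gamma = elicit_indifference(simulated_respondent, low=0.0, high=N_BETA)
elicited_score = n_gamma / N_BETA
```

Each loop iteration corresponds to one binary choice shown in isolation, and the answer determines the next offer, which is the sense in which answers are chained.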
As discounting should theoretically not matter in our design, we could have simply compared the unadjusted TTO scores for the two procedures. However, we also investigated the results for the adjusted TTO scores, since discounting does influence the results when procedural invariance does not hold, for example, due to loss aversion [1]. Any difference between the two methods is then influenced by discounting, so that adjusting for this provides a better insight into the differences in utility terms. We elicited the utility function for life duration by means of a risk-free method [22] for this purpose, and employed the adjustment procedure of Attema and Brouwer [8] to adjust the TTO results for the elicited discounting results.
Finally, the QALY model without discounting implies the presence of CPTO, i.e., the ratio \( n_{\gamma} / n_{\beta} \) is constant and independent of the gauge duration. Similarly, for the generalized QALY model, the ratio \( \sum\nolimits_{t = 1}^{n_{\gamma}} \delta(t) \big/ \sum\nolimits_{t = 1}^{n_{\beta}} \delta(t) \) is constant. There is, however, some empirical evidence rejecting CPTO for the conventional procedure [14, 15]. The broad range of durations in this study allowed us to test whether the same findings occur for the alternative procedure, both in its traditional form and in a more generalized form (i.e., adjusted for discounting).
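The undiscounted CPTO condition can be sketched for a single respondent as a check that the ratio of years in full health to years in the impaired state stays the same across gauge durations. The function names, the tolerance, and the indifference pairs below are hypothetical; an actual test on sample data would rely on statistical inference rather than an exact comparison.

```python
from typing import List, Sequence, Tuple

Indifference = Tuple[float, float]  # (n_gamma, n_beta)


def tto_ratios(indifferences: Sequence[Indifference]) -> List[float]:
    """Return n_gamma / n_beta for each elicited indifference pair."""
    return [n_gamma / n_beta for n_gamma, n_beta in indifferences]


def satisfies_cpto(indifferences: Sequence[Indifference], tol: float = 1e-9) -> bool:
    """CPTO holds if the ratio does not vary with the gauge duration (within tol)."""
    ratios = tto_ratios(indifferences)
    return max(ratios) - min(ratios) <= tol


# Hypothetical respondent who always trades off the same proportion.
constant_proportion = [(8.0, 10.0), (16.0, 20.0), (32.0, 40.0)]

# Hypothetical respondent whose ratio rises with the gauge duration.
rising_proportion = [(6.0, 10.0), (16.0, 20.0), (36.0, 40.0)]

cpto_holds = satisfies_cpto(constant_proportion)
cpto_violated = not satisfies_cpto(rising_proportion)
```

For the generalized form, the same check would be applied to ratios of summed weights δ(t) instead of ratios of durations.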
Discussion
Procedural invariance is an important topic in the area of TTO measurements. There are essentially two procedures that can be used to determine indifferences, i.e., varying the duration in full health or varying the duration in an impaired health state. It is unclear which of these two procedures is better, in the sense of better describing preferences of the respondents, and often relatively large differences are found between the two procedures. Therefore, research explaining these differences and inferring which procedure gives better estimates of true preferences seems warranted.
In this study, we found relatively modest differences between the two procedures, which, to some extent, differs from the results of previous studies in this area. A plausible explanation is the use of longer gauge durations and a choice-based design. In particular, replacing a matching design by a choice-based design tends to decrease the impact of loss aversion considerably [2], only leaving a significant impact for short durations. This tendency was confirmed in another study, which, for the same subjects as in this study, found a significant difference between the conventional and alternative procedure while using a matching design [25]. We conjecture that a choice-based design introduces a more neutral scenario, causing subjects to focus more on tradeoffs; in matching designs, they may tend to give more attention to the number of years given up (conventional procedure) or the deterioration in health (alternative procedure). However, the use of different samples and health states may also be a determinant of the different results.
While our findings are encouraging when considering procedural invariance, the property of CPTO was rejected in our study. This finding adds to the evidence against the QALY model, as our results are in line with previous results [2, 14]. Those studies also reported a positive correlation between gauge duration and TTO scores for the alternative procedure.
In our study, we were able to adjust for discounting to see whether this would reduce the observed violations. It is clear that adjusting for discounting does not reduce the problem, but instead results in even stronger violations of CPTO. This implies that when differences occur in terms of proportions traded off, these are not mainly driven by discounting, but rather by utility differences. An explanation for the upward trend in TTO scores in the alternative procedure is that, for longer durations, subjects do not require relatively as many extra life years to compensate for the decreased health status as for shorter durations. It may be that the subjective life expectancy (SLE) plays an important role here. Recent studies found evidence for an impact of SLE on TTO valuations [26, 27]. In particular, people tend to take their SLE as their reference point and relate the time frame of the TTO questions to that reference point. Most of the subjects in our study had an SLE that exceeded the time frames in the TTO questions and, hence, all these situations involved losses as seen from this reference point. However, these losses were smaller for the longer gauge durations, causing subjects to demand fewer extra life years in return for a worse health status.
Although we put substantial effort into decreasing the impact of remembrance on the results in the alternative procedure, it cannot be ruled out that this has influenced our findings. However, the fact that, for the gauge duration of 3 years, we found a violation of procedural invariance in the direction predicted by loss aversion suggests otherwise. Another limitation of the present study is that we always started with the conventional procedure. This may have created an ordering effect. Therefore, future research is needed to replicate this test without the possible influence of remembrance, while randomizing the order of the procedures to control for ordering. Nevertheless, the findings regarding procedural invariance are promising and suggest that a choice-based design is less sensitive to which option is varied (i.e., the number of years in the impaired health state or the number of years in full health). Moreover, the reduction in variance after adjusting for discounting, which was also found in another study [24] for the conventional procedure, emphasizes the usefulness of this adjustment, especially when keeping in mind the large variation found in TTO studies [28]. A further limitation is that our sample only included university students and considered only one health state (back pain), which may hamper generalization of our findings. For example, Dolan and Roberts [29] reported age and education to be correlated with health state values. On the other hand, de Wit et al. [30] did not find systematic differences between student samples and general population samples. Therefore, we recommend future research on this topic to investigate a sample representative of the general population and to include more than one health state.
Contrary to most previous findings, this paper has provided some evidence in favor of procedural invariance. When using a choice-based design and intermediate and long gauge durations, the conventional and alternative procedures did not produce significantly different results. These findings are to some extent in agreement with those of Bleichrodt and Pinto [20], who found less evidence of an influence of loss aversion for longer gauge durations than for shorter gauge durations, although they considered relatively long durations (between 13 and 38 years). The role of choice-based designs in reducing deviations from procedural invariance deserves more investigation. The findings regarding CPTO, on the other hand, highlight the poor empirical validity of the QALY model in its current form. We conclude therefore that, especially given the popularity of the TTO and the QALY model (also in relation to health care decision-making), more research is warranted. In particular, more research is needed that develops and tests criteria to help determine the preferable procedure for performing TTOs. In addition, the relation between proportions traded off and gauge durations deserves further attention. This includes qualifying and quantifying the several influential factors, such as discounting, loss aversion, and SLE, and how they change for different time horizons.