Mode of administration
The mode of administration will also influence the results [38]. The preferred mode is the personal interview, which is also the most expensive. Its advantage is that interaction between interviewer and respondent promotes good data quality; its disadvantages are the high cost and possible interviewer effects. TTO studies have grown in size over time, often requiring multiple interviewers, so the effort made to minimize differences between interviewers (e.g., training, an interview script and intervision) is relevant. Moreover, assistance from the interviewer may introduce interviewer bias, such as social desirability bias and acquiescence bias. For example, respondents find it easier to agree than to disagree [39]. Even small verbal reinforcements have been shown to change respondents' reactions [40].
Internet experiments have emerged as a way to obtain large representative data sets at relatively low cost [41, 42]. However, Internet experiments do not allow the researcher to monitor the effort put in by the respondent, nor do they give the respondent the opportunity to ask questions for clarification or to receive feedback. Versteegh et al. [41] in this issue report that Internet studies can be problematic for eliciting TTO tariffs. Between these two lies the group experiment, in which sessions are run with small groups and roughly one experimenter is present per 4–10 respondents. After a plenary explanation of the purpose of the experiment, respondents complete the tasks while the experimenters walk around and answer questions as needed. Although such studies have been shown to be feasible [43, 44], this method also seems less favorable than personal interviews.
Visual aids
TTO investigators tend to prefer graphs or illustrations for presenting the choice situation, since respondents appear to find these easier than a numerical description [45, 46]. In the past, TTO boards were commonly used. Today, computer-assisted personal interviews are the norm, because they support correct implementation of the iteration procedure as well as graphical illustration of the tasks. The visual presentation still varies between studies, which may influence results [47]. Reviewers often request a screenshot of the valuation software, or of the visual presentation used, during peer review of TTO studies submitted for publication.
Context effects
People tend to learn during a TTO experiment [48]. This is typically addressed by including a warm-up task. TTO applications differ in the effort made to familiarize respondents with both the tasks and the health problems under consideration. Common warm-up tasks are TTO questions using different health states, or valuation of the same health states using different valuation techniques (such as the visual analog scale, a ranking task, discrete choices or best-worst scaling) [41, 49, 50]. A further concern is the order in which a respondent values the health states. Randomization is common practice, but more research into the most appropriate strategy may be warranted. In a recent study, Pinto Prades found that the precision of health state values depends on the ordering of the states [51]: more precise values are obtained when a TTO sequence begins with a mild state rather than a severe state.
Sampling frame
It is generally recognized that the value of a health state varies with the sampling frame. Economic evaluations in health care are recommended to be conducted from a societal perspective. Organizations involved in developing guidelines on the use of new and existing treatments, such as the National Institute for Health and Clinical Excellence (NICE), the panel of the US Public Health Service and the Dutch Health Care Insurance Board (CvZ), prefer health state values elicited from a fully informed, representative sample of the general public [8, 52, 53]. Fully informing members of the public may be challenging, however, and there are good arguments for using a patient sample instead, because patients are more familiar with the symptoms of the disease than non-patients. The panel of the US Public Health Service itself suggested that patients' preferences might be the better choice in economic evaluations comparing alternative interventions [52]. However, when investigating a patient sample, one should be aware that adaptation and/or strategic misrepresentation may influence valuation estimates [54]. Values shaped by adaptation typically lead to smaller effect sizes in the valuation of quality-of-life-enhancing treatments [55]. On the other hand, the degree of adaptation differs between health states, and this provides valid information about the perceived severity of a health state. For instance, people may adapt better to physical impairments than to mental disorders such as depression or skin diseases such as eczema.
Indirect valuation
Health state valuation methods such as TTO may be used to value the health states of a health state classification system, such as the EQ-5D, the SF-6D or disease-specific questionnaires. Most classification systems contain too many health states to value all of them, so values are elicited only for a subset, and a modeling approach is used to estimate values for all health states. Modeling may be based on multi-attribute utility theory (as with the Health Utilities Index, HUI) or on statistical inference. The two approaches rest on different assumptions and impose different requirements on the subset of states that must be valued directly. When comparing TTO values elicited in different experiments, it may therefore be relevant to compare the health state selection and modeling approaches as well.
When modeling is based on statistical inference, regression analysis is applied to estimate values for all health states on the basis of the subset of states observed. The regression assumptions have a greater impact on the predicted values in the case of extrapolation (outside the range of values in the data set) than in the case of interpolation (within that range). It is therefore relevant to report the worst health state offered to respondents in the valuation study. Furthermore, prediction intervals and goodness-of-fit criteria ought to be reported.
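To make the interpolation/extrapolation point concrete, the following minimal sketch fits an additive main-effects model to mean TTO values for EQ-5D-5L states with ordinary least squares. The dummy coding (one decrement per dimension for levels 2 to 5, no intercept, so that 11111 is anchored at 1) is one common choice assumed here, not the specification of any particular valuation study. The extrapolation check, which flags predictions outside the range of observed mean values, is a hypothetical operationalization of the distinction drawn above. Prediction intervals are not computed; a statistics package such as statsmodels can provide them.

```python
import numpy as np

DIMS, LEVELS = 5, 5  # EQ-5D-5L: five dimensions with five levels each
N_PARAMS = DIMS * (LEVELS - 1)  # one decrement per dimension for levels 2..5


def design_row(state: str) -> np.ndarray:
    """Dummy-code a state such as '21345'; level 1 is the reference level."""
    row = np.zeros(N_PARAMS)
    for d, ch in enumerate(state):
        level = int(ch)
        if level > 1:
            row[d * (LEVELS - 1) + (level - 2)] = 1.0
    return row


def fit_decrements(observed: dict[str, float]) -> np.ndarray:
    """OLS of disutility (1 - mean TTO value) on the dummies.

    No intercept, so 11111 is predicted as exactly 1 (an assumption). The
    valued subset must identify every coefficient; otherwise lstsq silently
    returns a minimum-norm solution.
    """
    states = list(observed)
    X = np.vstack([design_row(s) for s in states])
    y = np.array([1.0 - observed[s] for s in states])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict(beta: np.ndarray, state: str) -> float:
    """Predicted value for any state, whether valued directly or not."""
    return float(1.0 - design_row(state) @ beta)


def mean_abs_error(beta: np.ndarray, observed: dict[str, float]) -> float:
    """Simple goodness-of-fit summary on the directly valued states."""
    return float(np.mean([abs(predict(beta, s) - v) for s, v in observed.items()]))


def is_extrapolation(pred: float, observed: dict[str, float]) -> bool:
    """Hypothetical rule: flag predictions outside the observed value range."""
    return not (min(observed.values()) <= pred <= max(observed.values()))
```

If only mild states are valued directly, the prediction for the PITS state may fall well below every observed value and be flagged, which is precisely the case in which the regression assumptions, rather than the data, determine the result.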
Researchers have received little guidance on state selection, and the state of play is correspondingly unclear. Researchers have considered coverage of the severity range, orthogonality and health state plausibility, but practice varies. A further issue is how many states need to be valued. Based on theory and observations, Lamers et al. [56] suggest that a minimum number of respondents per health state is required (for TTO approximately n = 100) and that, in principle, adding more states (each assessed by 100 respondents) yields more information, and hence more precise regression estimates, than increasing the number of respondents per health state. However, good results have also been obtained by valuing many health states with few observations per state [57]. Bagust [58] has recently argued that state selection may be improved by adopting additional criteria, such as health state relevance and direct coverage of simple increments in health. Versteegh et al. [44] argue that the statistically most informative set of health states may not be the set that occurs most often in patients, and show that including the states that occur most often in patients affects the modeled health state values. Whatever the selection method, the resulting number of health states may still be too large for a single respondent to value. A common solution is then to use a blocked design, in which each respondent's questionnaire includes only part of the selected subset of health states, while ensuring that every state in the subset is valued by a sufficient number of respondents.
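A blocked design of the kind described above can be sketched as follows. The helper names are hypothetical; the sketch assumes each respondent completes exactly one block, assigns respondents to blocks round-robin, and uses the approximate minimum of 100 observations per state suggested by Lamers et al. as its default target. Real designs usually also balance severity across blocks, which this sketch does not attempt.

```python
from collections import Counter


def make_blocks(states: list[str], block_size: int) -> list[list[str]]:
    """Split the selected states into consecutive blocks of at most block_size states."""
    return [states[i:i + block_size] for i in range(0, len(states), block_size)]


def respondents_needed(n_blocks: int, min_obs_per_state: int = 100) -> int:
    """Each respondent completes one block, so every block needs
    min_obs_per_state respondents to reach that many observations per state."""
    return n_blocks * min_obs_per_state


def assign_blocks(n_respondents: int, n_blocks: int) -> list[int]:
    """Round-robin assignment: respondent r receives block r mod n_blocks."""
    return [r % n_blocks for r in range(n_respondents)]


def observations_per_state(blocks: list[list[str]], assignment: list[int]) -> Counter:
    """Count how many respondents value each state under a given assignment."""
    counts: Counter = Counter()
    for b in assignment:
        counts.update(blocks[b])
    return counts
```

For example, 60 selected states in blocks of 10 give 6 blocks, so 600 respondents are needed for 100 observations per state, while each respondent values only 10 states. The same arithmetic shows the trade-off in the text: adding a block of new states costs another 100 respondents.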
When blocking the design, one concern is securing a lower anchor. The worst possible health state in a classification scheme is the state in which all dimensions are at their worst possible level, in other words, the most severe problems on all dimensions. This state is called the PITS state; in the EQ-5D-5L system it is state 55555. Including this health state in the valuation task is advisable in order to provide a lower anchor, but here too practice has varied. Moreover, it is essential to report the number of health states that were valued (overall and per respondent) and the sample size, as these characteristics may also affect the predictive quality of the regression model.
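The size of the descriptive system and the position of the PITS state follow directly from the level coding. The sketch below enumerates all EQ-5D-5L states (5^5 = 3125), constructs the PITS state as the worst level on every dimension, adds it to every block as a common lower anchor, and summarizes the counts that should be reported. Adding the anchor to every block is one possible strategy assumed for illustration, and the helper names are hypothetical.

```python
from itertools import product


def all_states(dims: int = 5, levels: int = 5) -> list[str]:
    """Enumerate every state of a system coded as level strings (levels <= 9)."""
    return ["".join(map(str, combo))
            for combo in product(range(1, levels + 1), repeat=dims)]


def pits_state(dims: int = 5, levels: int = 5) -> str:
    """Worst level on every dimension, e.g. '55555' for EQ-5D-5L."""
    return str(levels) * dims


def add_anchor(blocks: list[list[str]], anchor: str) -> list[list[str]]:
    """Append the anchor state to every block that does not already contain it."""
    return [b if anchor in b else b + [anchor] for b in blocks]


def design_summary(blocks: list[list[str]]) -> dict:
    """Counts to report: distinct states valued overall and states per block
    (equal to states per respondent if each respondent completes one block)."""
    distinct = {s for b in blocks for s in b}
    return {"states_overall": len(distinct),
            "states_per_block": [len(b) for b in blocks]}
```

The same functions cover the three-level version: `all_states(5, 3)` yields 243 states and `pits_state(5, 3)` yields "33333".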
When only directly observed TTO values are used rather than modeled ones, the above concerns do not apply. Direct health state valuations can be used when a limited number of health states have to be valued, e.g., to obtain values for the health states in a Markov model. This approach raises a different set of methodological concerns, e.g., whether the disease state is described in generic or disease-specific terms, in narrative or bulleted form, and labeled or unlabeled [59]. Health state descriptions are developed on the basis of the literature, expert experience, or classification systems such as the EQ-5D, SF-6D and HUI. Descriptions need to be specific enough to ensure that respondents are fully informed, yet restricted enough to avoid information overload. Evidence of the impact of such choices is limited. Two studies found that the exact labeling and framing of the health description did not seem to affect respondents' valuations [60], nor did the sparseness of an EQ-5D health state description [61].
Valuations of one's own health are a form of direct health state valuation. They avoid the need to describe health, since the person experiencing the health problem is also the one valuing it [59]. However, valuations of one's own health are difficult to interpret because it is unclear which health problem is being valued; for example, respondents tend to value their whole life, including minor positive [62] and minor negative events [63]. Direct valuations of one's own health are preferred when researchers want to incorporate the effect of adaptation, for instance, in cost-effectiveness analyses of psychological interventions. They are also preferred for psychological illnesses [64, 65].