Background
A challenge in the interpretation of health-related quality of life (HRQOL) data in clinical research is that HRQOL is self-reported by the patient and might be influenced by psychological phenomena such as adaptation to illness. Patients who experience changes in health often accommodate and adapt to these changed conditions. When measuring changes in HRQOL with a pre-test (assessment prior to intervention)/post-test (assessment after intervention) design, as in a Randomized Controlled Trial (RCT), adaptation to an increased symptom level or impaired HRQOL can affect results, a change referred to as response shift (RS) [1]. Sprangers and Schwartz defined RS in the field of HRQOL as a change in the meaning of an individual's self-reported HRQOL [2]. It can be divided into 1) Reconceptualization (i.e. a re-definition of HRQOL), 2) Reprioritization (i.e. a change in the importance attributed to component domains constituting HRQOL) and 3) Recalibration (i.e. a change in a patient's internal standards of measurement). The most widely used approach for assessing changes in a patient's internal standards is the retrospective pre-test design (then-test) [1, 3]. At post-test, patients are retrospectively asked to provide a renewed judgment of their HRQOL at baseline (pre-test). The then-test is ideally completed simultaneously with or in close proximity to the post-test, assuming patients rate their HRQOL on both tests using the same internal standards.
In recent years, several studies have found evidence for the occurrence of RS in HRQOL in cancer patients (e.g. [4-10]). RS may sometimes be the result of an adaptive response to a changed health status, and may then be viewed as a positive phenomenon for patients. However, the altered meaning of HRQOL over time poses a challenge to clinicians in the interpretation of changes in HRQOL. In a study by Visser et al, fatigue was assessed in 216 cancer patients before and after treatment with radiotherapy [4]. When the conventional pre-test was compared to the post-test, no differences in fatigue were found. This might lead to the conclusion that radiotherapy does not affect fatigue. However, when the then-test was used as the measure of fatigue at baseline, there appeared to be a statistically significant increase in fatigue after treatment.
The magnitude and importance of the RS phenomenon remain unresolved. A meta-analysis by Schwartz et al suggested that RS may play a significant role in HRQOL research and that the direction of this shift varies across studies [11]. In a previous report we attempted to determine the clinical significance of changes in quality-of-life scores in patients with multiple myeloma (MM) [12]. MM is an incurable malignant disease of the bone marrow with an expected median survival of five years [13]. At diagnosis, myeloma patients report a pronounced impairment of HRQOL, with reduced physical functioning, fatigue and pain as the major problems [14]. The aims of treatment are to control disease, maximize quality of life and prolong survival. Hence, HRQOL is an important outcome in clinical trials. We estimated the Minimal Important Difference (MID) in patients with MM for the HRQOL instrument EORTC QLQ-C30. MID is defined as "the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patients' management" [15]. Our results suggested that a change in the EORTC QLQ-C30 score in the range of approximately 6-17 (on a 0-100 scale) is considered important by patients with MM. Here, we evaluate whether patients experienced RS and, if so, its magnitude and direction. We also explore how RS affects the MID results and whether RS affects the interpretation of HRQOL results in clinical trials.
Methods
Patients
Patients with MM, irrespective of their disease status (newly diagnosed, plateau phase, relapsed) or treatment, were enrolled from January 2006 to April 2008. Eligibility criteria were an expected survival of more than three months and the ability to complete a self-report questionnaire in Norwegian. Consecutive patients admitted to 17 hospitals in the South-Eastern Norway Regional Health Authority, a region representing about 50% of the Norwegian population, were recruited. Written informed consent was obtained from all participants. The Helsinki Declaration guidelines were followed. The Regional Committee for Medical Research Ethics, Health region I, Norway, approved the study.
Questionnaire
HRQOL was measured using the EORTC QLQ-C30, a cancer-specific questionnaire with 30 items [16]. The questionnaire is composed of five functional scales, three symptom scales, a global health/quality of life scale, and six single items. All scores were calculated and transformed to a 0-100 scale according to EORTC methods [17]. For the functional scales and global health status, higher scores represent a higher level of functioning. In the symptom scales and single items, higher scores represent more symptoms or difficulties. The questionnaire is reliable and valid for MM patients [18].
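The 0-100 transformation can be illustrated with a minimal sketch. It assumes the standard linear transformation described in the EORTC scoring manual [17]: the raw score is the mean of the items in a scale, items are rated 1-4 (range 3) except the two global health items, which are rated 1-7 (range 6), and functional scales are reversed so that higher scores mean better functioning. The function name and parameters are hypothetical and are not taken from the study.

```python
from statistics import mean


def qlq_c30_scale_score(items, item_range=3, functional=False):
    """Transform raw item responses for one scale to a 0-100 score.

    items:       item responses for a single scale (all items answered)
    item_range:  max minus min possible response (assumed 3 for 1-4 items,
                 6 for the 1-7 global health items)
    functional:  True for functional scales, which are reversed so that a
                 higher score means a higher level of functioning
    """
    raw = mean(items)
    scaled = (raw - 1) / item_range * 100
    return 100 - scaled if functional else scaled


# Symptom scale: responses of 2 and 3 on a 1-4 scale
print(qlq_c30_scale_score([2, 3]))
# Functional scale: all items answered "not at all" (1) means best functioning
print(qlq_c30_scale_score([1, 1, 1, 1, 1], functional=True))
# Global health status: two items on a 1-7 scale
print(qlq_c30_scale_score([7, 7], item_range=6))
```

The global health scale is not reversed in this sketch because its items already run from poor to excellent.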
Interview and Then-test approach
Patients completed the EORTC QLQ-C30 at inclusion (T1) and after three months (± 2 weeks window) (T2). At T2, a structured interview was performed and the patients were asked: "Compared with the last time you filled in the questionnaire (T1, date mentioned to the patients), has your quality-of-life improved, stayed the same or deteriorated?" The response choices ranged on a seven-point scale from 1 = much better to 7 = much worse. This global rating of change (GRC) question was asked for the four domains physical functioning, fatigue, pain and global quality of life. Because of small sample sizes in some of the GRC categories, we pooled the data into three categories (improved, unchanged, deteriorated) to yield sufficient numbers of cases in each category. "Improved" included much better, moderately better and a little better, and "deteriorated" included a little worse, moderately worse and much worse for the four domains. MIDs for improvement and deterioration were defined as the mean score changes in these domains for patients declaring improvement or deterioration. Throughout the article, we use "improved" as shorthand for patients who reported themselves as improved, and similarly for "deteriorated" and "unchanged" patients.
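The pooling of GRC ratings and the MID definition can be sketched as follows. This is an illustration only: it assumes that ratings 1-3 correspond to the three "better" options, 4 (the scale midpoint) to "unchanged" and 5-7 to the three "worse" options, and it uses invented example data. The function names and the record layout are hypothetical.

```python
from statistics import mean


def pool_grc(rating):
    """Pool a seven-point GRC rating (1 = much better ... 7 = much worse)
    into three categories. Treating 4 as "unchanged" is an assumption."""
    if not 1 <= rating <= 7:
        raise ValueError("GRC rating must be between 1 and 7")
    if rating <= 3:
        return "improved"
    if rating == 4:
        return "unchanged"
    return "deteriorated"


def mean_change_by_category(records):
    """records: iterable of (grc_rating, pre_score, post_score) for one domain.

    Returns the mean score change (post minus pre) per pooled category; the
    values for "improved" and "deteriorated" correspond to the MID estimates.
    """
    changes = {}
    for rating, pre, post in records:
        changes.setdefault(pool_grc(rating), []).append(post - pre)
    return {category: mean(values) for category, values in changes.items()}


# Invented example data for a single domain
example = [(1, 50, 70), (3, 40, 50), (4, 60, 60), (6, 80, 60), (7, 70, 60)]
print(mean_change_by_category(example))
```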
After the GRC questions, the patients were asked to provide a renewed judgment of their baseline ratings of the EORTC QLQ-C30 for the four domains (Then-test). The questions were asked in past tense for each of the 12 items included in these domains. We emphasized that the purpose of the then-test was not to recall their previous answers but to provide a renewed judgment of their HRQOL at baseline.
The mean difference between the pre-test and then-test scores was used to provide an estimate of the direction and magnitude of the RS effect. Observed changes were calculated as the difference between the mean post-test and pre-test scores, while adjusted changes were calculated as the difference between the mean post-test and then-test scores.
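The three quantities defined above can be written out in a short sketch. The sign convention for RS (then-test minus pre-test) is an assumption, since the text does not state which score is subtracted from which; the function name and the example scores are hypothetical.

```python
from statistics import mean


def response_shift_summary(pre, then, post):
    """Compute group-level RS, observed change and adjusted change.

    pre, then, post: 0-100 scale scores for the same patients on the
    pre-test (T1), the then-test (T2, retrospective) and the post-test (T2).
    """
    mean_pre, mean_then, mean_post = mean(pre), mean(then), mean(post)
    return {
        # Assumed sign convention: then-test minus pre-test
        "response_shift": mean_then - mean_pre,
        "observed_change": mean_post - mean_pre,
        "adjusted_change": mean_post - mean_then,
    }


# Invented symptom scores for three patients
summary = response_shift_summary(pre=[40, 50, 60], then=[30, 40, 50], post=[50, 60, 70])
print(summary)
```

Under this convention the observed change equals the adjusted change plus the response shift, so a negative RS on a symptom scale (symptoms rated lower in retrospect) makes the adjusted deterioration larger than the observed one.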
Statistical methods
Wilcoxon signed-rank tests for paired differences were used to assess the significance of differences between pre-test, post-test and then-test scores. We divided the patients into groups according to whether they thought they were improved, unchanged or deteriorated for the four domains.
To examine the magnitude of recalibration RS, effect sizes (ES) were calculated by dividing the mean score changes by the standard deviation at baseline (T1). We used Cohen's generally accepted criteria for interpreting the magnitude of an ES: > 0.20 is a small change, > 0.50 a moderate change, and > 0.80 a large change [19].
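As a minimal sketch of this calculation, the ES is the mean score change divided by the baseline standard deviation, interpreted against the thresholds quoted above. Using the sample standard deviation (n - 1 denominator) and labelling values at or below 0.20 as "negligible" are assumptions of this illustration; the function names are hypothetical.

```python
from statistics import mean, stdev


def effect_size(changes, baseline_scores):
    """Mean score change divided by the standard deviation at baseline (T1).

    The sample standard deviation is used here as an assumption.
    """
    return mean(changes) / stdev(baseline_scores)


def cohen_label(es):
    """Interpret the magnitude of an ES with the thresholds quoted in the text."""
    size = abs(es)
    if size > 0.80:
        return "large"
    if size > 0.50:
        return "moderate"
    if size > 0.20:
        return "small"
    return "negligible"  # label for sizes of 0.20 or less is an assumption


# Invented example: baseline SD of 10 points, mean change of 5 points
es = effect_size(changes=[5, 5, 5], baseline_scores=[40, 50, 60])
print(es, cohen_label(es))
```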
The GRC results and the observed and adjusted changes all appeared approximately to reflect underlying normal distributions. Analysis of variance (ANOVA) was performed and F-statistic values were calculated to see which approach (a seven-point GRC scale, observed changes or adjusted changes) was most efficient at detecting changes in phases of the disease (newly diagnosed, relapse/progression or stable disease). Newly diagnosed patients were expected to improve, relapsed patients to deteriorate and patients with a stable disease to stay unchanged. The relative efficiency of a test is measured by the ratio of the F-statistic values [20].
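The comparison of approaches can be illustrated with a small sketch that computes a one-way ANOVA F-statistic across disease-phase groups and then takes the ratio of two F values. This is a textbook one-way ANOVA written with the standard library and may differ in detail from the SPSS analysis used in the study; the function names and data are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F-statistic for a list of groups of scores."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (value - m) ** 2 for group, m in zip(groups, group_means) for value in group
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(f_a, f_b):
    """Ratio of F-statistics; values above 1 favour approach A over approach B."""
    return f_a / f_b


# Invented scores for three disease phases under one measurement approach
f_value = one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_value)
```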
Missing data
If any item was missing in the first questionnaire (T1), we accepted the data as missing. For the second questionnaire (T2), the forms were checked and if any item was missing the patients were asked to fill it in before the interview. Still, if any of the constituent items in a scale were missing, the scale score for that patient was excluded from the statistical analyses.
Sample size calculation
The study primarily aimed to estimate the MID, and the sample size calculation was based on being able to detect a MID of 0.50 × SD, yielding a sample size of 260 patients. The response shift evaluation is descriptive, so the impact of sample size is indicated by confidence intervals around the estimates.
The statistical analysis was performed using The Statistical Package for the Social Sciences (SPSS), version 16 (SPSS Inc., Chicago, IL, USA).
Discussion
The results of the present study indicate that RS exists in MM patients, mainly in those who deteriorated over the 3-month observation period. We found that patients who deteriorated in the domains pain, fatigue and physical function retrospectively minimized their troubles at baseline. These changes in internalized standards could be a desirable adaptation mechanism for patients with cancer to maintain equilibrium in HRQOL in the face of loss.
Our findings are generally consistent with those of previous studies among other categories of cancer patients with deteriorating health conditions [4, 5, 21]. Jansen et al assessed RS in 46 patients with breast cancer undergoing radiotherapy. They found that patients who had deteriorated retrospectively reported fewer symptoms at baseline. They concluded that RS measured by the then-test was stronger for deterioration in HRQOL than for improvement in HRQOL.
For patients who improved, there was no statistically significant evidence of RS except for the domain global quality of life. In RCTs in newly diagnosed patients with MM or cancer in general, patients are usually followed from the start of treatment and the majority of patients are expected to improve [22, 23]. Thus, the RS phenomenon may arguably be disregarded in the interpretation of the HRQOL results from such trials. Our results are in contrast to findings in studies regarding patients with non-fatal disorders, where improved patients retrospectively have reported significantly higher disability [24, 25]. Razmjou et al discussed this issue in a study of patients with total knee arthroplasty and concluded that "it appears that patients who wish to maintain a stable HRQOL would consciously or unconsciously magnify their treatment effect by endorsing a higher disability level retrospectively" [25].
We found some evidence for RS even in patients who were unchanged from T1 to T2, mainly for the domains pain and fatigue. On average, these patients retrospectively underestimated their symptoms. A meta-analysis by Hagedoorn et al [7] concluded that RS is a common and significant phenomenon in HRQOL measurement, and that in cancer studies, patients with a declining HRQOL may report no decrease in their HRQOL due to positive adaptation. This could be an explanation for the findings for the unchanged group in our study.
ESs can be calculated to evaluate the importance of the observed RS. In our study, we found that the ESs of the RS were small according to Cohen's criteria with the largest ES detected for fatigue. Fatigue has been identified by patients with cancer as a major obstacle to normal functioning and a good quality of life [26]. Previous studies have suggested that fatigue is a symptom that is especially RS prone [4, 21].
It is important to know the clinical significance of changes in HRQOL scores for the interpretation of the results from clinical trials. We have previously reported that a difference of 6-17 points (scale range 0-100) in the EORTC QLQ-C30 score represents a clinically meaningful change in patients with MM. In the present study, we found that by controlling for RS in patients who improved, the same interval for MIDs could be used. However, if we adjust for RS in patients who deteriorated, larger MIDs (12-27 points) are obtained. The question remains: does adjusting for RS provide more reliable estimates of MIDs?
The F-statistic values from ANOVA indicate that the GRC is the most effective method for detecting differences in phase of disease, with RS-adjusted changes being second best. The GRC method accords most with actual clinical practice, in which health-care providers usually rely on patients' judgment of whether they are better, the same or worse. However, the question remains: which is the most meaningful and least biased outcome? The most sensitive outcome could be the most biased. If patients are aware that the phase of their disease is deteriorating, they may be more prone to assuming that their HRQOL must as a consequence be similarly declining, resulting in biased reports of GRC and possibly RS-adjusted changes.
A possible explanation for the discrepancy between the pre-test and then-test assessment is the potential for recall bias. In HRQOL research, recall bias refers to memory distortion, that is, patients incorrectly recalling their health condition at T1 [27, 28]. However, in a study by Visser et al comparing different approaches to detect RS, recall bias did not invalidate then-test results [29]. A factor such as the length of the period between measurements may affect the influence of recall bias. Like Visser and others [9, 29], we used a relatively short interval between assessments (3 months). If we had chosen a shorter interval between pre-test and then-test, the patients could have remembered what they actually answered on the pre-test. A longer period between the initial measurement and the retrospective then-test would pose a considerable challenge to memory. The choice of 3 months in the present study was a compromise between these considerations. Another possible explanation for the observed results could be the "implicit theory of change". This theory suggests that patients begin with their presumed present state (post-test) and work backwards to infer their earlier state (pre-test), rather than basing the rating on their perception of their health at a specific time point [27]. A consequence could be that patients view the decline in their HRQOL as bigger than it actually is because they believe their disease is progressing and that consequently their HRQOL must be deteriorating.
Although RS could be a challenge for the measurement and interpretation of self-reported HRQOL, adaptation to illness could serve as a form of psychological buffer that helps reduce the stressful impact of a deteriorating health status. For most patients, living after being diagnosed with cancer is not the same as before. An important part of every cancer treatment is helping patients to adapt to their illness. Thus, the positive adaptation we found in our study in patients saying that they deteriorated is actually a desired effect for the patients.
We chose to study MM patients because we anticipated large differences in HRQOL score between those who improved or deteriorated. A comparison with the results obtained with the EORTC QLQ-C30 in patients with other haematological diseases [30] and in solid cancers [31, 32] indicates that patients' HRQOL is lower in MM than in several other malignant diseases.
The evaluation of external validity is important to enhance the transfer of results into the clinical routine. The strength of our study is that we included an almost representative sample of patients with MM within South-Eastern Norway, although the median age was somewhat lower (66 years) than in a newly published population-based study from Sweden (72 years) [33]. However, the mean EORTC QLQ-C30 scores for the whole sample in our study are comparable to those in a nationally representative study among MM patients in Denmark [30]. Given the representativeness of the patients included, we can expect the results to be relevant to other MM patients. We would also expect these findings to apply to other cancers or other illnesses, and we encourage confirmatory studies to investigate this.
Acknowledgements
This project has been financed with the aid of EXTRA funds from the Norwegian Foundation for Health and Rehabilitation. The authors thank health-care providers at the following hospitals for recruiting patients to this study: Akershus University Hospital; Diakonhjemmet Hospital; Lovisenberg Diaconal Hospital; Oslo University Hospital, Aker; Oslo University Hospital, Ulleval; Sorlandet Hospital, Arendal; Sykehuset in Vestfold, Tonsberg; Sykehuset Innlandet, division Gjovik, Hamar, Kongsvinger and Lillehammer; Sykehuset Ostfold, Fredrikstad; Telemark Hospital, Skien; Vestre Viken HF, division Asker and Baerum, Buskerud, Kongsberg and Ringerike.
Preliminary results have been presented as a paper at the ISOQOL annual meeting, New Orleans, October 2009.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
All authors conceived of the study, participated in the design of the study, performed the statistical analysis, and read and approved the final manuscript. AKK performed the interviews of the patients.