Introduction
The assessment of patient-reported outcomes (PRO) has become very common in oncological research and, to a lesser degree, in daily clinical routine. Information gathered through PRO-monitoring, especially data on quality of life (QOL), has proved to be useful in symptom management and evaluation of oncological treatment [1-5]. To date, however, the number of studies on QOL in patients with brain tumours is limited, although the limited curative options underline the importance of QOL.
Naturally, the assessment of PRO is restricted to patients having the ability to report on what they experience throughout the course of the disease. In patients with brain tumours the assessment of QOL can prove difficult not only due to physical condition but also because of cognitive impairments such as lack of concentration, thought disorder, communication deficits and visual disorders.
If, during the course of the disease, the patient's ability to report on QOL and symptoms diminishes, ratings by others gain importance. Since significant others such as spouses, children or other family members are often intimately involved in patient care, their impression of a patient's well-being could contribute to symptom management and treatment evaluation if gathering information from the patient is not possible. In a research context, proxy ratings may reduce drop-out bias by allowing patients with progressive cognitive deterioration to remain in the study.
There is some evidence that significant others show agreement with patients' self-ratings on QOL for various types of cancer, although proxies tend to underrate QOL. Furthermore, agreement is lower for psychosocial issues and higher for physical symptoms [6-9].
Such proxy-ratings were also found to be more concordant with patients' self-ratings than ratings by physicians [10, 11]. Beyond neurooncological patients, proxy-ratings have also proved useful in many other patient groups that cannot be assessed directly, e.g. patients suffering from dementia [12] or children [13].
Obviously, the usefulness of a proxy-approach to PRO-assessment depends strongly on the reliability of the rating in terms of agreement with the patient's self-rating. Therefore it is of interest whether or not self- and proxy-ratings correlate highly and whether or not there is a bias induced by proxies over- or underestimating patients' QOL.
The current study aimed to investigate the relation between ratings of patients and their significant others on QOL assessed with the EORTC QLQ-C30 and QLQ-BN20. Thus, we addressed the following questions:
1.) To what degree do self- and proxy-ratings on QOL correlate?
2.) Is there a systematic difference between self- and proxy-ratings on QOL?
3.) What percentage of ratings on QOL show strong agreement?
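The three questions above correspond to three simple agreement statistics. As a minimal sketch, assuming paired self- and proxy-scores on a 0-100 scale and a Pearson coefficient (the type of correlation coefficient is not specified in this section), they could be computed as follows. The function names and the example scores are hypothetical and serve only as illustration; the 5-point tolerance mirrors the threshold referred to in the Discussion.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def agreement_summary(self_scores, proxy_scores, tolerance=5.0):
    """Return correlation, mean difference and percentage of agreement.

    Differences are computed as proxy minus self, so a positive mean
    difference means that proxies gave higher scores on average.
    """
    diffs = [p - s for s, p in zip(self_scores, proxy_scores)]
    within = sum(abs(d) <= tolerance for d in diffs)
    return {
        "correlation": pearson_r(self_scores, proxy_scores),
        "mean_difference": mean(diffs),
        "percent_agreement": 100.0 * within / len(diffs),
    }


# Hypothetical scores for five patient-proxy pairs (illustration only).
self_scores = [100.0, 75.0, 50.0, 25.0, 0.0]
proxy_scores = [100.0, 70.0, 60.0, 25.0, 10.0]
summary = agreement_summary(self_scores, proxy_scores)
```

In this sketch a high correlation can coexist with a non-zero mean difference, which is why the questions on correlation and on systematic difference are treated separately.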
Discussion
The comparison of patients' ratings of their QOL with proxy-ratings obtained from their significant others is important for deciding whether these proxy-ratings are a useful measure when a patient's ability to report on QOL diminishes due to physical or cognitive deterioration.
Our study found that, for a considerable number of subscales of the EORTC QLQ-C30 and QLQ-BN20, proxy-ratings by significant others can be regarded as useful. This was especially true for Physical Functioning, Sleeping Disturbances, Appetite Loss, Constipation, Financial Impact and Taste Alterations. Poorer rater agreement was found for Social Functioning, Emotional Functioning, Cognitive Functioning, Fatigue, Pain, Dyspnoea and Seizures. For these scales, correlations as well as the percentage of agreement (+/-5 points) were low. However, with the exception of Social Functioning and Dyspnoea, the means of patients' ratings and proxy-ratings were rather similar (less than 5 points difference).
The additional module QLQ-BN20 showed fairly good rater agreement for most scales. Worst agreement was found for Seizures and Bladder Control.
With reference to Osoba et al. [17] and King [18], we considered mean differences above 5 points as relevant rater disagreement. Taking this into account, discrepancies between proxy- and self-ratings were rather insignificant for most scales. No uniform pattern was found with respect to systematic under- or over-rating by proxies.
Another important issue is the extent of rater agreement across the scale range, especially with regard to the generalisability of our results to patients in a poor condition. Analysis of Bland and Altman plots indicates that agreement is worst in the central section of a scale. This finding is probably a result of the fact that, towards the ends of a scale, possible differences between raters are necessarily minimised by the limited scale range.
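The quantities behind a Bland and Altman plot, and the ceiling that a bounded scale places on rater differences, can be illustrated with a short sketch. It assumes scores on a 0-100 scale and the conventional 95% limits of agreement (bias +/- 1.96 standard deviations of the differences); the function names and example scores are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev


def bland_altman(self_scores, proxy_scores):
    """Return pair means, pair differences, bias and 95% limits of agreement."""
    means = [(s + p) / 2 for s, p in zip(self_scores, proxy_scores)]
    diffs = [p - s for s, p in zip(self_scores, proxy_scores)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


def max_abs_difference(pair_mean, low=0.0, high=100.0):
    """Largest rater difference that is possible at a given pair mean.

    Both ratings must lie between low and high, so the difference
    shrinks to zero as the pair mean approaches either end of the scale.
    """
    return 2 * min(pair_mean - low, high - pair_mean)


# Hypothetical scores for five patient-proxy pairs (illustration only).
self_scores = [100.0, 75.0, 50.0, 25.0, 0.0]
proxy_scores = [100.0, 70.0, 60.0, 25.0, 10.0]
means, diffs, bias, limits = bland_altman(self_scores, proxy_scores)
```

At a pair mean of 50 the two ratings can differ by up to 100 points, whereas at a pair mean of 95 they can differ by at most 10 points, which is consistent with larger disagreement being observable mainly in the middle of a scale.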
Overall, proxy-ratings performed somewhat better for more overt aspects of QOL such as physical symptoms, whereas ratings on social and psychological aspects showed less congruency.
A limitation of our study is the small sample size, which did not allow us to detect small mean differences between patient and proxy ratings. For the same reason, it was not possible to perform subgroup analyses on certain patient groups. In addition, patients in a very bad physical condition would have been of importance to our study, as proxy-ratings are most useful in that patient group. However, due to ethical considerations it was not possible to include such patients, since the burden caused by filling in both questionnaires was considered unacceptable for them. Another limitation of our study is the high rate of significant others refusing participation in the study.
The results for accuracy (percentage of mean differences equal to or below 5 points) may have been affected by the number of items in a scale, more precisely by the number of possible scores on a scale. Two contrary effects can be expected from this. On the one hand, a low number of possible scores increases agreement due to chance; on the other hand, if the distance between two possible scores is higher than 10 points (e.g. for scales containing one or two items), only exact agreement is taken into account by this accuracy parameter.
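How the number of items determines the possible scores can be made explicit with a small sketch. It assumes items with four response categories whose mean is transformed linearly to a 0-100 scale, as in the standard EORTC scoring procedure; the function names are hypothetical, and scales with a different response format would need a different `n_categories`.

```python
def possible_scores(n_items, n_categories=4):
    """All attainable 0-100 scores for a scale built from n_items items.

    Each item contributes between 0 and n_categories - 1 steps, so the
    item sum can take n_items * (n_categories - 1) + 1 distinct values.
    """
    span = n_items * (n_categories - 1)
    return [100.0 * k / span for k in range(span + 1)]


def score_step(n_items, n_categories=4):
    """Distance in points between two adjacent attainable scores."""
    return 100.0 / (n_items * (n_categories - 1))


# Step size for scales with one to five items (assumed four categories).
steps = {n: round(score_step(n), 1) for n in range(1, 6)}
```

Under these assumptions a single-item scale can only take four values that lie about 33.3 points apart, and a two-item scale has steps of about 16.7 points, so a tolerance of a few points cannot capture anything but identical scores on such scales.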
The study most similar to ours [6] found more pronounced mean differences for Physical Functioning, Role Functioning, Cognitive Functioning, Social Functioning and Fatigue (all between 5 and 10 points). With the exception of Physical Functioning, these scales also showed only a moderate proportion of exact agreement. One difference from our study was the use of a previous version of the QLQ-C30 in the study by Sneeuw et al. [6], which employed a dichotomous response format for the scales Physical Functioning and Role Functioning.
Proxies' relationship with the patient, age, gender and culture showed no significant association with rater agreement. However, agreement was worse in patients with mental confusion, cognitive impairments and motor deficits. We think that low rater agreement in patients with severe cognitive impairments should not be considered per se as an indication of inaccurate proxy rating. It might also reflect patients' inability to report on their condition. On the other hand, it may also be difficult for proxies to understand the individual consequences of cognitive decline. Additional clinical variables, as more objective criteria, may be helpful in evaluating rater disagreement in this patient group.
In a recent study by Brown et al. [21] on rater agreement in patients with newly diagnosed high-grade gliomas, proxy-ratings by a caregiver chosen by the patient also showed good congruence. This study employed the FACT-Br [22] as its QOL instrument. The correlation between patient-ratings and caregiver-ratings was 0.63 at baseline and 0.64 at the 2- and 4-month follow-ups; the percentage of agreement (+/- 10 points on a scale ranging from 0 to 100) was 63-68% at the three assessment time points.
With regard to the type of proxy-rating, proxy-raters can differ not only in their relation to the patient (significant other, treating physician, caregiver etc.) but also in the perspective they take towards the patient. Gundy and Aaronson [23] investigated whether proxy-ratings differ depending on whether the proxy rates the patient from the patient's perspective or makes his or her own assessment of the patient. No differences with regard to bias were found between the two types of ratings, although it should be mentioned that the study might not have been sufficiently powered to detect such differences.
Taking our own findings and those from similar studies into account, the assessment of QOL in brain cancer patients through ratings from their significant others seems to be a feasible strategy to gain information about important aspects of a patient's QOL, if the patient is not able to provide information himself. However, in general rater agreement is lower for psychosocial issues compared to physical symptoms.
In a research context, proxy ratings may reduce bias from patients dropping out of studies because of deteriorating health, and in a clinical context proxy-ratings could contribute to medical decision making. Future research should further evaluate the impact of patient and proxy characteristics on rater agreement and include further criteria for the accuracy of proxy ratings.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
GJ, GM, EA and HB were responsible for study design, conceptualization and writing of the manuscript as well as for data collection. MA, HM and SG were the treating neurologists and therefore in charge of patient recruitment and gave important input for medical content. GJ and KG performed the statistical analysis. RG and SMG helped to draft the manuscript. All authors read and approved the final manuscript.