Background
Observation of a construct, such as a quality of life tool, first requires conceptualisation of the construct at a theoretical level followed by its operationalisation at an empirical level. Operationalisation involves selecting indicators to be measured in the observation of the construct. Both are vulnerable to variation in interpretation by different researchers and can result in a divergence in their measurement of the ‘same’ construct. A new theory recently proposed is that of a ‘Neutral Observer’, which provides a framework on which a determination of the neutrality, or accuracy, of an observation of a given construct can be based [1]. Neutral Theory represents the ideal and assumes a Neutral or exhaustive list of relevant indicators in the construct, whereby the sensitivity and specificity are both 1 (i.e., 100% accurate). The operationalisation of constructs using disease-specific indicators can perhaps be considered closer to achieving neutrality than those based on generic observations.
Understanding the impact of treatment on patients’ quality of life is a pivotal component in the economic evaluation of health interventions. There is, however, no universally agreed definition of the construct of quality of life, with the one provided by the World Health Organization (WHO) perhaps the most commonly cited: “an individual’s perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns” [2]. This broad definition includes the person’s physical health, psychological state, personal beliefs, social relationships and their relationship to salient features of their environment. The WHO definition, and other similar ones, were influential in the concept of health-related quality of life (HRQoL), which refers to how well a person functions in their life and his or her perceived well-being in physical, mental, and social domains of health [3].
Two independently operationalised tools that are frequently used to objectively assess HRQoL are the Medical Outcomes Study Short Form family of measures (e.g. SF-36 [4, 5]) and the EuroQol five-dimensional (EQ-5D) [6, 7]. Both of these tools capture HRQoL (or, strictly speaking, health status for the EQ-5D [6, 8]) across a series of domains or dimensions: vitality, physical functioning, bodily pain, general health perceptions, physical role functioning, emotional role functioning, social functioning, and mental health in the SF-36 [5]; and mobility, self-care, usual activities, pain/discomfort, and anxiety/depression in the EQ-5D [7].
Generic HRQoL tools have been widely adopted in Health Technology Assessments (HTAs), with the National Institute for Health and Care Excellence (NICE) in the UK recommending use of the EQ-5D in its Technology Appraisals [9]. Generic HRQoL tools, by their nature, depict aspects of well-being and quality of life from the patients’ point of view across all diseases and, therefore, have utility in population-level studies as well as informing comparisons between diseases. While this allows for potentially more consistent, transparent and predictable decision-making, it is open to criticism, as generic measures may be insensitive or fail to capture important aspects of health for a specific disease or condition [10]. Disease- or condition-specific HRQoL tools have the advantage of being clinically relevant to the health problem and responsive to clinically important changes in state, such as the impact of treatment. Conversely, this specificity complicates comparisons with the general population and across treatments for different diseases, limiting their application in HTAs.
This study aimed to apply Neutral Theory in assessing the neutrality, or accuracy, and applicability of generic tools (SF-36 and EQ-5D) at measuring HRQoL in diseases or conditions where there is a specific tool available to act as a surrogate for the Neutral list in the measurement of HRQoL.
Methods
Identification of disease- or condition-specific health-related quality of life tools
A literature search was performed to identify all published disease- and condition-specific HRQoL tools. Medline (PubMed) was searched through 01 July 2019 using the following terms: [“patient reported outcome” OR “PRO” OR “Quality of life” OR “QoL” AND “disease specific” OR “condition specific”]; limit: [English language]. Two reviewers undertook the search, with initial screening of abstracts and titles conducted using the semi-automated Rayyan tool (https://rayyan.qcri.org/) [11]. Full descriptions of the identified disease/condition-specific HRQoL tools were sourced, as were those of the SF-36 and EQ-5D. In addition, all original studies where HRQoL was assessed using a disease/condition-specific HRQoL tool and the SF-36 and/or the EQ-5D were reviewed.
Inclusion of appropriate domains and items
The risk that the generic tools (SF-36 and EQ-5D) might include irrelevant domains or items or exclude relevant domains or items for a specific disease or condition was assessed. Firstly, for each condition- or disease-specific tool the number of items with and without a direct match to the SF-36 and EQ-5D was recorded (for the EQ-5D, it was permitted for each of the five questions to cover more than one item in each disease/condition-specific tool). The sensitivity and specificity of the generic tool versus the disease/condition-specific tool were then calculated as follows. True positives represented items captured in both the disease/condition-specific and generic tool; false positives, those captured in the generic tool, but not in the disease/condition-specific tool; and false negatives, those captured in the disease/condition-specific tool, but not in the generic tool. Since it is not possible to know if the disease/condition-specific tool fully captures all relevant items or domains, the true negative fraction was assumed to be 0.9 (i.e. an arbitrary 10% missing). Sensitivity/specificity results were further stratified into rare diseases (defined as affecting < 1 in 2000 population) [12], non-rare diseases (≥ 1 in 2000 population), and symptom-specific tools (i.e. those that cover symptoms [e.g. urological symptoms, respiratory problems, etc.] that might be present in multiple diseases/conditions).
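The item-matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (the analyses were reported as performed in R and Excel). It assumes that items have already been mapped by a reviewer onto shared labels, so that a "direct match" reduces to set membership. The text does not state how the assumed true negative fraction of 0.9 is converted into a count, so the true negatives are left as an explicit input rather than derived. All names (`ItemComparison`, `compare_items`, the example item labels) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ItemComparison:
    true_positives: int   # items in both the specific and the generic tool
    false_positives: int  # items only in the generic tool
    false_negatives: int  # items only in the specific tool


def compare_items(specific_items: set[str], generic_items: set[str]) -> ItemComparison:
    """Count matched and unmatched items, treating the specific tool as reference."""
    return ItemComparison(
        true_positives=len(specific_items & generic_items),
        false_positives=len(generic_items - specific_items),
        false_negatives=len(specific_items - generic_items),
    )


def sensitivity(c: ItemComparison) -> float:
    """TP / (TP + FN): share of the specific tool's items the generic tool covers."""
    denom = c.true_positives + c.false_negatives
    return c.true_positives / denom if denom else float("nan")


def specificity(c: ItemComparison, true_negatives: float) -> float:
    """TN / (TN + FP). The true negatives are supplied by the caller (assumption)."""
    denom = true_negatives + c.false_positives
    return true_negatives / denom if denom else float("nan")


# Hypothetical item labels, for illustration only.
specific = {"pain", "mobility", "fatigue", "breathlessness"}
generic = {"pain", "mobility", "self_care"}
comparison = compare_items(specific, generic)
print(comparison, sensitivity(comparison))
```

In this made-up example the generic tool covers two of the four specific items, giving a sensitivity of 0.5; the specificity depends entirely on the true negative value supplied.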
The potential for misclassification of patients’ HRQoL by a generic tool was expressed as the median proportion of false positives and false negatives (with 95% prediction intervals), based on 1000 studies, with prevalence of poor HRQoL set at 20, 50, and 80%.
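One way such a simulation could be set up is sketched below in Python. This is an illustration under stated assumptions, not the procedure used in the study: the number of patients per simulated study (`n_patients`), the expression of false positives and false negatives as proportions of all patients in a study, and the use of the 2.5th and 97.5th percentiles across simulated studies as the 95% interval are all assumptions not specified in the text.

```python
import random
import statistics


def summarise(values):
    """Median with 2.5th and 97.5th percentiles across simulated studies."""
    cuts = statistics.quantiles(values, n=40)  # cut points every 2.5%
    return {"median": statistics.median(values), "lower": cuts[0], "upper": cuts[-1]}


def simulate_misclassification(sensitivity, specificity, prevalence,
                               n_studies=1000, n_patients=200, seed=0):
    """Simulate studies in which a generic tool classifies patients' HRQoL.

    n_patients and seed are hypothetical choices made for this sketch.
    """
    rng = random.Random(seed)
    fp_props, fn_props = [], []
    for _ in range(n_studies):
        fp = fn = 0
        for _ in range(n_patients):
            poor_hrqol = rng.random() < prevalence
            if poor_hrqol:
                # A truly poor HRQoL is missed with probability 1 - sensitivity.
                if rng.random() >= sensitivity:
                    fn += 1
            else:
                # A good HRQoL is flagged as poor with probability 1 - specificity.
                if rng.random() >= specificity:
                    fp += 1
        fp_props.append(fp / n_patients)
        fn_props.append(fn / n_patients)
    return {
        "false_positive": summarise(fp_props),
        "false_negative": summarise(fn_props),
    }


for prevalence in (0.2, 0.5, 0.8):
    print(prevalence, simulate_misclassification(0.3, 0.9, prevalence))
```

The sensitivity and specificity passed in the loop (0.3 and 0.9) are placeholder values chosen only to exercise the function.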
Concordance of quality of life scores
For each of the studies comparing a disease/condition-specific tool with the SF-36 and/or EQ-5D, a measure of concordance of the results was assigned. No (none) concordance was assigned if a significant impact on HRQoL was seen with the disease/condition-specific tool, but no change or the opposite impact was seen with the generic tool (or vice versa); Moderate concordance if HRQoL impact was scored in the same direction with both tools, but was statistically significant with only one of them; and Strong concordance if the results were fully aligned (significant/non-significant impact in same direction). For studies that measured HRQoL changes over time, it was determined whether the concordance between the generic and disease/condition-specific tool varied at different time points. Results were split into rare diseases, non-rare diseases, and symptom-specific tools.
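The three concordance categories can be expressed as a small decision rule, sketched here in Python for illustration. The encoding is an assumption: each tool's result is reduced to a direction (+1 improvement, -1 worsening, 0 no change) and a significance flag. Cases the text does not cover explicitly, such as two non-significant results in opposite directions, are assigned to "none" in this sketch; that choice is also an assumption.

```python
from enum import Enum


class Concordance(Enum):
    NONE = "none"
    MODERATE = "moderate"
    STRONG = "strong"


def classify_concordance(specific_direction: int, specific_significant: bool,
                         generic_direction: int, generic_significant: bool) -> Concordance:
    """Assign a concordance category to one study's paired results.

    Directions are encoded as +1, -1 or 0 (hypothetical encoding).
    """
    same_direction = specific_direction == generic_direction
    same_significance = specific_significant == generic_significant

    # Fully aligned: same direction and same significance status.
    if same_direction and same_significance:
        return Concordance.STRONG
    # Same (non-zero) direction, but significant with only one of the tools.
    if same_direction and specific_direction != 0:
        return Concordance.MODERATE
    # Significant with one tool, no change or opposite impact with the other,
    # plus any remaining unaligned cases (assumption of this sketch).
    return Concordance.NONE


print(classify_concordance(1, True, 1, False))
```

Reducing each study to a direction and a significance flag keeps the rule applicable across many diseases and study designs, which matches the deliberately coarse nature of the measure described above.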
All analyses were performed using R 3.6.0 (Revolutions Analytics) and Microsoft Excel 365 (Microsoft).
Discussion
An accurate measure of HRQoL is of fundamental importance when considering the clinical- and cost-effectiveness of a therapy or intervention during economic evaluations/HTAs to determine use within a healthcare system. Overestimating the impact on HRQoL could result in excessive healthcare expenditure for minimal health gain (money which could be better spent elsewhere). Conversely, underestimating the impact could cause unnecessary restrictions on use to the detriment of patients. Selection of the appropriate tool or tools to assess HRQoL is, therefore, essential. This study has found that by applying Neutral Theory, commonly used generic HRQoL tools, such as the SF-36 or EQ-5D, appear poorly aligned with disease- or condition-specific tools.
Neither the SF-36 nor the EQ-5D achieved a sensitivity and specificity for included items both > 50% against any of the 162/163 disease- or condition-specific tools included in this study. Even when using a high prevalence of poor HRQoL set at 80% (i.e. 4/5 patients with this disease/condition have a notably impacted HRQoL), less than one-third of tools had a FPR of < 50% against the generic tools (SF-36: 29% of tools; EQ-5D: 29%). The situation was worse for rare disease tools, where sensitivity ranged from 0 to 40% for the SF-36 and 0–22% for the EQ-5D. Predicated on these results, it is unsurprising, therefore, that there were low levels of concordance between HRQoL scores from the generic versus the disease/condition-specific tools (no concordance vs SF-36: 18–36% of studies; vs EQ-5D: 16–35%).
Salient limitations of this study included the necessity of having to assume a true negative fraction of 0.9, as it was not possible to know if the disease/condition-specific tool fully captures all relevant items or domains (i.e., is completely Neutral). The measure of concordance between results of the generic and disease/condition-specific tools was also necessarily crude to allow for cross comparison between multiple studies of numerous diseases/conditions. Importantly, however, a high number of studies (up to approximately one-third) reported zero concordance between generic and disease/condition tools. The use of the EQ-5D can also be considered a limitation in that this is not strictly a tool to measure HRQoL, but rather generic health status [6, 8]. The EQ-5D is, however, widely used to assess HRQoL [9, 10] and, therefore, was a valid choice for this study. It is also worthy of note that, surprisingly, a full description of 29% (65/228) of the identified tools could not be obtained, despite their publication in indexed journals. This is an unacceptably high rate; such descriptions should be a standard component of publication.
HRQoL tools generate scores on the basis of individual item measures, which together form a construct. The concept of ‘True’ HRQoL at any given time is, therefore, important. The tools generate a value of observed HRQoL for a subject based on the inclusion of relevant items and the absence of irrelevant items. The principle underpinning the development and use of disease- and condition-specific tools is that they are inherently more accurate than generic tools at measuring HRQoL for patients with that particular disease or condition and are, thereby, closer to neutrality. However, tools have been developed for the same disease/condition that do not include all the same items and domains [13, 14]. This raises the question of what is the correct construct to ensure an accurate assessment of HRQoL for that disease/condition. Several approaches have been taken to improve the accurate assessment of HRQoL, including: the parallel use of generic and disease/condition-specific tools [15, 16]; using mapping algorithms from disease/condition-specific tools to generic tools [10]; tailoring standard items to specific diseases/conditions [17]; and use of bolt-on items to generic questionnaires [10]. Despite these approaches, the pertinent question remains: what is acceptable accuracy for a HRQoL tool? Given the importance of having an accurate measure of the impact of a therapy or intervention on HRQoL, maybe it is time for a rethink on how HRQoL is assessed and measured and, moreover, to consider how improvements can be made to the current widespread use of generic tools.
Conclusions
A new theory recently proposed is that of a ‘Neutral Observer’, which provides a set of principles on which a determination of the accuracy of an observation of a given construct can be based [1]. It is theorised that the “true” value of a construct can be measured by an abstract or Neutral observer who has access to a complete list of indicators that are all relevant to the empirical measurement of a construct. This Neutral Observation thereby serves as the reference against which observations using the construct can be assessed for accuracy [1]. Adoption of such an approach in the development and assessment of HRQoL tools could improve their relevance, accuracy, and utility in economic evaluations of health interventions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.