Background
Low back pain (LBP) is a common condition that can cause severe activity impairment and physical limitations [1]. Among employees in China, the prevalence of LBP is around 42.7–72.0%, making LBP the most common cause of physical disability [2, 3]. As an incapacitating condition, LBP is associated with a significant reduction in health-related quality of life (HRQoL) [4]. Hence, a valid and reliable HRQoL measure is needed to evaluate interventions or programs for LBP and to inform resource allocation decisions.
In general, HRQoL can be assessed using either disease-specific or generic instruments, and generic instruments can in turn be subdivided into preference-based and non-preference-based measures. The main benefit of generic preference-based measures is their broad coverage of health dimensions, which makes comparisons across diseases, interventions and health programs possible [5]. In addition, generic preference-based measures provide a general estimate of health outcomes and can be combined with survival data to calculate quality-adjusted life years (QALYs), which are widely used as an indicator of clinical effectiveness [6].
The EuroQol 5-dimension (EQ-5D) is the most frequently used preference-based instrument in the world [7]. Because of the high ceiling effects of the three-level EQ-5D (EQ-5D-3L), a five-level version (EQ-5D-5L) was developed [8]. With the increasing availability of national value sets, crosswalk algorithms for deriving EQ-5D-5L index values from EQ-5D-3L value sets, and more evidence of the better psychometric properties of the EQ-5D-5L, uptake of the EQ-5D-5L has increased. Since Luo and colleagues [9] developed a scoring algorithm for the EQ-5D-5L based on Chinese preferences, the EQ-5D-5L has become increasingly popular in clinical studies in China. The Short Form 6-dimension (SF-6D) is a utility measure derived from the 36-item Short Form Health Survey (SF-36) [10], one of the most widely used generic HRQoL measures in clinical trials. A number of studies have explored the performance of the EQ-5D and SF-6D in various patient populations, and the results showed that their comparative validity and responsiveness differed depending on the target population [11–14].
The comparative performance of the EQ-5D and SF-6D has been investigated in patients with LBP [15, 16], and these studies found that the EQ-5D and SF-6D were not interchangeable, with the SF-6D largely outperforming the EQ-5D in terms of measurement properties. However, both studies used the EQ-5D-3L, which has been found to have poor discriminative ability [17] and ceiling effects [18]. Several studies have found better psychometric properties for the EQ-5D-5L than for the EQ-5D-3L [19–22]. It is therefore important to compare the EQ-5D-5L with the SF-6D in patients with LBP. Hence, this study aimed to evaluate the agreement, convergent validity and known-groups validity of the EQ-5D-5L and SF-6D in patients with LBP.
Methods
Study design and patient recruitment
After approval by the ethics committee, consecutive patients were recruited for this cross-sectional study at the General Hospital of Shenyang Military Area Command in Shenyang, China, from June 2017 to October 2017. The inclusion and exclusion criteria were as follows.
Inclusion criteria: patients with LBP aged over 18 years, with or without lower limb pain, not receiving any concurrent treatment for pain other than routine analgesics, and able to understand and speak Mandarin.
Exclusion criteria: patients with coexisting infection, malignancy, severe spinal cord disease or inflammatory joint disease; patients with myocardial infarction, cerebrovascular events, chronic lung disease, kidney disease or severe mental illness; and pregnant women.
The sample size was chosen so that the mean score of each measure could be estimated with a 95% confidence interval of a prespecified margin of error, using the following equation [23]:
$$ \mathrm{n}=\frac{\sigma^2}{{\left[\frac{\omega }{1.96}\right]}^2} $$
where ω is the margin of error (the half-width of the 95% confidence interval) and σ is the standard deviation of the outcome variable (assumed to be the same under the null and alternative hypotheses). We set ω to 0.03 for all measures, with σ = 0.238 for the EQ-5D-5L [24], σ = 0.152 for the SF-6D [15] and σ = 0.2026 for the ODI [15], which gave estimated sample sizes of n = 242, 98 and 176 for the EQ-5D-5L, SF-6D and ODI, respectively. Assuming an 80% response rate, we aimed to interview 300 patients with LBP.
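The sample-size arithmetic can be reproduced in a few lines of Python. This is a minimal sketch: the function name `sample_size_for_mean` is hypothetical, and it assumes each requirement is rounded up to the next whole patient. Under that assumption the SF-6D requirement comes out at 99 (98.6 before rounding) rather than 98, and inflating the largest requirement (242, for the EQ-5D-5L) for an 80% response rate gives 303, close to the target of 300.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% confidence interval


def sample_size_for_mean(sigma: float, margin: float, z: float = Z_95) -> int:
    """Smallest whole n with z * sigma / sqrt(n) <= margin, i.e. sigma^2 / (margin / z)^2 rounded up."""
    return math.ceil(sigma ** 2 / (margin / z) ** 2)


OMEGA = 0.03  # margin of error used for every measure
SIGMAS = {"EQ-5D-5L": 0.238, "SF-6D": 0.152, "ODI": 0.2026}  # SDs quoted in the text

required = {name: sample_size_for_mean(sd, OMEGA) for name, sd in SIGMAS.items()}
largest = max(required.values())

RESPONSE_RATE = 0.80  # response rate assumed in the text
to_invite = math.ceil(largest / RESPONSE_RATE)

for name, n in required.items():
    print(f"{name}: n = {n}")
print(f"largest n = {largest}, patients to approach = {to_invite}")
```

The largest requirement drives the target, so the rounding convention for the SF-6D does not change the planned sample.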
LBP was diagnosed on the basis of imaging findings, physical examination and patients’ complaints of LBP. Because all questionnaires used in this survey had been previously validated, no pilot or pre-testing survey was performed. After providing formal consent, every patient was interviewed by the same interviewer, who was trained to administer the survey in a standardized manner. In outpatient clinics, patients were interviewed in the waiting room after consultation; in inpatient clinics, the survey was administered on the ward before surgery. The survey questions were presented in the following order: socio-demographic questions, the Oswestry Disability Index questionnaire, the EQ-5D-5L and the SF-36. The interviewer, procedure and questionnaire were the same for all patients.
Instruments and measures
EQ-5D-5L
The EQ-5D-5L contains two parts that assess the respondent’s health status on the day of the interview [8]. The first part is a descriptive system with five dimensions (mobility, self-care, pain/discomfort, usual activities and anxiety/depression), each with five levels of severity, so the EQ-5D-5L can theoretically define 3125 (5^5) different health states. Under the Chinese scoring algorithm [9], the EQ-5D-5L yields scores from −0.39 to 1, where 1 is the best possible health state. The other part of the EQ-5D-5L is a visual analogue scale (EQ-VAS), on which respondents mark their current health status on a 20 cm vertical scale from 0 to 100. The simplified Chinese version of the EQ-5D-5L used in this study was approved by the EuroQol Group.
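The count of 3125 health states follows from five dimensions with five levels each. The sketch below enumerates the descriptive system, assuming the conventional EuroQol ordering of dimensions in a five-digit health-state code (mobility, self-care, usual activities, pain/discomfort, anxiety/depression). It deliberately stops at the health-state code: converting a code into a utility requires the coefficients of the Chinese value set [9], which are not reproduced here. The helper `encode` is a hypothetical name used for illustration.

```python
from itertools import product

# Assumed conventional order of dimensions in an EQ-5D health-state code.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")  # mobility, self-care, usual activities, pain/discomfort, anxiety/depression
LEVELS = range(1, 6)  # 1 = no problems ... 5 = extreme problems / unable to


def encode(state: dict) -> str:
    """Turn {'MO': 1, 'SC': 1, ...} into a five-digit code such as '11111'."""
    for dim in DIMENSIONS:
        if state[dim] not in LEVELS:
            raise ValueError(f"{dim} level must be 1-5, got {state[dim]}")
    return "".join(str(state[dim]) for dim in DIMENSIONS)


# Every combination of five levels across five dimensions.
all_states = ["".join(map(str, combo)) for combo in product(LEVELS, repeat=len(DIMENSIONS))]

print(len(all_states))                # 3125
print(all_states[0], all_states[-1])  # 11111 55555
```

Here "11111" is full health (scored 1) and "55555" is the most severe descriptive state; every other code lies between them.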
SF-36 based SF-6D
The SF-6D is a utility measure derived from the SF-36 [10]. It defines health status in terms of six dimensions (physical functioning, role limitation, social functioning, pain, energy and mental health), each with four to six levels, giving 18,000 potential health states. A value set based on the general population of Hong Kong [25] was used to estimate SF-6D utility scores in this study. SF-6D utility scores can range from 0.315 to 1.00. As recommended by previously published research [26], the SF-36v2 was administered in the survey rather than the SF-6D as an independent instrument. The official simplified Chinese version of the SF-36 was authorized by QualityMetric [27].
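Likewise, the 18,000 SF-6D health states are the product of the numbers of levels in the six dimensions. The sketch assumes the level counts of the original SF-36-based SF-6D classification (6, 4, 5, 6, 5 and 5 levels), which the text summarizes only as "four to six levels", and also compares the width of the two utility scales using the ranges quoted in this section.

```python
from math import prod

# Assumed number of levels per dimension in the SF-36-based SF-6D.
SF6D_LEVELS = {
    "physical functioning": 6,
    "role limitation": 4,
    "social functioning": 5,
    "pain": 6,
    "mental health": 5,
    "energy (vitality)": 5,
}

n_states = prod(SF6D_LEVELS.values())
print(n_states)  # 18000

# Score ranges quoted in the text, as (worst, best).
SCORE_RANGES = {"EQ-5D-5L": (-0.39, 1.0), "SF-6D": (0.315, 1.0)}
SCORE_WIDTH = {name: best - worst for name, (worst, best) in SCORE_RANGES.items()}
print(SCORE_WIDTH)
```

The SF-6D describes more states than the EQ-5D-5L but spreads them over a utility range roughly half as wide, which is relevant when the two scores are later compared for agreement.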
Oswestry disability index
The Oswestry Disability Index (ODI) [28, 29] is an instrument that measures the degree of disability in people with LBP. The questionnaire contains 10 items: pain intensity, personal care, lifting, walking, sitting, standing, sleeping, sex life, social life and travelling. Each item has six response levels, scored from 0 (least disability) to 5 (most severe disability). The item scores are summed and transformed into a 0 to 100% index. Scores of 0 to 20% indicate minimal disability, 21 to 40% moderate disability, 41 to 60% severe disability, 61 to 80% crippled (unable to walk), and 81 to 100% bed-bound or exaggerating symptoms [30]. Previous studies found the item about “sex life” culturally inappropriate for Chinese respondents [31]. Hence, only 9 of the 10 ODI items were administered. The Chinese version of the ODI was an official version from Mapi Research Trust.
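The ODI scoring described above can be sketched as follows. Two details are assumptions rather than statements from this section: the percentage is taken as the summed score divided by the maximum possible score of the items answered (5 per item, so 45 for the nine items used here), which is the usual ODI convention, and band edges are treated as inclusive upper bounds so that non-integer percentages still fall into a band. The function names are hypothetical, and the handling of skipped items is shown only for illustration, since this study analyzed complete responses.

```python
from typing import Optional, Sequence

MAX_ITEM_SCORE = 5  # each ODI item is scored 0 (least) to 5 (most disability)

# Disability bands quoted in the text; upper bounds treated as inclusive (assumption).
BANDS = [
    (20, "minimal disability"),
    (40, "moderate disability"),
    (60, "severe disability"),
    (80, "crippled"),
    (100, "bed-bound or exaggerating symptoms"),
]


def odi_percent(item_scores: Sequence[Optional[int]]) -> float:
    """Summed score as a percentage of the maximum for the items answered (None = skipped)."""
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no ODI items answered")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in answered):
        raise ValueError("item scores must lie between 0 and 5")
    return 100.0 * sum(answered) / (MAX_ITEM_SCORE * len(answered))


def odi_band(percent: float) -> str:
    for upper, label in BANDS:
        if percent <= upper:
            return label
    raise ValueError("percentage above 100")


# Hypothetical respondent answering the nine administered items.
score = odi_percent([2, 1, 3, 2, 2, 3, 1, 2, 2])
print(score, odi_band(score))  # 40.0 moderate disability
```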
Statistical analysis
Patient characteristics and descriptive statistics
Only patients who completed all questionnaires were included in this analysis; missing scores were not imputed. Continuous variables were reported as means and standard deviations (SD), and categorical variables as frequencies and proportions. Descriptive statistics (mean, SD, median, interquartile range, minimum and maximum) were computed for the ODI, EQ-5D-5L and SF-6D. Floor and ceiling effects of the EQ-5D-5L and SF-6D were evaluated as the proportions of the sample in the worst and best possible health states, respectively. Statistical analysis was conducted using IBM SPSS version 23.0 [32].
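Floor and ceiling effects reduce to two proportions per instrument. In this minimal sketch the bounds are the score ranges quoted in this section (−0.39 to 1 for the EQ-5D-5L and 0.315 to 1 for the SF-6D), a small tolerance guards against rounding in stored utilities, and the scores are hypothetical rather than study data.

```python
from typing import Sequence, Tuple


def floor_ceiling(scores: Sequence[float], worst: float, best: float,
                  tol: float = 1e-6) -> Tuple[float, float]:
    """Return (floor, ceiling): shares of respondents at the worst and best possible score."""
    if not scores:
        raise ValueError("empty sample")
    n = len(scores)
    floor = sum(abs(s - worst) <= tol for s in scores) / n
    ceiling = sum(abs(s - best) <= tol for s in scores) / n
    return floor, ceiling


# Hypothetical utilities for eight patients (not study data).
eq5d = [1.0, 1.0, 0.85, 0.62, -0.39, 0.74, 1.0, 0.5]
sf6d = [1.0, 0.71, 0.63, 0.55, 0.315, 0.68, 0.9, 0.6]

print(floor_ceiling(eq5d, worst=-0.39, best=1.0))  # (0.125, 0.375)
print(floor_ceiling(sf6d, worst=0.315, best=1.0))  # (0.125, 0.125)
```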
Agreement between the EQ-5D-5 L and SF-6D
When the same subjects are measured by each of two methods, agreement analysis is essential to determine whether the methods agree sufficiently for one to replace the other [33]. Both the EQ-5D-5L and SF-6D measure health utility, although the EQ-5D-5L has a possible range of −0.39 to 1.00, whereas the SF-6D ranges from 0.315 to 1.00. It is therefore necessary to know to what degree these two utility measures agree and whether they can be used interchangeably in patients with LBP in China. Agreement was assessed using intra-class correlation coefficients (ICCs) and Bland-Altman plots. ICCs were calculated with a two-way random effects model, using average measures and absolute agreement. ICCs can range between 0 and 1: an ICC < 0.40 suggests poor agreement, 0.40–0.59 fair, 0.60–0.74 good, and 0.75–1 excellent agreement [34]. In the Bland-Altman plots, the differences between the scores of the two instruments were plotted against their mean utility scores [35].
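For readers reproducing the agreement analysis outside SPSS, the ICC for a two-way random effects model with absolute agreement and average measures (often written ICC(A,k), after McGraw and Wong) can be computed from the mean squares of a patients × instruments table. The sketch below assumes NumPy and a complete n × 2 array with one row per patient and the EQ-5D-5L and SF-6D utilities in the columns; the function names are hypothetical, `interpret_icc` simply encodes the bands quoted above [34], and the data are illustrative.

```python
import numpy as np


def icc_a_k(ratings) -> float:
    """Two-way random effects, absolute agreement, average-measures ICC.

    `ratings` is an n x k array: n patients (rows) scored by k instruments (columns).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()

    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between patients
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between instruments (systematic offset)
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols              # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement keeps the instrument mean square in the denominator,
    # so a constant offset between the two utility scales lowers the ICC.
    return float((ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n))


def interpret_icc(icc: float) -> str:
    """Agreement bands quoted in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical scores: the second column is the first shifted up by 0.1.
eq5d = [0.2, 0.4, 0.6, 0.8]
sf6d = [0.3, 0.5, 0.7, 0.9]
icc = icc_a_k(np.column_stack([eq5d, sf6d]))
print(round(icc, 4), interpret_icc(icc))  # 0.9639 excellent
```

A consistency-type ICC would ignore the 0.1 offset and return 1 for these data; the absolute-agreement form penalizes it, which is the relevant question when asking whether one utility score can replace the other.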
Convergent validity
Following previous research [12, 36–38], the strength of the correlations of the EQ-5D-5L and SF-6D scores with the ODI, the EQ-VAS, and the SF-36 physical (PCS) and mental (MCS) component summary scores was compared. Associations were evaluated using Spearman’s rank correlation coefficient, with 0.9–1.0 considered very high, 0.7–0.9 high, 0.5–0.7 moderate and 0.3–0.5 low correlation [39].
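Spearman’s coefficient is Pearson’s correlation applied to ranks, with tied values sharing their average rank. The dependency-free sketch below classifies strength on the absolute value, on the assumption that direction should be ignored because the ODI (higher = worse) is expected to correlate negatively with utility scores; it also labels coefficients below 0.3 "negligible", a category the text does not define. Function names and data are hypothetical; in practice a statistics package would be used.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho: float) -> str:
    r = abs(rho)  # direction ignored: ODI is expected to correlate negatively with utilities
    if r >= 0.9:
        return "very high"
    if r >= 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "negligible"  # assumed label; the quoted scheme starts at 0.3


# Hypothetical paired scores for five patients.
odi = [10, 20, 30, 40, 50]
eq5d = [0.9, 0.8, 0.7, 0.6, 0.5]
rho = spearman(odi, eq5d)
print(rho, strength(rho))  # -1.0 very high
```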
Discussion
The purpose of this study was to compare the performance of the EQ-5D-5L and SF-6D, including agreement, convergent validity and known-groups validity, in patients with LBP. Agreement between the EQ-5D-5L and SF-6D was found to be good. In terms of convergent validity, most of the a priori hypothesized associations were stronger for the EQ-5D-5L than for the SF-6D, although the SF-36 MCS was more strongly correlated with the SF-6D. As for known-groups validity, the EQ-5D-5L performed better for most groups, except those defined by location and by the general health assessment item of the SF-36. In addition, the EQ-5D-5L had higher effect size (ES), relative efficiency (RE) and area under the curve (AUC) values when the ODI was used as the external indicator of health status, indicating that the EQ-5D-5L was more efficient at detecting clinical differences.
We found that the distributions of the ODI and EQ-5D-5L were skewed towards full health, whereas the distribution of the SF-6D was more symmetric around its mean, consistent with previous findings [36, 51]. These distributions suggest that the EQ-5D-5L may be more closely related to the ODI. Previously published studies reported a high ceiling effect for the EQ-5D-5L [51], which was not observed in this study. One possible reason is that patients in this study were recruited from a tertiary hospital, which patients visit only when they can no longer endure their symptoms. In addition, unlike some other conditions, LBP may drastically reduce quality of life [52].
The ICC between the EQ-5D-5L and SF-6D was 0.661, indicating good agreement between the two measures. This is higher than in other similar studies in China, which reported ICCs of 0.448 for patients with stable angina [53] and 0.444 for patients with chronic prostatitis [54]. Apart from the different diseases studied, another possible reason for the discrepancy is that these two studies applied the EQ-5D-3L rather than the EQ-5D-5L. The range of EQ-5D-5L utility scores (−0.39 to 1) is narrower than that of the EQ-5D-3L (−0.59 to 1), which might account for the better agreement between the EQ-5D-5L and SF-6D in this study. Consistent with previous studies in LBP [15, 55, 56], the SF-6D yielded higher scores for poorer health states, whereas the EQ-5D-5L tended to produce higher scores for better health states. This means that the two measures cannot be used interchangeably.
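The pattern described above, with the SF-6D scoring higher at the poor end of the scale and the EQ-5D-5L higher at the good end, can be checked numerically from the Bland-Altman quantities. In the sketch below, differences are taken as EQ-5D-5L minus SF-6D, the 95% limits of agreement are the mean difference ± 1.96 SD of the differences, and, as one common check for proportional bias, the differences are regressed on the means, so a positive slope corresponds to the pattern reported here. NumPy is assumed, the function name is hypothetical, and the scores are illustrative rather than study data.

```python
import numpy as np


def bland_altman(a, b) -> dict:
    """Bland-Altman statistics for paired scores, with differences defined as a - b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    means = (a + b) / 2   # x-axis of the plot
    diffs = a - b         # y-axis of the plot
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    # Least-squares line of differences on means; slope > 0 means a - b grows
    # as average health improves.
    slope, intercept = np.polyfit(means, diffs, 1)
    return {
        "means": means,
        "diffs": diffs,
        "bias": float(bias),
        "limits": (float(bias - 1.96 * sd), float(bias + 1.96 * sd)),
        "slope": float(slope),
        "intercept": float(intercept),
    }


# Hypothetical utilities for five patients, ordered from poorer to better health.
eq5d = [0.20, 0.45, 0.65, 0.85, 1.00]
sf6d = [0.40, 0.50, 0.60, 0.70, 0.80]
result = bland_altman(eq5d, sf6d)
print(round(result["bias"], 3), [round(v, 3) for v in result["limits"]], round(result["slope"], 3))
```

A small overall bias can hide opposite-signed differences at the two ends of the scale, which is why the slope, and not only the mean difference, bears on whether the instruments are interchangeable.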
Our convergent validity analysis showed that the ODI correlated strongly with the EQ-5D-5L but only moderately with the SF-6D, in agreement with previously published research [16]. The EQ-5D-5L was more strongly correlated with the EQ-VAS than the SF-6D was. A possible explanation is that the EQ-VAS is part of the EQ-5D-5L and both measure health on the day of the interview, whereas the SF-6D, being derived from the SF-36, uses a four-week recall period. Because the SF-6D is derived from the SF-36, one might expect it to correlate more strongly with both the PCS and MCS. However, the EQ-5D-5L was more strongly correlated with the PCS, whereas the SF-6D was more strongly correlated with the MCS. This is in line with previous studies by Richardson et al. [12] and Sakthong et al. [36]. Because four of the five EQ-5D-5L dimensions cover physical health, while the SF-6D contains roughly equal numbers of physical and mental health items, the EQ-5D-5L may perform better for individuals with predominantly physical health problems than for those with mental health problems [12, 36]. As the ODI items are more physical than psychological, this may explain why the ODI correlated strongly with the EQ-5D-5L but only moderately with the SF-6D.
Both measures could discriminate between patients in most known groups. The EQ-5D-5L yielded higher ES and RE values for all known groups except those defined by location and by the general health assessment item of the SF-36. These validity results are in agreement with previously published studies [14, 36]. The EQ-5D-5L was 42% more efficient than the SF-6D at detecting clinically relevant differences measured by the ODI. Furthermore, the AUC of the EQ-5D-5L was higher, although the 95% confidence intervals of the two measures overlapped somewhat. Our study does not support the findings of Johnsen et al. [16], who concluded that the SF-6D was better than the EQ-5D-3L at detecting clinical change in patients with LBP. Quite a few studies in various patient populations have found that the EQ-5D-5L is more discriminative than the EQ-5D-3L [17–22, 57, 58]. The increased discriminative power provided by the two additional response levels is therefore the most likely reason for the disagreement between our study and the previous one [16].
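The ES, RE and AUC comparisons can be illustrated with conventions that are common in the known-groups literature; these are assumptions for illustration, not the authors’ stated formulas. ES is taken as the difference in group means divided by the pooled SD; RE as the ratio of one-way ANOVA F statistics with the SF-6D as the reference, so that an RE of 1.42 would read as "42% more efficient"; and AUC as the probability that a randomly chosen patient from the healthier group scores higher than one from the less healthy group, with ties counted as half. Function names and the toy data are hypothetical.

```python
import math
import statistics as st
from typing import Sequence


def effect_size(better: Sequence[float], worse: Sequence[float]) -> float:
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(better), len(worse)
    pooled_var = ((n1 - 1) * st.variance(better) + (n2 - 1) * st.variance(worse)) / (n1 + n2 - 2)
    return (st.mean(better) - st.mean(worse)) / math.sqrt(pooled_var)


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic across the known groups."""
    values = [v for g in groups for v in g]
    grand = st.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (st.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - st.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(groups_index, groups_reference) -> float:
    """F of the index instrument divided by F of the reference instrument."""
    return anova_f(groups_index) / anova_f(groups_reference)


def auc(better: Sequence[float], worse: Sequence[float]) -> float:
    """P(score in healthier group > score in less healthy group), ties counted as 0.5."""
    wins = sum(1.0 if b > w else 0.5 if b == w else 0.0 for b in better for w in worse)
    return wins / (len(better) * len(worse))


# Toy utilities in two known groups (e.g. lower vs higher ODI), not study data.
eq5d_groups = [[0.8, 0.9, 1.0], [0.5, 0.6, 0.7]]
sf6d_groups = [[0.6, 0.7, 0.8], [0.5, 0.6, 0.7]]

print(effect_size(*eq5d_groups), effect_size(*sf6d_groups))
print(relative_efficiency(eq5d_groups, sf6d_groups))
print(auc(*eq5d_groups), auc(*sf6d_groups))
```

Because RE is a ratio of F statistics, it rewards an instrument whose group means are further apart relative to its within-group spread, which is the sense in which one measure is "more efficient" at separating known groups.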
We hypothesized that patients with higher income would have higher utility scores. However, the utility scores of the income groups did not follow this pattern. Specifically, those earning more than 5000 yuan had lower utility scores than those with an income between 3501 and 5000 yuan. Further analysis of the ODI by income group showed a similar pattern. The survey was conducted at both outpatient and inpatient clinics. Because patients with higher incomes are more able to afford surgery, more patients with severe LBP and high income may have been recruited in this study, which may explain this finding.
The overall differences among measures arise from differences in their descriptive systems, valuation methods and the population views of health used to value health states. Since health utility scores obtained with different instruments are not comparable [59], a consistent measure should be recommended. For example, the EQ-5D is the only health utility measure recommended by the National Institute for Health and Care Excellence (NICE) in England. In many respects, the EQ-5D-5L is superior to the 3L version, including distributional evenness, efficiency of scale use and the face validity of the resulting distributions [60]. In many cases, the EQ-5D-5L also performed better than the SF-6D. If the SF-6D is used in a relevant clinical trial, a mapping algorithm might be needed.
This study has a number of limitations. First, the responsiveness and reliability of the EQ-5D-5L and SF-6D, which are also essential considerations when choosing a measure, were not evaluated. Second, because the questionnaires were administered in a fixed order (ODI, EQ-5D-5L, SF-36), questions in the ODI may have had a context effect on EQ-5D-5L responses, and questions in the ODI and EQ-5D-5L may have had a context effect on SF-36 responses. The term “context effect” refers to a process in which earlier questions in a survey affect responses to later questions [61]. Third, since there was no mainland Chinese value set for the SF-6D, the Hong Kong value set was applied, which might have influenced our findings. Fourth, the questionnaires were interviewer-administered rather than self-completed, which might limit the generalizability of the results. Finally, the sample was recruited from the orthopedic outpatient and inpatient clinics of a single tertiary hospital in China; hence, the conclusions may be less representative of patients with LBP in other regions and in non-Chinese populations.