Data
A total of 300 COPD patients from three tertiary teaching hospitals were surveyed from July to December 2014. All three hospitals are located in Seoul, a metropolitan city, so most of the patients were urban residents.
The survey was approved by the Institutional Review Board (IRB) of each institute. Patients were eligible for the study if they fulfilled the following criteria: 1) were over the age of 40, 2) had been diagnosed with COPD before 2013, 3) had continuously received outpatient care since January 2013, 4) had an FEV1 (forced expiratory volume in 1 s)/FVC (forced vital capacity) ratio of less than 0.7 immediately after using bronchodilators, and 5) had more than 10 pack-years of smoking history. Patients were excluded if they suffered an acute exacerbation in the six weeks prior to the survey or a cardiovascular event (myocardial infarction or arrhythmias) in the three months prior to the survey.
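The inclusion and exclusion criteria above can be read as a single screening rule. The following is a minimal sketch of that rule; the `Candidate` record and all of its field names are hypothetical and are not taken from the study's data collection forms.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical; they only mirror the criteria in the text.
    age: int
    copd_diagnosis_year: int
    continuous_outpatient_care_since_jan_2013: bool
    post_bronchodilator_fev1_fvc: float
    pack_years: float
    exacerbation_in_prior_6_weeks: bool
    cardiovascular_event_in_prior_3_months: bool


def is_eligible(c: Candidate) -> bool:
    """Return True if the candidate meets all five inclusion criteria
    and neither exclusion criterion."""
    meets_inclusion = (
        c.age > 40                                        # 1) over the age of 40
        and c.copd_diagnosis_year < 2013                  # 2) diagnosed before 2013
        and c.continuous_outpatient_care_since_jan_2013   # 3) continuous outpatient care
        and c.post_bronchodilator_fev1_fvc < 0.7          # 4) FEV1/FVC below 0.7
        and c.pack_years > 10                             # 5) more than 10 pack-years
    )
    excluded = (
        c.exacerbation_in_prior_6_weeks
        or c.cardiovascular_event_in_prior_3_months
    )
    return meets_inclusion and not excluded
```
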
The survey was conducted by nurses from the respective hospitals using the Korean versions of the CAT and EQ-5D-3L questionnaires. The nurses explained the questionnaires and conducted in-person interviews with the patients. After the interview, the patients’ characteristics (sex, age, duration of COPD, lung function measurements, complications, prescription drugs, resource usage, etc.) were recorded by reviewing their medical records. One of the 300 patients withdrew consent, so the results from 299 patients were included in this study. The Korean version of the CAT used in this survey had previously been evaluated for validity by Lee et al. [13] and Hwang et al. [14]. Both evaluations concluded that the Korean version of the CAT had good internal consistency and could be used to assess the impact of COPD on patient health.
Table 1 shows descriptive statistics of the patient characteristics, including sex, age, BMI, smoking history, and FEV1% predicted. The mean age of the patients was 69.2 years, and approximately 74% of the study population was aged 65 years or older. The proportion of males was 86.3%. The mean BMI was 22.85, and 8.7% of the patients had a BMI of less than 18.5 (underweight). The mean smoking history was 36.9 pack-years, and 85% of the patients were current or ex-smokers. The mean duration of COPD was 5.4 years. The FEV1/FVC ratio is the proportion of a person’s forced vital capacity (FVC) that they can expire in the first second of forced expiration (FEV1). This ratio is expressed as FEV1%. FEV1% predicted is defined as the patient’s FEV1% divided by the average FEV1% in the population for a person of similar age, sex, and body composition. A lower FEV1% predicted value indicates a more severe condition, and severity is divided into four groups based on this value. The mean FEV1% predicted was 55.8%, and approximately 90% of patients had moderate to very severe COPD (Table 1).
Table 1
Patient Characteristics
Total patients, n | 299 | |
Male, n (%) | 258 | (86.3) |
Female, n (%) | 41 | (13.7) |
Age (years), mean (SD) | 69.16 | (8.8) |
40–54, n (%) | 17 | (5.7) |
55–64, n (%) | 62 | (20.7) |
65–74, n (%) | 132 | (44.1) |
≥75, n (%) | 88 | (29.4) |
Body mass index (kg/m²), mean (SD) | 22.85 | (3.3) |
<18.5 (underweight), n (%) | 26 | (8.7) |
18.5–25 (normal), n (%) | 198 | (66.2) |
25–30 (overweight), n (%) | 67 | (22.4) |
≥30 (obese), n (%) | 8 | (2.7) |
Smoking history (pack-years), mean (SD) | 36.93 | (31.3) |
Current or ex-smoker, n (%) | 254 | (84.9) |
FEV1 (% predicted), mean (SD) | 55.85 | (17.94) |
≥80 (mild), n (%) | 32 | (10.7) |
50–80 (moderate), n (%) | 156 | (52.4) |
30–50 (severe), n (%) | 90 | (30.2) |
<30 (very severe), n (%) | 20 | (6.7) |
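The four severity groups in Table 1 amount to a simple threshold rule on FEV1% predicted. The sketch below illustrates that rule; the function name is hypothetical, and treating each lower bound as inclusive (for example, exactly 50 counted as moderate) is an assumption, since the table does not say how boundary values were assigned.

```python
def copd_severity(fev1_pct_predicted: float) -> str:
    """Map FEV1% predicted to the severity groups listed in Table 1.

    Assumption: lower bounds are inclusive (80 -> mild, 50 -> moderate,
    30 -> severe); the source table does not specify boundary handling.
    """
    if fev1_pct_predicted < 0:
        raise ValueError("FEV1% predicted cannot be negative")
    if fev1_pct_predicted >= 80:
        return "mild"
    if fev1_pct_predicted >= 50:
        return "moderate"
    if fev1_pct_predicted >= 30:
        return "severe"
    return "very severe"
```

Applied to the sample mean of 55.85, this rule returns the moderate group.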
The results of the EQ-5D-3L and CAT questionnaire survey of the 299 patients are presented in Table 2. The most frequent response for every EQ-5D-3L item was level 1 (42.1–72.6%), and very few respondents (0.7–2.3%) selected level 3. Eighty-two respondents (27.4%) chose level 1 for all five items. The EQ-5D-3L utilities were then calculated using the method for the Korean population developed by Lee et al. [7].
Table 2
Summary statistics of EQ-5D-3L utilities and CAT scores, mean (SD)
EQ-5D utility | 0.83 | (0.15) |
Total CAT score | 16.38 | (8.96) |
Q1: cough | 1.68 | (1.38) |
Q2: phlegm | 2.13 | (1.42) |
Q3: chest tightness | 1.59 | (1.43) |
Q4: breathlessness | 3.23 | (1.46) |
Q5: home activities | 1.81 | (1.62) |
Q6: leaving home | 1.82 | (1.68) |
Q7: sleep | 1.74 | (1.58) |
Q8: energy | 2.38 | (1.38) |
$$ \text{EQ-5D-3L utility} = 1 - 0.050 - 0.096\,\mathrm{M2} - 0.418\,\mathrm{M3} - 0.046\,\mathrm{SC2} - 0.136\,\mathrm{SC3} - 0.051\,\mathrm{UA2} - 0.208\,\mathrm{UA3} - 0.037\,\mathrm{PD2} - 0.151\,\mathrm{PD3} - 0.043\,\mathrm{AD2} - 0.043\,\mathrm{AD3} - 0.050\,\mathrm{N3} $$
(M2, mobility level 2; M3, mobility level 3; SC2, self-care level 2; SC3, self-care level 3; UA2, usual activities level 2; UA3, usual activities level 3; PD2, pain or discomfort level 2; PD3, pain or discomfort level 3; AD2, anxiety or depression level 2; AD3, anxiety or depression level 3; N3, any dimension on level 3)
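To make the scoring rule in the footnote concrete, the sketch below computes a utility from a five-dimension response profile. The coefficients are copied exactly as printed in the footnote and should be checked against the original value set [7] before any reuse. Returning 1 for the all-level-1 profile, with the constant 0.050 applied only when at least one dimension is above level 1, is an assumption; it is made so that the result agrees with the text, which reports a utility of 1 for respondents in perfect health. The function and dictionary names are hypothetical.

```python
# Decrements copied as printed in the Table 2 footnote (not independently verified).
DECREMENTS = {
    "M":  {2: 0.096, 3: 0.418},   # mobility
    "SC": {2: 0.046, 3: 0.136},   # self-care
    "UA": {2: 0.051, 3: 0.208},   # usual activities
    "PD": {2: 0.037, 3: 0.151},   # pain or discomfort
    "AD": {2: 0.043, 3: 0.043},   # anxiety or depression
}
CONSTANT = 0.050   # applied when any dimension is above level 1 (assumption)
N3 = 0.050         # applied when any dimension is on level 3


def eq5d3l_utility(profile: dict) -> float:
    """Compute a utility from a profile such as {"M": 2, "SC": 1, ...}."""
    if set(profile) != set(DECREMENTS):
        raise ValueError("profile must contain exactly M, SC, UA, PD and AD")
    if any(level not in (1, 2, 3) for level in profile.values()):
        raise ValueError("levels must be 1, 2 or 3")
    # Assumption: the all-level-1 profile is scored as perfect health.
    if all(level == 1 for level in profile.values()):
        return 1.0
    utility = 1.0 - CONSTANT
    for dimension, level in profile.items():
        if level > 1:
            utility -= DECREMENTS[dimension][level]
    if any(level == 3 for level in profile.values()):
        utility -= N3
    return utility
```
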
The mean utility score was 0.83 (SD = 0.15). The mean total CAT score was 16.38, with scores ranging from 0 to 38. Among the eight items, the fourth item (breathlessness) had the highest (most severe) average score at 3.23, and the first item (cough) had the lowest average score at 1.68. There were no missing values in the variables used for this study.
Model development
The mapping models were developed using the EQ-5D-3L utility as the dependent variable and either the total CAT score or the eight individual CAT item scores as the explanatory variables, according to the following formulas:
$$ \bullet\ \text{Model 1: EQ-5D-3L utility} = a + b_1 \times \text{total CAT score} + c_1 \times \text{age} + c_2 \times \text{sex} $$
$$ \bullet\ \text{Model 2: EQ-5D-3L utility} = a + b_1 \times \text{total CAT score} + b_2 \times (\text{total CAT score})^2 + c_1 \times \text{age} + c_2 \times \text{sex} $$
$$ \bullet\ \text{Model 3: EQ-5D-3L utility} = a + b_1 \times \mathrm{Q}1 + b_2 \times \mathrm{Q}2 + \cdots + b_8 \times \mathrm{Q}8 + c_1 \times \text{age} + c_2 \times \text{sex} $$
Models 1 and 2 used the total CAT score, age, and sex as explanatory variables. In contrast, Model 3 used the eight individual CAT item scores instead of the total CAT score. Backward stepwise selection of explanatory variables was used, with significance defined as α = 0.05. The following estimation methods were used: ordinary least squares (OLS), generalized linear models (GLM), Tobit models, and beta regression. Because EQ-5D-3L utilities are skewed and censored, we compared GLM, Tobit, and beta regression alongside OLS. The GLM distribution and link function combinations investigated were Gaussian with log link, Poisson with log link, gamma with inverse link, and quasi with identity link.
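As an illustration of how the three specifications differ, the sketch below builds the predictor row for each model and fits coefficients by OLS through the normal equations. It is not the authors' R code: it covers OLS only, omits backward selection, GLM, Tobit, and beta regression, and assumes sex is coded 0/1. The function names `design_row` and `ols_fit` are hypothetical.

```python
def design_row(model, cat_items, age, sex):
    """Return the predictor row (intercept first) for Model 1, 2 or 3.

    cat_items: the eight CAT item scores; sex: 0/1 (assumed coding).
    """
    if len(cat_items) != 8:
        raise ValueError("expected eight CAT item scores")
    total = sum(cat_items)
    if model == 1:
        return [1.0, total, age, sex]
    if model == 2:
        return [1.0, total, total ** 2, age, sex]
    if model == 3:
        return [1.0, *cat_items, age, sex]
    raise ValueError("model must be 1, 2 or 3")


def ols_fit(X, y):
    """Ordinary least squares: solve (X'X) beta = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]
```

The coefficients come back in the order of the columns of `design_row`, so for Model 1 they correspond to a, b1, c1, and c2.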
A two-part model was also considered as an alternative estimation method for analyzing skewed data [8]. In this study, a large proportion (27%) of the observed EQ-5D-3L utilities had a value of 1, which indicates perfect health status. The first part of the two-part model, a logistic regression, estimates the probability of perfect health status. The second part uses the OLS, GLM, Tobit, and beta regression estimations described above to predict EQ-5D-3L utilities. The EQ-5D-3L utility score was calculated using the following equation:
$$ \bullet\ \text{Two-part model: EQ-5D-3L utility} = P(\text{perfect health}) + \left[1 - P(\text{perfect health})\right] \times \text{Predicted-EQ-5D-3L-Utility} $$
P(perfect health) is the probability of perfect health obtained from the logistic regression. The Predicted-EQ-5D-3L-Utility value is derived from the OLS, GLM, Tobit, or beta regression estimation.
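The two-part combination is a probability-weighted average of a utility of 1 (perfect health) and the utility predicted by the second-part model. A minimal sketch follows; the function names are hypothetical, and `logistic` only shows how a fitted linear predictor would be turned into P(perfect health), without reproducing any coefficient from the study.

```python
import math


def logistic(linear_predictor: float) -> float:
    """Inverse logit: convert a logistic-regression linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def two_part_utility(p_perfect_health: float, predicted_utility: float) -> float:
    """Combine both parts: P * 1 + (1 - P) * predicted utility."""
    if not 0.0 <= p_perfect_health <= 1.0:
        raise ValueError("p_perfect_health must lie in [0, 1]")
    return p_perfect_health + (1.0 - p_perfect_health) * predicted_utility
```

For example, a patient with a 27% probability of perfect health and a second-part prediction of 0.80 would receive 0.27 + 0.73 × 0.80 = 0.854.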
All statistical analyses were conducted using R (ver 3.3.3; R Foundation for Statistical Computing).
The R code used for this analysis is provided as Additional file 2.
Model validation
The dataset of 299 COPD patients was randomly split into a training set of 150 patients (50%) and a validation set of 149 patients (50%). The training set was used to develop the models. The validation set was used to validate the models by calculating and comparing root mean square errors (RMSE) and mean absolute errors (MAE).
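For reference, the two error measures take their standard forms. Here $y_i$ denotes the observed utility and $\hat{y}_i$ the predicted utility for patient $i$ in a validation set of $n$ patients; these symbols are notation introduced for this illustration and do not appear in the original text.

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| $$

RMSE penalizes large individual errors more heavily than MAE, which is one reason to report both.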
The bootstrap method was used to generate robust estimates of RMSE and MAE given the limited sample size. The framework described above, consisting of random splitting, training, and validation, was iterated 10,000 times to collect 10,000 RMSEs and MAEs. The means of the collected RMSEs and MAEs were used as the criteria for model selection: the model with the lowest mean RMSE or MAE was selected as the most suitable.
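The repeated split, train, and validate loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: a single-predictor least-squares line (utility on total CAT score) stands in for the mapping models, and the names `fit_line` and `repeated_split_validation` are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor (stand-in model)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def repeated_split_validation(x, y, n_train, n_iter, seed=0):
    """Average validation RMSE and MAE over n_iter random train/validation splits."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    indices = list(range(len(x)))
    rmses, maes = [], []
    for _ in range(n_iter):
        rng.shuffle(indices)
        train, valid = indices[:n_train], indices[n_train:]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        predicted = [intercept + slope * x[i] for i in valid]
        observed = [y[i] for i in valid]
        rmses.append(rmse(observed, predicted))
        maes.append(mae(observed, predicted))
    return sum(rmses) / n_iter, sum(maes) / n_iter
```

With the study's settings, the call would use `n_train=150` and `n_iter=10000` on the 299 observations; comparing the returned means across candidate models mirrors the selection criterion described above.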