Background
Idiopathic pulmonary fibrosis (IPF) is the most burdensome interstitial lung disease (ILD). It is a chronic, fibrotic lung disease characterised by progressive decline in lung function and increasing dyspnoea [1]. Cough, fatigue, loss of emotional well-being and social isolation are other consequences of the disease [2]. Along with a wide range of comorbidities, patients with IPF often experience impaired health-related quality of life (HRQL) [2, 3]. As the disease progresses, the symptom burden increases resulting in decreasing HRQL, and in the terminal phase of the disease, HRQL plummets considerably [4]. Antifibrotic treatments successfully slow down lung function decline, but do not improve HRQL convincingly [5, 6].
Patient-reported outcome measures (PROMs) are used to quantify HRQL. Like any other measurement instrument, PROMs must be tested to ensure sufficient validity and reliability. It is essential to evaluate an instrument’s ability to respond to change in health status (responsiveness), before it can be used as an endpoint in longitudinal studies. Another fundamental aspect is the minimal clinically important difference (MCID) denoting the smallest change in score of the instrument perceived as clinically relevant. Therefore, longitudinal studies are needed to ensure valid and responsive instruments in the target population studied.
A modified version of Saint George’s Respiratory Questionnaire (SGRQ) was developed for patients with IPF (SGRQ-I) [7, 8]. To our knowledge, no studies have examined responsiveness or MCID of SGRQ-I, and assessment of the longitudinal validity of SGRQ-I is important before implementing the instrument in clinical trials or daily practice.
King’s Brief Interstitial Lung Disease questionnaire (K-BILD) was developed as a HRQL instrument for patients with ILD [9] and has recently been validated in IPF [10]. Even though K-BILD is used in clinical trials [11, 12], responsiveness and MCID have not yet been sufficiently determined [13, 14]. Validation of an instrument is an iterative process increasing robustness by evaluation in different cohorts.
Although fibrotic changes in IPF are irreversible, often resulting in decreased HRQL and pulmonary function, improvement in HRQL is seen in some patients [14–16]. MCID should thus be examined separately for improvement and deterioration, as estimates may differ [17].
The aim of this study was to assess the responsiveness of SGRQ-I and K-BILD and determine MCID separately for deterioration and improvement in both instruments in a large prospective cohort of patients with IPF in a real-world setting.
Discussion
This is the first study prospectively examining responsiveness and MCID in a large sample of patients with IPF in a real-world, multicentre setting. Responsiveness was investigated using different approaches and MCID was determined separately for improvement and deterioration. Sensitivity analyses of patients with different baseline HRQL were performed. Results indicated that SGRQ-I and K-BILD are responsive to change according to all HRQL anchors and most physiological anchors. Estimates of MCID total scores differed by 1–2 points between improvement and deterioration. Results were comparable in the sensitivity analyses. An association between SGRQ-I and mortality was observed and a trend was found between K-BILD and mortality.
The ability of an instrument to respond to changes is essential to longitudinal validity; otherwise assessment of MCID is irrelevant. Hence, responsiveness should be assessed by different methods. In this study, we used both correlation analyses and compared groups with different disease stages by linear regression. Both methods indicated that SGRQ-I and K-BILD responded to changes in all HRQL and most physiological anchors; scores of the two instruments changed in concordance with changes in the anchors. The weaker correlations to physiological than to HRQL anchors were expected, as cross-sectional studies of K-BILD and SGRQ-I have shown similar results [7–10]. As a result of measurement error on two measures, correlations between changes in scores are expected to be smaller. This may explain the generally weaker correlations in this longitudinal study compared to the cross-sectional studies on SGRQ-I and K-BILD [7–10]. DLCO showed the weakest associations to SGRQ-I and K-BILD in both analyses. Correspondingly, another study using DLCO as an anchor did not find significant changes [16]. One explanation might be the considerable inherent variability in measurements of DLCO, which would lead to less significant results [28]. All things considered, the analyses supported the evidence pointing towards SGRQ-I and K-BILD as responsive instruments.
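The attenuation argument above can be made concrete with a minimal sketch based on classical test theory. It assumes equal variance and equal reliability at baseline and follow-up, and every number in it is illustrative rather than taken from this study; the function names are hypothetical.

```python
import math


def change_score_reliability(reliability: float, test_retest_corr: float) -> float:
    """Reliability of a change score (follow-up minus baseline).

    Classical test theory result, assuming equal variance and equal
    reliability at both time points.
    """
    if not 0.0 <= test_retest_corr < 1.0:
        raise ValueError("test_retest_corr must be in [0, 1)")
    return (reliability - test_retest_corr) / (1.0 - test_retest_corr)


def attenuated_correlation(true_corr: float, rel_a: float, rel_b: float) -> float:
    """Spearman's attenuation formula: observed r = true r * sqrt(rel_a * rel_b)."""
    return true_corr * math.sqrt(rel_a * rel_b)


# Illustrative values only (assumptions, not study data): each instrument has
# a reliability of 0.90 and a baseline-to-follow-up correlation of 0.70.
rel_change = change_score_reliability(reliability=0.90, test_retest_corr=0.70)

# A hypothetical true correlation of 0.60 between changes in two instruments.
observed = attenuated_correlation(true_corr=0.60, rel_a=rel_change, rel_b=rel_change)

print(round(rel_change, 3), round(observed, 3))
```

Under these assumptions, the change scores are considerably less reliable (about 0.67) than the single measurements (0.90), and a true correlation of 0.60 between changes would be observed as roughly 0.40.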
Responsiveness of SGRQ-I has not been assessed previously, but it has been evaluated for SGRQ [22, 32]. Responsiveness of K-BILD was only briefly described by Sinha et al. [13], whereas Nolan et al. limited their investigations to correlation analyses [14]. Interestingly, correlations were stronger to SOBQ (SGRQ-I: 0.39 to 0.60; K-BILD: −0.47 to −0.26) than to Medical Research Council dyspnoea score (SGRQ: 0.25 to 0.39; K-BILD: −0.29 to −0.23) and Transition Dyspnoea Index (SGRQ: −0.47 to −0.28) [14, 22, 32]. Correlations between K-BILD and SGRQ were also stronger (−0.63 to −0.42) than to the Chronic Respiratory Questionnaire (0.27 to 0.54) [14]. These divergences may be explained by differences between the psychometric properties of the instruments, and it is thus important to choose anchors with established responsiveness [30]. Both SOBQ and SGRQ have been longitudinally validated for use in IPF [15, 16]. Chronic Respiratory Questionnaire has only been validated for longitudinal use in chronic obstructive pulmonary disease [33], and neither Medical Research Council dyspnoea score nor Transition Dyspnoea Index has, to our knowledge, been validated for longitudinal use in IPF.
The association between low HRQL and increased mortality has been investigated in other studies using for instance SGRQ and SOBQ. In concordance with the IPF-specific SGRQ-I, both baseline SGRQ score and changes in SGRQ and SOBQ scores were found to be prognostic factors in patients with IPF [4, 34]. The small number of deaths and less advanced disease in our cohort might explain why only a trend was observed between K-BILD and mortality.
Currently, there is no consensus on the best method to estimate MCID. Anchor-based and distribution-based methods are used, and both have strengths and limitations [35]. Anchor-based methods use an external measurement as an anchor with a well-determined threshold for improvement or deterioration. The advantage is that the definition of a ‘minimal clinically important’ difference is well described and included in the method. On the other hand, the variability of the measurements is not taken into account. Distribution-based methods incorporate the variability by comparing change in the PROM to a measure of variation, hence obtaining a more standardised result. The disadvantage is that there is no good definition of a ‘clinically important’ change. These measures may also differ between a homogeneous and a heterogeneous group. A combination of the methods to estimate MCID has been proposed [17, 30]; we therefore used the ROC curve approach proposed by de Vet et al., as this method combines anchor- and distribution-based methods [31].
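As a rough illustration of how an ROC-based cut-off can be derived from an anchor, the sketch below picks the change score that minimises the sum of misclassification proportions, (1 − sensitivity) + (1 − specificity), which is equivalent to maximising Youden's index. This is one common operationalisation and is an assumption here; the exact criterion applied in the analysis is not restated in this section. The function name and the data are hypothetical, and change scores are coded so that positive values mean improvement.

```python
from typing import Sequence, Tuple


def roc_mcid(changes: Sequence[float], improved: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) for the best ROC cut-off.

    A patient is classified as improved when change >= cutoff. The best
    cut-off maximises sensitivity + specificity - 1 (Youden's index); on
    ties, the smallest such cut-off is kept.
    """
    if len(changes) != len(improved):
        raise ValueError("changes and improved must have the same length")
    n_pos = sum(1 for flag in improved if flag)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("the anchor must define both improved and not-improved patients")

    best = None
    for cutoff in sorted(set(changes)):
        true_pos = sum(1 for c, flag in zip(changes, improved) if flag and c >= cutoff)
        true_neg = sum(1 for c, flag in zip(changes, improved) if not flag and c < cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        youden = sensitivity + specificity - 1.0
        if best is None or youden > best[0]:
            best = (youden, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical change scores and anchor classification (not study data).
changes = [-2, 0, 1, 2, 3, 4, 5, 6, 8, 9]
improved = [False, False, False, True, False, True, True, True, True, True]

cutoff, sensitivity, specificity = roc_mcid(changes, improved)
print(cutoff, round(sensitivity, 3), specificity)
```

In this toy data set the selected cut-off is 4 points, with one improved patient (change of 2) misclassified. Running the same procedure separately with a "deteriorated" anchor group would give a direction-specific estimate.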
This study is the first to assess MCID for SGRQ-I. MCID for SGRQ was determined in two studies based on patients with IPF from clinical trials [15, 22]. MCID for K-BILD has been determined in two other studies. Sinha et al. estimated a combined MCID in a mixed group of ILDs without specific analyses for IPF, probably due to a small sample size [13]. Nolan et al. determined MCID after an intervention of pulmonary rehabilitation [14]. MCIDs were marginally higher in the mentioned studies compared to our results: SGRQ vs. SGRQ-I Total (4.0–6.6 vs. 3.9–4.9) [15, 22] and K-BILD Total (3.9–5.0 vs. 2.7–4.7) [13, 14]. There may be numerous reasons explaining the differences between the studies. The larger sample size in our study increases the statistical power to determine a more exact estimate of MCID (our study n = 150, Nolan et al. n = 105, Sinha et al. n = 57 (17 IPF)). Differences in the composition of the cohorts with regard to age, gender and disease severity may all affect results. Additionally, the studies differed in time frames, in intervention (pulmonary rehabilitation vs. no intervention) and in the methods and anchors used. The other studies used distribution-based approaches, which may explain the larger estimates [36]. Also, the generalisability of the results of the other studies is limited due to the mixed group of ILDs and selection of patients for pulmonary rehabilitation or clinical trials.
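For orientation, two widely used distribution-based estimates are half a standard deviation of the baseline scores and the standard error of measurement (SEM = SD × √(1 − reliability)). The sketch below is illustrative only: the scores and the reliability value are hypothetical, the function names are our own, and these are not necessarily the estimators used in the cited studies.

```python
import math
import statistics
from typing import Sequence


def half_sd(baseline_scores: Sequence[float]) -> float:
    """Half of the sample standard deviation of the baseline scores."""
    return 0.5 * statistics.stdev(baseline_scores)


def standard_error_of_measurement(baseline_scores: Sequence[float], reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return statistics.stdev(baseline_scores) * math.sqrt(1.0 - reliability)


# Hypothetical baseline scores and reliability coefficient (not study data).
scores = [40, 50, 60, 70, 80]
print(round(half_sd(scores), 2))
print(round(standard_error_of_measurement(scores, reliability=0.91), 2))
```

Both estimates scale directly with the spread of scores in the sample, which is why a heterogeneous cohort tends to yield a larger distribution-based MCID than a homogeneous one.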
Most studies only determined a single MCID for both improvement and deterioration. Even though HRQL generally deteriorated, up to one third of the patients experienced improvement in the anchors, and MCID for this group of patients should thus be analysed separately. As our study shows, MCID differs between improvement and deterioration; the largest difference was observed in SGRQ-I Total and Symptoms along with K-BILD Total and Psychological domains. Evidence of different MCIDs for improvement and deterioration has also been reported in other diseases [15, 36]. Hence, changes in SGRQ-I and K-BILD scores should be interpreted separately depending on the direction of change. Generally, the sensitivity analyses showed comparable results. The largest deviations were observed in MCID for improvement in SGRQ-I among patients with the best HRQL (5.8) and improvement in K-BILD among patients with the worst HRQL (2.0). This is consistent with a clinically important change having to be large in patients with good HRQL and smaller in patients with worse HRQL. Antifibrotic treatment at baseline hardly changed MCID estimates for K-BILD, whereas MCID estimates for SGRQ-I were slightly larger in this subgroup. However, a large proportion of the patients not receiving antifibrotic treatment at baseline (51%) initiated antifibrotic treatment during the 12-month follow-up, and these patients were included in the initial analyses.
Improvement in physiological and HRQL anchors observed in our study has also been reported in other IPF studies. In the INPULSIS trial, 19% of patients improved in FVC and up to 36% improved in HRQL anchors [15]. Comparable results have been reported in other studies [13, 16]. Improvement in HRQL may be due to better coping strategies for living with a severe disease, rehabilitation and oxygen treatment. The confidence intervals of change in SGRQ-I from baseline to 12 months are wider than the confidence intervals of K-BILD. An explanation might be the different response options in the two instruments; K-BILD uses the same Likert scale for all items, whereas SGRQ-I (and SGRQ) has a variety of response options throughout the instrument. This could lead to a larger variation in SGRQ-I scores as the instrument is less intuitive to complete. During the study, more patients needed guidance on how to complete SGRQ-I than K-BILD. As K-BILD is shorter, easier to complete and has comparable validity and reliability, we would recommend using K-BILD instead of SGRQ-I for future studies and in clinical practice.
This study had several strengths. First of all, a cohort of both incident and prevalent patients with IPF was recruited from multiple centres with very limited exclusion criteria, constituting a broad sample of the background IPF population, which enhances the external validity of our results. Secondly, MCID was determined separately for improvement and deterioration. The results revealed different estimates, displaying the importance of performing independent analyses for each direction of change. Furthermore, sensitivity analyses were performed to assess the robustness of the results. A limitation of our study was the recall bias associated with GRCS. It can be difficult for patients to recall their health status 12 months earlier and compare it with their current health status. Still, GRCS can be easily interpreted, can be tailored to reflect specific domains of a PROM and have been reported to be reliable, valid and sensitive to change [23]. In addition, GRCS provide a simple assessment of the patients’ perception of their current HRQL.