Background
Heart failure (HF) is a clinical syndrome caused by a structural and/or functional cardiac abnormality that is characterized by signs and symptoms such as breathlessness, ankle swelling, and fatigue [1]. HF is a public health concern worldwide, with a prevalence of 1–4% in most European countries [2], and both prevalence and incidence increase progressively with age. In the US, the incidence of HF is reported to be 10 per 1000 after 65 years of age [3]. In Japan, the number of patients with HF is expected to increase rapidly with the aging of the population, and the number of patients with left ventricular dysfunction is estimated to reach 1.3 million by 2030 [4].
Although the prognosis of patients with HF has improved with advances in treatment [5], mortality and hospitalization rates reportedly remain high [3, 6, 7]. A Japanese study reported that the rehospitalization rates for HF within 1 year after discharge were 23.7–25.7% [8]. Moreover, HF significantly affects the physical function and health state of patients [9–11]. Thus, the goal of treatment is to improve the overall well-being of patients as well as survival, and the use of patient-reported outcomes (PROs) has been gaining momentum in cardiovascular research [12, 13]. PRO measures are reported by patients and are useful to capture the realities of disease burden and treatment impacts. Disease-specific PRO measures may be more useful than generic instruments because they quantify the health state related to a particular disease and are therefore more sensitive to clinical changes [12].
The Kansas City Cardiomyopathy Questionnaire (KCCQ) [14], originally developed in the English language in 2000, is one such disease-specific PRO measure for HF, which assesses symptoms, physical and social limitations, and health-related quality of life (HRQoL). Having been translated into various languages and validated in many country-specific settings [15–19], the KCCQ is now one of the most widely used PRO measures for patients with HF, along with the Minnesota Living with Heart Failure Questionnaire (MLHFQ) [20]. Among the many existing HF-specific PRO measures, the KCCQ and MLHFQ were the only two measures that fit all eight evaluation criteria (e.g., psychometric properties, feasibility, interpretability, and symptom coverage) in a previous systematic review [21]. One beneficial characteristic of the KCCQ is that it provides a summary score specifically focused on patient symptoms and physical limitations (physical function), along with the overall summary score. Symptoms and physical function are the most relevant domains for the clinical assessment of patients with HF, and these domains are also the main concepts of interest in the development of new HF treatments as they are proximal to patient experience of the disease [22].
A linguistically validated Japanese version of the KCCQ is available and used often in clinical trials involving Japanese HF patients [23–25]. However, the psychometric properties of the tool have not yet been evaluated. Therefore, in this study, we evaluated the validity and reliability of the Japanese version of the KCCQ in Japanese patients with chronic HF, with a focus on its domains and summary scores related to symptoms and physical function.
Discussion
The use of a valid PRO measure is essential for the adequate assessment of patients’ health states. In this study, to assess the psychometric properties of the Japanese version of the KCCQ, we evaluated the validity and reliability of the tool with a focus on the CSS and its component domains, which are considered most relevant for the clinical assessment of patients’ symptoms and physical functioning. The results of this study demonstrated that the Japanese version of the KCCQ had construct validity, good internal consistency, and high reproducibility and responsiveness when used in Japanese patients with chronic HF.
The known-group analysis showed that the three symptom- or physical function-related domain scores (i.e., physical limitations, symptom frequency, and symptom severity) and the KCCQ summary scores were all associated with NYHA class, indicating that these scores accurately differentiated patients with differing disease severity. However, the social limitation domain score did not show a decreasing trend with the NYHA class. This result may be due to the disproportionate distribution of patients across NYHA classes (i.e., few patients were in the higher NYHA classes III and IV) in this pooled sample. In addition, there was a response option that was coded as missing, which further reduced the number of patients with analyzable data in this domain. Although the known-group validity of this domain remains to be confirmed, a moderate correlation of this domain with the EQ-5D usual activities dimension (ρ = −0.43) partially supports its construct validity. The construct validity of the tool for the assessment of patients’ symptoms and physical functioning was further supported by moderate correlations of the CSS and physical limitations domain with the three EQ-5D dimensions that are related to functional domains. Because the EQ-5D-3L is a generic measure and the KCCQ is an HF-specific measure, their scores do not assess exactly comparable domains, which understandably leads to moderate rather than high correlations.
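The negative sign of the correlation reported above follows from the direction of the two scales: higher KCCQ scores indicate better health, whereas higher EQ-5D-3L levels indicate more severe problems. As a minimal sketch of how such a rank correlation can be computed, the following Python code implements Spearman's ρ with average ranks for ties. The function names and the data are hypothetical and serve only as an illustration; they are not the study's data or analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: KCCQ domain scores (0-100, higher is better) and
# EQ-5D-3L usual activities levels (1 = no problems, 3 = extreme problems).
kccq_scores = [90.0, 75.0, 60.0, 40.0, 85.0, 30.0]
eq5d_levels = [1, 1, 2, 3, 2, 3]
rho = spearman_rho(kccq_scores, eq5d_levels)  # negative for these data
```

With these made-up values, patients reporting more problems on the EQ-5D-3L tend to have lower KCCQ scores, so ρ is negative.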
For reliability, all KCCQ domain/summary scores showed good internal consistency, as demonstrated by a high Cronbach’s α (> 0.7), which indicates that the items constituting the domain or summary scale can be considered to measure the same construct. In particular, the CSS had excellent internal consistency with an α of 0.90, which was almost equivalent to that of its original KCCQ counterpart (α = 0.93 [14]). In the test-retest analysis of clinically stable patients, the minimal changes in scores between the two assessments and ICCs of 0.69–0.78 demonstrated the moderate to high reproducibility of the three component domains of the CSS. The CSS and the other two summary scores also had high ICCs of 0.77–0.84, showing good reproducibility of these scales. The mean changes in scores between the two assessments were minimal (0.4–4.2 points on a 100-point scale) for the other domains as well; however, the ICC of the symptom stability domain was exceptionally low (ICC = 0.19). This was probably because this is a single-item domain, and thus even a one-point change on a 5-point scale in a patient’s response was converted into a substantial score change on the 0-to-100 scale of the domain score.
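Two of the quantities discussed above can be illustrated with a short sketch. The Python code below computes Cronbach's α from its standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), and shows how a one-point shift on a single 5-point item translates to the 0-to-100 scale. The linear rescaling rule is an assumption made for illustration and does not reproduce the official KCCQ scoring instructions; the function names and response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score per respondent, summing across items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def rescale_to_100(response, lowest=1, highest=5):
    """Linearly map a Likert response onto a 0-100 scale (assumed rule)."""
    return (response - lowest) / (highest - lowest) * 100


# Hypothetical responses: three items answered by five patients.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)

# Under the assumed linear rule, a one-point shift on a single 5-point item
# moves the 0-100 score by 25 points.
one_point_shift = rescale_to_100(4) - rescale_to_100(3)
```

Under this assumed rule, a single-item domain can change by a quarter of its range from one response step, which is consistent with the explanation given above for the low ICC of the symptom stability domain.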
One advantage of the KCCQ over the MLHFQ is that the KCCQ is more sensitive to clinical change [14]. Although a comparison with existing tools could not be performed in this study due to secondary use of trial data, our analyses showed that the Japanese version of the KCCQ was highly responsive to patients’ clinical change. All domain scores significantly increased by 17.2–26.9 points after 1 month of treatment in patients with improved health states, except for the symptom stability and self-efficacy domains. In particular, the symptom frequency domain and all summary scores, including the CSS, showed especially high responsiveness with a large effect size (> 0.80). However, the responsiveness of the symptom stability and self-efficacy domains, neither of which is incorporated into any of the KCCQ summary scores, could not be confirmed in this analysis. As they are conceptually different from the other domains, their responsiveness may need to be evaluated using a more appropriate method.
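As an illustration of how an effect size for responsiveness can be obtained, the sketch below uses one common definition, the standardized effect size (mean score change divided by the standard deviation of baseline scores). Whether this is the exact formula used in the present analysis is not stated in this section, so the definition is an assumption; the function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores (assumed definition)."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical scores on a 0-100 scale for three patients.
baseline = [40.0, 50.0, 60.0]
follow_up = [52.0, 58.0, 70.0]
effect_size = standardized_effect_size(baseline, follow_up)
```

For these made-up values, the mean change is 10 points and the baseline SD is 10 points, giving an effect size of 1.0, which would fall in the range conventionally described as large (> 0.80).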
In the development study of the original KCCQ, the baseline CSS was significantly lower in patients who subsequently died or required rehospitalization than in event-free survivors (35.1 vs. 55.3, p < 0.001), suggesting the prognostic value of the tool [14]. Unfortunately, we were unable to assess the prognostic value of the Japanese version of the KCCQ in this study owing to certain methodological limitations. These included a small number of patients and few prognostic events, which may be due to a short observation period and a disproportionately large proportion of patients with less severe symptoms (85.1% were classed as NYHA class I–II at baseline), as well as confounding by treatment effects (e.g., patients received different treatments according to their treatment group). The prognostic value of the Japanese version of the KCCQ is worthy of further investigation.
PRO measures have been historically underused as metrics in clinical studies [12]. However, in light of the increased focus on improving the overall well-being of patients, their use as endpoints in cardiovascular studies is encouraged [13], and selected KCCQ domains are increasingly being used as such in heart failure trials. The KCCQ not only assesses all three principal components of patients’ health states, i.e., symptoms, functional status, and HRQoL, but can also be an independent predictor of poor prognosis [38] and future healthcare costs [39]. In addition, because the KCCQ is available in many languages, its use as a metric in clinical studies would enable international comparison of the health states of patients with HF. Furthermore, the KCCQ may also help to enhance patient care by directly informing clinicians of the patients’ disease burden and treatment impacts when used in clinical settings. Continued exploration of the usefulness of the KCCQ in clinical practice is warranted in future studies.
This study has several limitations. First, because this study involved the secondary use of three trials’ data and analyzed a pooled sample, the generalizability of the results may be limited by the inclusion/exclusion criteria of the source trials. For example, as the majority of patients (76.6%) were classed as NYHA class II at baseline, our results may not be applicable to patients with more severe symptoms. Second, construct validity was assessed using only NYHA class and the EQ-5D-3L because of the limited measures available in the secondary use of trial data. For the symptoms and physical function-related scales of the KCCQ, assessment of correlations with measures with more similar constructs (e.g., MLHFQ) and measures that assess related functional domains (e.g., 6-min walk test) would have been useful. Moreover, the construct validity of other domains, especially the self-efficacy and social limitations domains, requires further evaluation using more closely related, appropriate reference measures for each domain. Third, the reliability and validity of the single-item symptom stability domain could not be confirmed in the present analysis. This item is inherently different from the other KCCQ items because it asks the patient to rate the degree of change in their symptoms over the past 2 weeks. Therefore, it was not expected to perform similarly to the other items and domain scores, which do not require a comparison of current and previous experiences. Further assessment of this domain is warranted. However, the present analysis confirmed the reliability and validity of the CSS, the most relevant KCCQ summary score for clinical assessment. Thus, we believe that our results provide valuable information for users of the Japanese version of the KCCQ. Lastly, although we defined patients’ symptom stability and changes in clinical status using the EQ-5D-3L, which has been reported to be responsive to clinical changes in patients with HF [36], they may not have been adequately captured by the EQ-5D-3L. The EQ-5D-3L may be responsive to only relatively large changes, thereby limiting the analysis sample for the assessment of responsiveness, which may have contributed somewhat to the better responsiveness of the KCCQ. Likewise, although we observed robust stability estimates in the test-retest analysis, the analysis may have included some patients who had clinical changes.