Introduction
Over a quarter of US adults (27.2%) have been diagnosed with multiple chronic conditions (MCC) that adversely impact health status, functioning, or health-related quality of life (QOL), with greater prevalence among women, non-Hispanic white adults, those over age 65, and those living in rural areas [
1]. One-third have three or more. Because MCC are predictive of mortality, disability, response to treatment, health care utilization, expenditures, and declines in QOL (particularly physical QOL) [
2], case mix adjustments for differences in MCC have become essential in health outcomes and effectiveness research [3‐9]. As noted with the first clinical definition of comorbidity [10] and other clinimetric principles used in developing the first comorbidity indices [11], MCC data can also enhance the staging of individual patient complexity, aid in treatment planning, and improve application of quality of care guidelines [
12‐
14]. Therefore, more accurate estimates of MCC impact have potential for improving adjustments for case mix differences in comparative effectiveness research and provider outcome comparisons [
12‐
17], and play a critical role in improving patient care by identifying individuals most likely to benefit from specific treatments/services [
18,
19].
QOL is a patient-reported outcome (PRO) of particular importance in the movement toward patient-centered care and shared decision-making. While the relevance of QOL status and outcomes is straightforward, their clinical utility is predicated on the use of reliable and validated measurement that is sensitive to change [
20] and satisfies other clinimetric principles [
11,
21,
22]. QOL assessments fall into two categories—generic or disease-specific measures. One major difference between these measurement categories is whether QOL is attributed to a specific disease or diagnosis [
23,
24]. Generic QOL measures have the advantage of enabling comparisons of disease burden across MCC, while disease-specific QOL measures provide greater responsiveness to a specific condition [
23]. As hypothesized decades ago [
23], disease-specific QOL measures that summarize the impact attributed to one disease have been shown to be a substantial improvement in clinical usefulness compared to generic measures of the same QOL domains [
25‐
27].
At the heart of all disease-specific QOL impact attributions is the assumption that patients can validly parse the impact of any one of their conditions in the presence of MCC, a common situation in the interpretation of outcomes. This assumption has been supported by comparisons of the validity of specific and generic measurement methods within a given clinical condition [
28,
29], including results from a large US study of a chronically ill population showing significant convergent and discriminant validity across 90% of 924 tests of MCC within nine pre-identified disease conditions. Specifically, tests comparing correlations among different methods (e.g., clinical markers and disease-specific QOL ratings) measuring the same condition (i.e., convergent validity) were substantial in magnitude and significantly higher than correlations between comorbid diseases measured using the same method (i.e., discriminant validity) [
27]. Notably, previous evaluations of validity for a specific condition have almost always been limited to convergent evidence, i.e., that different methods measuring the same condition reach substantial agreement.
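The logic of the convergent versus discriminant comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited study: the function names and the six data points are hypothetical, and the significance test of the difference between correlations is omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def validity_check(marker_a, rating_a, rating_b):
    """Compare one convergent and one discriminant correlation.

    convergent:   different methods, same condition (marker A vs rating A)
    discriminant: same method, different conditions (rating A vs rating B)
    The check passes when the convergent correlation is the larger one.
    """
    convergent = pearson(marker_a, rating_a)
    discriminant = pearson(rating_a, rating_b)
    return convergent, discriminant, convergent > discriminant

# Made-up values for six hypothetical patients (not study data).
marker_a = [1, 2, 3, 4, 5, 6]   # clinical marker for condition A
rating_a = [1, 3, 2, 5, 4, 6]   # QOL impact attributed to condition A
rating_b = [3, 1, 4, 2, 6, 5]   # QOL impact attributed to comorbid condition B

convergent, discriminant, passed = validity_check(marker_a, rating_a, rating_b)
print(round(convergent, 3), round(discriminant, 3), passed)
```

A full evaluation would repeat this comparison for every pair of conditions and criteria and would test whether each difference is statistically significant.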
Thus far, the assessment of MCC impact has been hindered by a proliferation of diverse measures and a dearth of studies addressing the practical implications of differences across MCC assessment methods. For example, legacy methods for aggregating MCC have been based on simple condition count, which ignores differences between conditions and assumes that the impact of each condition is the same for all who have it [
2,
9,
17,
30‐
33]. Methods addressing those differences have weighted conditions on a population level using criteria such as mortality [
2,
34] or health care utilization [
24]. Among the first MCC measures to recognize the importance of both the number of conditions and the differences in their impact, the Charlson Comorbidity Index (CCI) [
34] has been the most frequently and extensively studied [
2]. The Elixhauser alternative expands the list of conditions [
35]. While shown to be useful for some case mix adjustment purposes, these indexes have been criticized for their reliance on
mortality weighting, omission of prevalent and morbid conditions, and assignment of the same population weight to everyone with a given condition. These omissions matter given the emphasis on generic QOL outcomes monitoring, because prevalent and morbid conditions omitted from the CCI (e.g., osteoarthritis, back problems, depression) are known to substantially affect the generic physical, psychological, and social QOL domains [
2].
In response, QOL-based patient reported outcome (PRO) monitoring research has spawned several advances in MCC assessment methods, including approaches using (1) models that standardize disease-specific QOL impact scoring across different conditions, (2) a summary disease-specific impact score aggregating across QOL domains (i.e., simplified 1-factor scoring), (3) IRT-based calibrations of single and multi-item measure scoring across diseases [
24,
25,
36,
37], and (4) items proven to discriminate QOL impact for a given condition in the presence of other comorbid conditions. Scoring can distinguish between
comorbidity, or impact in the context of an “index” condition in tertiary and secondary care settings, and
multimorbidity, or total QOL impact in primary care and other generalist settings [38, 39].
Until recently, a lack of standardization of QOL content and scoring across
disease-specific QOL measures has impeded meaningful comparisons across conditions and aggregation of total MCC QOL impact. Prior work has addressed this issue with disease-specific QOL assessments reflecting the richness of widely-used generic QOL surveys and standardized across diseases with items differing only in terms of disease-specific attributions for QOL impact [
25,
40].
The practical implication of a single-factor disease-specific measurement model that leaves very little shared disease-specific QOL item variance unexplained [36, 37] is that it enables a summary score for each condition with minimal loss of information and reduced respondent burden in comparison with multiple scores for each condition [
24,
37]. Other practical implications include greater measurement efficiency, standardized comparisons of disease-specific QOL burden, improved aggregation of QOL impact across MCC, and adaptive
disease-specific QOL assessments [
24,
25,
27,
41]. Results showing high
single item correlations with
individualized disease-specific QOL item bank total scores enabled further reduction in respondent burden for estimating MCC impact for each individual [
24,
26,
27,
36,
37]. Finally, measures such as the QDIS-MCC used in the current study reflect an advance, with rare exceptions, in both convergent and discriminant validity for priority MCC [
27,
28,
42].
This paper examines whether an expanded condition checklist, population QOL-weighting (in contrast to mortality weighting), and use of each patient’s own QOL impact rating (individualized ratings in contrast to population weights for their MCC) improve predictions of generic QOL outcomes. To date, these methods have only been tested on small [
43‐
45] or age-restricted samples [
46]. This is the first study to systematically compare legacy and improved methods for aggregating the impact of MCC with the goal of better understanding their effects on the accuracy of predictions of generic physical and mental QOL status and outcomes.
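The three aggregation approaches compared in this paper differ in what each reported condition contributes to the total. The following Python sketch illustrates that difference only. The condition names, population weights, 1-to-5 rating scale, and simple summation are assumptions made for illustration and do not reproduce the scoring of the CCI or the QDIS-MCC.

```python
# Illustrative sketch only. Condition names, weights, and ratings are
# made-up placeholders, not values reported in the study.

def simple_count(reported):
    """Legacy method: every reported condition counts the same."""
    return len(reported)

def population_weighted(reported, weights):
    """Each condition adds one fixed weight for everyone who reports it."""
    return sum(weights[c] for c in reported)

def individualized(ratings):
    """Each condition adds the patient's own impact rating."""
    return sum(ratings.values())

# Hypothetical population weights (larger = more QOL impact on average).
weights = {"osteoarthritis": 3.0, "diabetes": 2.0, "asthma": 1.0}

# Two hypothetical patients reporting the same conditions, with
# self-rated impact on an assumed 1 (none) to 5 (extreme) scale.
patient_a = {"osteoarthritis": 1, "diabetes": 1}
patient_b = {"osteoarthritis": 5, "diabetes": 4}

for name, ratings in (("A", patient_a), ("B", patient_b)):
    print(name,
          simple_count(ratings),
          population_weighted(ratings, weights),
          individualized(ratings))
```

In this toy example the count and the population-weighted score are identical for both patients, and only the individualized score separates the patient with mild impact from the patient with severe impact.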
Discussion
Systematic comparisons among unique features of different MCC aggregation methods, while holding other features constant, linked improved accuracy to: (a) use of an expanded list of chronic conditions, (b) population weighting of reported conditions in terms of their
QOL impact, as opposed to
mortality weighting, and (c) use of
individualized estimates of QOL impact for each condition rather than its population weight that ignores individual differences within each condition. An expanded chronic condition checklist paired with
individualized disease-specific QOL impact measures standardized across multiple chronic conditions (QDIS-MCC) enabled the largest improvements, raising the accuracy of predictions of physical and mental QOL outcomes into the moderate-to-strong model range [
9].
Our findings have implications both for group comparisons in outcomes research and for improving the quality of care for individual patients. These include: (a) more confidently attributing QOL differences observed between self-selected groups to the effects of group membership as opposed to case-mix differences [
3‐
8,
12‐
16], (b) adding better QOL estimates of individual experiences of clinical care and treatment outcomes, consistent with the principles of clinimetrics [
11] to improve tailored treatment decision making [
16,
18], (c) determining whether individuals are currently functioning and feeling better/worse than expected for their age, comorbid conditions, and other characteristics; and (d) identifying those more or less likely to experience clinically significant improvement as a result of treatment [
15,
16,
68‐
70].
Figure
1 for adults with pre-identified OA illustrates how individualized disease-specific impact ratings work to improve QOL prediction accuracy and reveals systematic patterns of substantial over- or under-estimation errors from using the population main effect adjustments based on the legacy comorbidity methods studied. Applying the same population adjustment for all with a given condition tilts this teeter-totter pattern of errors up or down depending on adjustment magnitude, whereas individualized adjustments reduced both over- and under-estimation errors. Because only errors > 0.5 SD units were counted, the large percentage of errors observed is likely to be important: such errors exceed the clinically, economically, and socially important effect sizes in the range of 0.2–0.3 SD units recommended by developers of the generic QOL outcome measures studied [
17,
53,
54,
67]. Although adjusted R² estimates were consistently lower for mental (MCS) compared to physical (PCS) predictions, improved incremental validity (or predictive validity beyond that provided by legacy methods) was consistently observed for individualized estimates in predictions of both PCS and MCS.
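The counting of over- and under-estimation errors beyond 0.5 SD units can be sketched as follows. This is a hedged illustration rather than the study's analysis code: the function name, the example scores, and the SD of 10 points (assumed here for norm-based PCS and MCS scoring) are all assumptions.

```python
def classify_errors(observed, predicted, sd, threshold=0.5):
    """Count predictions that miss the observed score by more than
    `threshold` SD units, separately for over- and under-estimation."""
    counts = {"over": 0, "under": 0, "within": 0}
    for obs, pred in zip(observed, predicted):
        error_sd = (pred - obs) / sd   # signed error in SD units
        if error_sd > threshold:
            counts["over"] += 1        # predicted QOL too high
        elif error_sd < -threshold:
            counts["under"] += 1       # predicted QOL too low
        else:
            counts["within"] += 1
    return counts

# Made-up scores for four hypothetical patients (not study data).
observed = [30, 45, 50, 55]
predicted = [40, 44, 50, 48]
result = classify_errors(observed, predicted, sd=10)
print(result)
```

Running the same function on predictions from a population-weighted model and from an individualized model would show whether individualized adjustments shrink both error counts, which is the pattern the figure describes.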
At the core of the improved MCC impact estimation is a psychometrically-sound summary disease-specific QOL impact score. Despite the breadth of QOL domains represented in the standardized QDIS item bank for each specific condition, those items are sufficiently homogeneous to justify a 1-factor model summary score [
24,
25]. Further, the single global QDIS item representing each disease-specific bank correlates highly enough (r > 0.90) with the total item bank to justify its use in the shortest possible 2–3 min QDIS-MCC survey, combining a standardized checklist and QOL impact item attributing impact specifically to each reported condition [
24]. Although the evolving applications of psychometric theory and methods [
71] in parallel with clinimetric principles have not been without debate [
72,
73], it is important to note that differences in their emphasis are complementary and that they share some commonalities [
21,
74,
75]. For example, the current study uses incremental validity methods promoted by clinimetricians [
11] and by psychometricians [
56], both of which advocate for validity testing using clinical criteria.
In support of generalizing results, significant unique MCC effects in predicting PCS and MCS, as well as patterns of larger effect sizes in the current study are concordant with results from the US Medical Outcomes Study (MOS) [
54], US general population surveys [
53,
76,
77], as well as studies in eight other countries [
55]. For example, across common conditions, negative effects on PCS were largest for arthritis, heart, and lung conditions in the US, seven European countries, and Japan. Given the consistency across samples, countries, and languages, it has been suggested that these estimates can be generalized as a basis for defining important effect sizes [
55]. Accordingly, for standardized record-based and self-reported chronic condition checklists, the population weights documented in Table
4 and elsewhere [
48] are recommended for use in achieving the advantages of QOL population-weighted MCC impact scoring over simple counts or population mortality weighting without additional primary data collection.
The relatively large unique effects of OA, back problems, chronic fatigue, and fibromyalgia on PCS and depression on MCS, conditions not included in the CCI, may at least partly explain its relatively poorer performance. The pattern of higher adjusted R² estimates for predictions of PCS compared to MCS based on MCC is also consistent with prior research [31, 33]. Increased variances explained by expanded condition checklists in the current study (adjusted R² 0.39 and 0.34, respectively) are also consistent with prior US research (adjusted R² 0.45 and 0.31) [53]. Unfortunately, prior studies in Europe and Japan did not report model R² estimates [55].
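For readers comparing models across studies, the standard adjustment of R² for sample size and number of predictors can be written as a short Python function. Expressing incremental validity as the gain in adjusted R² over a baseline model is an assumption made here for illustration, and the sample size, predictor counts, and R² values below are made up rather than taken from the study.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a linear model with n observations and
    p predictors (excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def incremental_gain(r2_base, p_base, r2_full, p_full, n):
    """Gain in adjusted R-squared of a fuller model over a baseline model
    fitted to the same n observations."""
    return adjusted_r2(r2_full, n, p_full) - adjusted_r2(r2_base, n, p_base)

# Hypothetical numbers: baseline model with 5 predictors, fuller model with 10.
gain = incremental_gain(r2_base=0.30, p_base=5, r2_full=0.40, p_full=10, n=101)
print(round(gain, 4))
```

Because the adjustment penalizes added predictors, an expanded checklist improves adjusted R² only when the extra conditions explain more variance than would be expected by chance.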
There are noteworthy strengths and limitations of the current study. Data came from large, nationally representative US population samples supplemented with pre-identified chronically ill adults, which enabled both interpretations of QOL in relation to more representative national norms and the greater precision from larger supplemental samples required for within-disease MCC comparisons. The potential shortcoming of regressions overfitting data was addressed by cross-validation of model comparisons in an independent sample and data from a later time point. It is a strength of the current study that (a) cross-sectional baseline developmental sample population weights for chronic conditions used in standardizing aggregate QDIS-MCC scores were cross-validated in an independent sample and (b) longitudinal (nine-month) outcome models replicated the overall pattern of cross-sectional predictions. Reliance on the 8-item MOS survey (SF-8) [
24] for estimates of chronic condition effects on PCS and MCS (effect sizes and adjusted R² estimates) is a potential limitation, although the literature suggests otherwise [
53,
78]. To address this concern, model comparisons were replicated for a random subsample who completed the full-length 36-item MOS Health Survey (SF-36) in parallel with SF-8. Overall patterns of PCS and MCS results were comparable with results from other studies using the SF-36, SF-12, and SF-8 surveys [
47,
53,
78].
Two other potential limitations are that all data were self-reported and collected electronically with no clinical verification of diagnoses or condition severity, and it was assumed that participants can validly rate the QOL impact of one specific condition in the presence of MCC. Addressing the first, prior research has identified discrepancies between patient self-report and administrative data, with higher rates of disagreement for cancer or mental health diagnoses compared to diabetes [
79]. However, other studies suggest that patient self-reported conditions perform as well as comorbidity data obtained from medical records in predicting QOL [
80,
81]. To the extent reliance on self-report was a limitation, it is likely to have similarly affected all MCC aggregation methods tested. Second, QDIS global item and multi-item attributions to a specific condition have been shown to be sufficiently valid [
27] in the presence of MCC. Specifically, for pairs of pre-identified and other comorbid conditions studied here (asthma, diabetes, OA), correlational tests supported convergent (same condition-different methods and criteria) and discriminant validity (different conditions, same method) in more than 90% of tests [
27]. Some noted exceptions involving MCC characterized by the same symptom (e.g., shortness of breath) warrant further study. In such cases, the individualized multimorbidity QOL impact aggregation provides a better case-mix adjustment for predicting generic QOL outcomes than simple-count, population QOL-weighted, and mortality-weighted methods. While clinical data and judgment are still required to discern among confounded causes of patient experiences, attribution of ambiguous symptoms to a specific condition is not as informative as the extent of QOL impairment.
Finally, data limitations noted above limited method comparisons to 12 conditions common to the CCI and 35-item checklist. Further, some CCI conditions required data that were only available for the developmental sample with pre-identified conditions. While the CCI was scored conservatively, consistent with previous studies [
59], excluded conditions may have contributed to relatively poorer CCI performance. However, it should be noted that the CCI is less comprehensive, omitting more than a dozen conditions (e.g., OA) shown to significantly diminish QOL [
53,
76,
77]. The Elixhauser index [
35,
82] is a more comprehensive alternative to the CCI, although it requires specific ICD coding (beyond 3 digits) for accuracy, still excludes conditions known to adversely impact QOL (e.g., fibromyalgia, migraines), and has limited potential to discriminate severity of impact within each condition. All comparisons of aggregation methods in terms of simple counts versus population QOL and mortality weighting, and individualized weighting in this study were standardized using the 12 common CCI conditions. The optimal number and selection of specific checklist conditions used to standardize adjustments for chronic condition case-mix differences in QOL outcomes monitoring warrant further attention.
Practical considerations present other potential limitations, particularly for individualized QDIS-MCC impact assessments that require primary data collection, which increases costs and respondent burden. Whereas use of more practical generic QOL measures is increasing in EHRs, short-form solutions have only recently been available for disease-specific measures due to the length of legacy tools and the lack of disease-specific QOL impact comparability across conditions [
23]. Single-item QDIS measures with specific attributions to each condition reported in the current study are substantially more practical. They yield directly comparable scores that correlate highly with their full QDIS item bank for the same condition, and are valid in relation to full-length legacy measures of the same disease, despite their coarseness and lower reliability [
24,
27]. Supplementing the global QDIS item used for each disease in the current study with multi-item paper–pencil or internet-based CAT administrations of items making attributions to the same disease has been shown to increase precision for clinical research and practice [24, 26]. This adaptive logic is the next step when more reliable individualized estimates (e.g., likelihood of treatment relief) are needed [
16]. Feasibility, respondent burden reduction, and clinical utility of such adaptive logic were supported in a national registry pilot study before and after joint replacement, where responsiveness and high correlations between QDIS-OA, QDIS-MCC, and generic PCS outcomes were statistically significant despite a very small sample [
83]. Other findings suggest there are points beyond which additional measurement precision may not be worth the burden and cost [
26,
41]. Further condition-by-condition research is recommended to optimize adaptive logic for patient selection to maximize measurement efficiency. Another issue warranting further study is whether MCC aggregation methods shown to be more predictive of generic QOL are also more predictive of other outcomes (e.g., hospitalization, job loss, and costs of care). Given that simple condition counts have been shown to predict such other outcomes [
8,
29], it is reasonable to hypothesize that individualized estimates will do even better.
To summarize, individualized single-item measures of QOL impact with standardized content and scoring across MCC, that differ only in attribution to a specific condition, provided a more practical method of aggregating MCC QOL impact. This new comorbidity index (QDIS-MCC) was more useful than legacy MCC aggregation methods for purposes of adjusting for case-mix differences in predicting generic physical and mental QOL outcomes. The QDIS-MCC short form combines a standardized chronic condition checklist with a single global QDIS impact item for each reported condition and required less than three minutes for most respondents to complete (median one minute for checklist, median two minutes for comorbid QDIS item administrations). This approach illustrates the potential for improving the staging of individual patients, deciding whether more reliable (e.g., additional) measurement is likely to be worthwhile, and providing a better adjustment for individual and group case-mix differences in MCC burden for purposes of more accurately predicting generic physical and mental QOL outcomes. For comparative effectiveness research, such advances can strengthen case mix adjustments essential to attributing differences in health outcomes across self-selected groups [
57]. In clinical practice, individualized disease-specific MCC QOL impact stratifications can provide actionable information about the severity of MCC that accounts for likely differences in patients’ generic health status and outcomes. To assure the availability of QDIS-MCC forms for further academic research by scholars and individuals, the non-profit MAPI Research Trust (MRT) is managing and distributing licenses for use worldwide (
https://mapi-trust.org/) for a minimal handling fee. MRT is also handling commercial licenses to companies, healthcare delivery organizations, and others for commercial applications.