Background
In the United States (US), progressive chronic kidney disease (CKD) and particularly end-stage renal disease (ESRD) disproportionately affect traditionally underserved groups including racial-ethnic minorities and persons of low socioeconomic means [1–6]. Despite the disproportionate burden of ESRD observed among racial-ethnic minority and low income groups, effective interventions to slow CKD progression and reduce mortality appear to be underutilized in these populations [7, 8].
A central barrier to applying proven therapies in traditionally underserved settings lies in the absence of mechanisms to efficiently monitor and optimize care provided to the nation’s poor and underinsured [9, 10]. Recently, one large safety-net health system has leveraged prediction analytics and data from the electronic health record (EHR) to accurately identify inpatients with specific conditions who are at high risk for subsequent re-hospitalization [11]. Most patients with moderate or severe CKD suffer from multiple chronic conditions and increasingly receive their care from Chronic Disease Management teams [12, 13]. These clinic-based teams typically target patients with specific conditions (e.g., diabetes mellitus, hypertension, chronic viral diseases, congestive heart failure, severe CKD, etc.) and seek to optimize important risk factors for progressive disease, disability and mortality [13, 14]. However, even within these clinic- or disease-based practices most individuals with CKD will not progress to ESRD. Whether data from the EHR can be “meaningfully” used to identify persons at high risk for progressive CKD within this practice-based construct is unclear.
To address this issue, we examined the performance of EHR-based risk predictive models to discriminate among persons with CKD who would and would not progress to ESRD for time frames up to 7 years. We hypothesized that the discriminatory ability (and usefulness) of these models to accurately predict ESRD would vary substantially within clinical subgroups due primarily to differences in patient composition and in the distribution of ESRD risk.
Discussion
Surveillance of “real-world” care delivery to vulnerable groups is challenging because kidney disease metrics are not routinely measured or reported by federally qualified health centers nor are they part of reporting requirements for health plans [33, 34]. Thus, underserved or vulnerable patients with non-dialysis dependent CKD remain invisible to much of the healthcare system unless and until they reach ESRD (at which time most become eligible for Medicare). In this diverse cohort of persons with CKD, we observed that risk predictive models using common data from the EHR can accurately discriminate between most persons who did and did not progress to ESRD for time frames up to 7 years. However, the performance of these risk models varied when applied to specific conditions frequently targeted for Chronic Disease Management. Model performance was highest in the hypertension subgroup, intermediate in chronic viral disease and diabetes mellitus subgroups, and lowest in the subgroup with severe CKD (eGFR < 30 ml/min/1.73 m²). Our study findings may help health organizations and their clinical practices to optimize care assessment by estimating the scope and potential needs of patients with CKD who are at highest risk for disease progression, disability and ESRD-related costs.
Health researchers frequently assess risk at the individual level using epidemiological studies or by examining patient-level interventions in randomized clinical trials [24–26]. Tangri et al. evaluated the performance of several risk predictive models based on data extracted from the EHR of 2 CKD cohorts in Canada using traditional discriminatory criteria (C-statistic/AUC and integrated discrimination improvement). They observed that most ESRD risk predictive models performed well in patients with moderate or severe CKD who were referred for nephrology evaluation [24]. Complementary studies in large study cohorts have yielded additional risk predictive models for ESRD which are intended for use by clinicians to estimate individual patient- rather than population-level risk [25, 26]. Using criteria (proportion of cases followed and proportion needed to follow) designed to inform population-level disease assessment, we recently observed that a model incorporating age, race, sex, eGFR and dipstick proteinuria adequately predicted progression to ESRD among vulnerable persons with moderate or severe CKD who were identified through systematic review of the EHR [18]. Our current study findings extend and leverage our prior work by placing the proportion of cases followed (PCF) and proportion of the population needed to be followed (PNF) more squarely in the context of how clinical care is actually delivered for this patient population. Collectively, our observations suggest that readily available data from the EHR might be efficiently used in earlier stages of CKD to further inform care assessment and planning for organizations or practices based on clinical conditions commonly targeted by disease management programs. For example, our methods could be applied to a hypertension clinic to estimate the potential feasibility or effectiveness of an intervention or program targeting patients in the highest decile or quintile of ESRD risk (e.g., for additional interventions or pragmatic studies, etc.). The resources needed to follow a targeted group of high-risk patients from such a disease-based program could be markedly lower than those required for risk-stratifying an entire health system. Because CKD represents a heterogeneous array of underlying disease states, this disease- or risk factor-based approach could theoretically leverage existing programmatic resources and infrastructure as an alternative to lumping all high-risk CKD patients into a single category.
In terms of discriminatory assessment, our study illustrates how the PCF and PNF can be more informative than traditional discriminatory criteria such as AUC by providing estimates of risk concentration (for the event of interest) in the study population. If risk is concentrated, such as in the hypertension subgroup, values for traditional discriminatory criteria will typically be large, but the converse is not necessarily true [32]. In other words, as observed in the severe CKD subgroup, values for traditional discriminatory criteria may be favorable even when risk is not concentrated [32, 35]. This occurs because the AUC evaluates the model’s ability to discriminate between progressors and non-progressors for all risk thresholds, even those that are not as relevant in clinical practice, such as risks close to zero. In contrast, when using PCF and PNF, we can set the risk threshold at a clinically meaningful level. Thus, desirable values of PCF and PNF are achieved only if risk in the target population is concentrated among a relatively small proportion of individuals (at highest risk). The suboptimal performance of our risk models based on PCF/PNF values among persons with severe CKD likely reflected the broader distribution of ESRD risk in this subgroup. Notably, values for AUC were universally favorable in all subgroups, and hence less informative from the perspective of clinical practice. Reduced variance of influential predictors such as eGFR and dipstick proteinuria among patients with severe CKD (relative to patients from the other subgroups) likely further reduced the predictive capacity of our risk models in this subgroup. While the urinalysis dipstick remains an excellent population-level screening tool for proteinuria, its limited utility in predicting progression to ESRD among patients in later stages of CKD has been previously documented [36]. Thus, models which incorporate additional predictors such as urinary albumin-to-creatinine ratio (which was measured only in a small fraction of patients in our cohort) and annual decline in eGFR would likely enhance predictive performance in the setting of severe CKD. In addition, patients with severe CKD are at markedly higher risk of premature death than those with higher levels of kidney function [20]. The elevated risk of death in this severe CKD subgroup poses additional challenges for predicting ESRD since many (if not all) of the covariates examined are also significantly associated with death.
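The contrast between AUC and the PCF/PNF criteria can be illustrated with a minimal sketch. It assumes simple empirical versions of the two criteria computed from observed outcomes (PCF(q): share of all cases found among the fraction q of the population at highest predicted risk; PNF(p): smallest fraction of the population, ranked by predicted risk, that must be followed to capture a fraction p of cases), which may differ from the estimators used in our analyses. The function names and the ten-patient toy data are hypothetical and serve only to show the arithmetic.

```python
from math import ceil

def _rank_by_risk(risks):
    # Indices of patients ordered from highest to lowest predicted risk.
    return sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)

def pcf(risks, events, q):
    """Empirical PCF(q): share of cases among the top fraction q by risk."""
    order = _rank_by_risk(risks)
    k = ceil(q * len(risks) - 1e-9)  # tolerance guards against float round-off
    return sum(events[i] for i in order[:k]) / sum(events)

def pnf(risks, events, p):
    """Empirical PNF(p): smallest top fraction capturing a share p of cases."""
    order = _rank_by_risk(risks)
    target = p * sum(events)
    captured = 0
    for k, i in enumerate(order, start=1):
        captured += events[i]
        if captured >= target - 1e-9:
            return k / len(risks)
    return 1.0

def auc(risks, events):
    """AUC as the probability that a case outranks a non-case (ties count 1/2)."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical toy cohort: predicted ESRD risk and observed progression (1 = ESRD).
risks = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
events = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print(round(pcf(risks, events, 0.20), 3))  # 0.667: top 20% holds 2 of 3 cases
print(pnf(risks, events, 0.80))            # 0.4: 40% must be followed for 80% of cases
print(round(auc(risks, events), 3))        # 0.952
```

In this toy cohort the AUC is high (0.952), yet 40% of patients would need to be followed to capture 80% of cases, which mirrors the point that a favorable AUC does not by itself imply concentrated risk.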
Historically, low individual-level provider and patient awareness of CKD has reinforced the need to optimize multi-level strategies (at the community, organization, practice, and patient levels) to help assess and manage CKD [18, 37–41]. Our study findings demonstrate the potential usefulness of clinical data from the EHR to provide reliable information for CKD care assessment at the level of the organization and at the level of a disease- or clinic-based practice, an approach which might readily generalize to other chronic diseases. Accordingly, clinical practices might leverage the EHR, for example, to identify, triage, and monitor the care of patients at highest risk for progressing to ESRD. As evidenced by the high prevalence of psychiatric conditions and of drug and alcohol use in our study cohort, such a multi-level health approach would likely require the consideration of a broader array of health determinants than in conventional health settings. Intervention approaches might thus combine EHR-based risk surveillance with facilitators of care engagement such as assistance with transportation, housing, or health insurance applications, drug or alcohol cessation programs, and mental health co-management. When necessary, our methods might also be applied to refer and track patients at highest risk for imminent (<1 year) ESRD to ensure timely placement of dialysis access, transplant referral, and dialysis education. Recent advancements in software now enable predictive analytics to interdigitate with the EHR, further highlighting the potential “real world” and “real time” application of EHR-derived predictive models [11].
Our study is strengthened by the inclusion of adults with moderate or severe CKD from two large safety-net health systems, populations which traditionally bear a disproportionate burden of ESRD and which may benefit from enhanced CKD surveillance. Our study also has several limitations. First, while we were able to provide detailed demographic and clinical data, and link our cohort to national registries (to obtain complete or nearly complete capture of treated ESRD and vital status), our study was subject to potential bias from under-ascertainment of comorbidities based on diagnostic and procedural codes. Second, misclassification of CKD and its severity using the MDRD GFR estimating equation may also be operative since the MDRD study equation was derived in a population of mostly white and black patients with moderate-to-advanced CKD, very few of whom had diabetes mellitus. Third, while our study included patients from diverse social and demographic groups, this cohort may not be fully representative of patients who utilize the healthcare safety-net in other geographic locations. In addition, our observations require further validation using external data as our predictive models may perform differently in other populations. Lastly, rather than restricting our study to only patients with complete data (and potentially introducing bias from case deletion), we leveraged multiple imputation under the assumption that missing values carried no information about probabilities of missingness. However, our study results may be biased in the unlikely event that this assumption of ‘missing at random’ (MAR) was violated [29].
Abbreviations
AUC, area under the receiver operating characteristics curve; CKD, chronic kidney disease; ESRD, end-stage renal disease; EHR, electronic health record; eGFR, estimated glomerular filtration rate; HBV, hepatitis B virus; HCV, hepatitis C virus; HIV, human immunodeficiency virus; HMC, Harborview Medical Center; MAR, missing at random; MDRD, Modification of Diet in Renal Disease; PCF, proportion of cases followed; PNF, proportion of the population needed to be followed; ROC, receiver operating characteristics; SFHN, San Francisco Health Network; US, United States; USRDS, United States Renal Data System
Acknowledgements
We dedicate the study to the memory of our friend and colleague, Dr. Andy Choi. We are further indebted to Dr. Andy Bindman for administrative and logistical support.