Linking hospital discharge and death records—accuracy and sources of bias

https://doi.org/10.1016/S0895-4356(03)00250-6

Abstract

Background and Objective

The aim of this study was to develop and apply an automated linkage algorithm to 10 years of California hospitalization discharge abstracts and death records (1990 to 1999), evaluate linkage accuracy, and identify sources of bias.

Methods

Among the 1,858,458 acute hospital discharge records with unique social security numbers (SSNs) from 1 representative year of discharge data (1997) with at least 2 years of follow-up, 66,410 of 69,757 deaths occurring in the hospital (95%) and 66,998 of 1,788,701 individuals discharged alive (3.7%) linked to death records. Linkage sensitivity and specificity were estimated as 0.9524 and 0.9998, and positive and negative predictive values as 0.994 and 0.998 (corresponding to 400 incorrect death linkages among out-of-hospital death record linkages and 3,300 unidentified record pairs among unlinked live discharges).
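As a rough check on how these figures fit together, the linkage rates and predictive values can be recomputed from the counts quoted above. This is only an illustrative sketch: it assumes that the 400 incorrect linkages and 3,300 unidentified record pairs can be treated as exact counts among linked and unlinked live discharges, respectively, and the variable names are ours. The reported sensitivity and specificity are not recomputed, because the abstract does not state how they were derived.

```python
# Counts as reported in the abstract (1997 discharges with unique SSNs).
in_hospital_deaths = 69_757
in_hospital_linked = 66_410
live_discharges = 1_788_701
live_linked = 66_998

# Reported error estimates; treating them as exact counts is an assumption.
incorrect_links = 400   # linked live discharges that are not true deaths
missed_links = 3_300    # unlinked live discharges that are true deaths

# Share of each group that linked to a death record.
in_hospital_link_rate = in_hospital_linked / in_hospital_deaths
live_link_rate = live_linked / live_discharges

# Predictive values among records of individuals discharged alive.
unlinked_live = live_discharges - live_linked
ppv = (live_linked - incorrect_links) / live_linked
npv = (unlinked_live - missed_links) / unlinked_live

print(f"in-hospital deaths linked: {in_hospital_link_rate:.1%}")
print(f"live discharges linked:    {live_link_rate:.1%}")
print(f"positive predictive value: {ppv:.3f}")
print(f"negative predictive value: {npv:.3f}")
```

Under these assumptions the recomputed predictive values round to the reported 0.994 and 0.998.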

Results

Based upon gold standard linkage rates, discharge records without SSNs for patients aged 1 year and older may account for 2,520 additional uncounted posthospitalization deaths at 1 year after admission. Gold standard comparison for those with SSNs showed that women, the elderly, and Hispanics and non-Hispanic Blacks had more unlinked hospital death records, although absolute differences were small. The concentration of unidentified linkages among discharge records of traditionally vulnerable populations may result in understating mortality rates and other estimates (i.e., events with a competing hazard of death) for these populations if SSN is differentially related to a patient's disease severity and comorbidities.

Conclusion

Because identification of cases of out-of-hospital deaths has improved over the past decade, observed improvements in patient survival over this time are likely to be conservative.

Introduction

The ability to assess the quality of health care delivery and access to care has been immeasurably improved through the increased availability of large-scale health care data sets. Linking existing data resources has resulted in even more complex data sets providing new contextual and outcome measures allowing for richer and more revealing analyses of health care. Advances in creating these data resources have been limited because of privacy considerations, size, and cost.

Mortality after hospitalization is an accepted measure of quality of health care and has been used in quality of care report cards for hospitals in California, Pennsylvania, and New York [1], [2], [3], [4]. Overall mortality rather than within-hospital mortality is recognized as a more reliable measure given the potential for dumping or shifting sicker patients, a possibility accentuated by secular trends of shortening lengths of stay. Rates of death during hospitalization do predict total mortality rates after admission, but the strength of association is condition-specific [5], [6]. Other measures, such as readmission, must account for censoring due to out-of-hospital mortality [7], [8].

Linkage of hospitalization to death records is already performed, usually on an as-needed basis for research purposes, but generally available, reliably linked data sets for non-Medicare hospitalized populations are less common. A significant weakness of many data linkage projects and subsequent uses of such data is the lack of knowledge regarding the magnitude of potential bias due to missing data. Such assessments are necessary if we are to understand the limitations of these data for measuring quality and making subsequent inferences and decisions.

In this article, we describe the creation and evaluation of a new research database linking the California hospital discharge data set to the state vital statistics registry for the years 1990 to 1999. In the first portion of this article, we describe the data used, review the important issues regarding data linkage, and describe the actual implementation used for our approach. In the latter half of this article, we estimate the magnitude of two separate sources of potential bias: (1) the accuracy of linkages between records with personal identifiers, and (2) the contribution of differential rates of missing personal identifiers to undercounting posthospitalization deaths.

Source databases for the linked records

The annual California Office of Statewide Health Planning and Development (OSHPD) Patient Discharge Database (PDD) consists of records for all discharges from all non-Federal hospitals located in California. For any given year, approximately 3.5 million discharges are reported within this database. Of these discharges, approximately 15% of individuals have more than one discharge in a given calendar year, and 5% of admissions are classified as “nonacute” level of care. Thus, PDD reports the

Data linkage

Results by year were comparable. For each set of linked PDD and DSMF records from the same year, approximately 125,000 records linked using the first set of blocking variables, 3,200 with the second set of blocking variables, 800 with the third set of blocking variables, and 2,000 with the final set of blocking variables. For those dying within the same calendar year, about half of all death linkages occurred within the hospital. Linked PDD and DSMF records from successive years (e.g., PDD 1997
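The pass-by-pass counts reported in this section are consistent with a sequential design in which each set of blocking variables is applied only to records left unlinked by earlier sets. The sketch below illustrates that general idea and is not the study's implementation: the field names (`ssn`, `birth_date`, `sex`, `zip_code`) are hypothetical placeholders, only two passes are shown, and exact agreement on the blocking key stands in for the probabilistic scoring of candidate pairs within a block.

```python
from collections import defaultdict

# Hypothetical blocking passes, strictest first. The field names are
# placeholders; the source does not list its blocking variables here.
PASSES = [
    ("ssn",),
    ("birth_date", "sex", "zip_code"),
]

def block_key(record, fields):
    """Return the record's blocking key, or None if any field is missing."""
    key = tuple(record.get(f) for f in fields)
    return None if any(v is None for v in key) else key

def link_in_passes(discharges, deaths, passes=PASSES):
    """Link discharges to deaths pass by pass; linked records leave the pool."""
    links = {}            # discharge id -> (death id, pass number)
    used_deaths = set()
    for pass_no, fields in enumerate(passes, start=1):
        # Index the still-unlinked death records by blocking key.
        index = defaultdict(list)
        for death in deaths:
            key = block_key(death, fields)
            if key is not None and death["id"] not in used_deaths:
                index[key].append(death["id"])
        for rec in discharges:
            key = block_key(rec, fields)
            if key is None or rec["id"] in links:
                continue
            candidates = [d for d in index.get(key, []) if d not in used_deaths]
            if len(candidates) == 1:  # accept only unambiguous blocks
                links[rec["id"]] = (candidates[0], pass_no)
                used_deaths.add(candidates[0])
    return links

# Toy records, invented for illustration only.
discharges = [
    {"id": 1, "ssn": "111", "birth_date": "1925-05-05", "sex": "M", "zip_code": "94110"},
    {"id": 2, "ssn": None, "birth_date": "1930-01-01", "sex": "F", "zip_code": "90001"},
    {"id": 3, "ssn": "999", "birth_date": "1950-07-07", "sex": "F", "zip_code": "95814"},
]
deaths = [
    {"id": "A", "ssn": "111", "birth_date": "1925-05-05", "sex": "M", "zip_code": "94110"},
    {"id": "B", "ssn": None, "birth_date": "1930-01-01", "sex": "F", "zip_code": "90001"},
]
links = link_in_passes(discharges, deaths)
print(links)
```

With the approximate counts quoted above (125,000, 3,200, 800, and 2,000 links per pass, about 131,000 in total), the first set of blocking variables accounts for roughly 95% of same-year links, which is why the ordering of passes matters in a design of this kind.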

Discussion

As automated linkage of data sets using probabilistic methods has gained popularity, linkage efforts have become ever larger and more complex, making monitoring and evaluating linkage results more challenging than ever. The current effort is an order of magnitude larger in size than previous efforts by OSHPD to create the very low birth weight baby outcomes registry [27]. The current approach captures most deaths occurring among hospitalized individuals and provides estimates for deaths

Limitations

Lack of SSNs disproportionately impacted record linkages for records of individuals from more vulnerable populations; thus, estimated out-of-hospital death linkages are likely to be underestimates of the true posthospitalization death rates. The sequential application of the linkage across years may result in misassignment of linkages. Strict consistency rules may have rejected a number of otherwise correct record pairs.

The gold standard definition has assumptions that may not hold true. There

Conclusion

The linkage of hospital discharge and death records can be a significant resource for evaluating care and outcomes. Based upon comparison to a gold standard data set, identified linkages appear to be highly accurate, and the number of unidentified record linkages has decreased in more recent years of data. This work appears to be the first attempt to explicitly estimate the number of missing record linkages among unlinked or unlinkable records, and points to the relatively small number of

Acknowledgements

This work was funded by the California Office of Statewide Health Planning and Development (OSHPD Contracts 99-0220 and 01-2365). Dr. Zingmond is supported by a UCLA Claude D. Pepper Older Americans Independence Center Development Award (AG10415) and the Irving and Mary Lazar Foundation.

References (27)

  • P.S. Romano et al. The California Hospital Outcomes Project: using administrative data to compare hospital performance. Jt Comm J Qual Improv (1995)
  • R.W. Dubois et al. Adjusted hospital death rates: a potential screen for quality of medical care. Am J Public Health (1987)
  • A.M. Epstein. Public release of performance data: a progress report from the front. JAMA (2000)
  • E.C. Schneider et al. Influence of cardiac-surgery performance reports on referral practices and access to care. A survey of cardiovascular specialists. N Engl J Med (1996)
  • M.L. Johnson et al. Effect of definition of mortality on hospital profiles. Med Care (2002)
  • G.E. Rosenthal et al. Relationships between in-hospital and 30-day standardized hospital mortality: implications for profiling hospitals. Health Serv Res (2000)
  • L.I. Iezzoni et al. Screening inpatient quality using post-discharge events. Med Care (1999)
  • N.P. Wray et al. The hospital multistay rate as an indicator of quality of care. Health Serv Res (1999)
  • Department of Health Services Center for Health Statistics. Death Statistical Master File 1989 to 1998 Technical Documentation (2000)
  • M.A. Jaro. Probabilistic linkage of large public health data files. Stat Med (1995)
  • J. Copas et al. Record linkage: statistical models for matching computer records. J R Stat Soc A (1990)
  • I.P. Fellegi et al. A theory for record linkage. J Am Stat Assoc (1969)
  • J.E. Keller et al. An algorithm for matching anonymous hospital discharge records used in occupational disease surveillance: anonymous record matching algorithm. Am J Ind Med (1991)