Linking hospital discharge and death records—accuracy and sources of bias
Introduction
The ability to assess the quality of health care delivery and access to care has been immeasurably improved through the increased availability of large-scale health care data sets. Linking existing data resources has produced even richer data sets that provide new contextual and outcome measures, allowing for more revealing analyses of health care. Advances in creating these data resources have been limited by privacy considerations, size, and cost.
Mortality after hospitalization is an accepted measure of the quality of health care and has been used in hospital quality-of-care report cards in California, Pennsylvania, and New York [1], [2], [3], [4]. Overall mortality, rather than in-hospital mortality, is recognized as the more reliable measure because hospitals may “dump” or transfer sicker patients before they die, a possibility accentuated by the secular trend toward shorter lengths of stay. In-hospital death rates do predict total mortality after admission, but the strength of the association is condition-specific [5], [6]. Other measures, such as readmission, must account for censoring due to out-of-hospital mortality [7], [8].
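To make the distinction between in-hospital and overall mortality concrete, the following is a minimal sketch computing both from linked discharge and death records. The record layout (`LinkedStay` with admission date, discharge date, an in-hospital death flag, and an optional linked death date) and the 30-day window are illustrative assumptions, not the structure of the OSHPD files.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class LinkedStay:
    """One discharge record, possibly linked to a death record (hypothetical fields)."""
    admit_date: date
    discharge_date: date
    died_in_hospital: bool              # discharge disposition recorded as death
    death_date: Optional[date] = None   # from the linked death record; None if unlinked


def in_hospital_mortality(stays: List[LinkedStay]) -> float:
    """Share of stays that ended in death before discharge."""
    return sum(s.died_in_hospital for s in stays) / len(stays)


def mortality_within(stays: List[LinkedStay], days: int = 30) -> float:
    """Share of stays with a death, in or out of hospital, within `days` of admission."""
    deaths = 0
    for s in stays:
        # In-hospital deaths are dated by discharge; other deaths need a linked death record.
        death = s.discharge_date if s.died_in_hospital else s.death_date
        if death is not None and (death - s.admit_date).days <= days:
            deaths += 1
    return deaths / len(stays)


stays = [
    LinkedStay(date(1997, 1, 1), date(1997, 1, 5), True),                      # died in hospital
    LinkedStay(date(1997, 1, 1), date(1997, 1, 3), False, date(1997, 1, 20)),  # died after discharge
    LinkedStay(date(1997, 1, 1), date(1997, 1, 4), False),                     # no death found
    LinkedStay(date(1997, 1, 1), date(1997, 1, 2), False, date(1997, 6, 1)),   # died after 30 days
]
print(in_hospital_mortality(stays))  # 0.25
print(mortality_within(stays))       # 0.5
```

The second stay shows why linkage matters: that death is invisible to the in-hospital measure and is counted in 30-day mortality only if the discharge record is successfully linked to the death record.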
Hospitalization records are already linked to death records, usually on an as-needed basis for research purposes, but reliably linked data sets that are generally available for non-Medicare hospitalized populations are less common. A significant weakness of many data linkage projects, and of subsequent uses of such data, is the lack of knowledge about the magnitude of potential bias due to missing data. Such assessments are necessary if we are to understand the limitations of these data for measuring quality and for making subsequent inferences and decisions.
In this article, we describe the creation and evaluation of a new research database linking the California hospital discharge data set to the state vital statistics registry for the years 1990 to 1999. In the first portion of this article, we describe the data used, review the important issues regarding data linkage, and describe the actual implementation used for our approach. In the latter half of this article, we estimate the magnitude of two separate sources of potential bias: (1) the accuracy of linkages between records with personal identifiers, and (2) the contribution of differential rates of missing personal identifiers to undercounting posthospitalization deaths.
Source databases for the linked records
The annual California Office of Statewide Health Planning and Development (OSHPD) Patient Discharge Database (PDD) contains records of all discharges from all non-Federal hospitals located in California. In any given year, approximately 3.5 million discharges are reported in this database. Approximately 15% of individuals have more than one discharge in a given calendar year, and 5% of admissions are classified as “nonacute” level of care. Thus, PDD reports the
Data linkage
Results by year were comparable. For each set of linked PDD and DSMF records from the same year, approximately 125,000 records linked using the first set of blocking variables, 3,200 with the second set of blocking variables, 800 with the third set of blocking variables, and 2,000 with the final set of blocking variables. For those dying within the same calendar year, about half of all death linkages occurred within the hospital. Linked PDD and DSMF records from successive years (e.g., PDD 1997
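The sequential use of blocking-variable sets described above can be sketched as follows. Each pass compares only record pairs that share that pass's blocking key, and records linked in an earlier pass are removed from later passes; by the figures above, the first pass yields roughly 125,000 of about 131,000 links per year (about 95%). The blocking keys (SSN, then sex plus date of birth), the surname-equality match rule, and the function name `sequential_blocking` are hypothetical stand-ins, since the actual blocking sets and comparison rules are not given in this excerpt.

```python
from collections import defaultdict


def sequential_blocking(left, right, passes, is_match):
    """Link two lists of record dicts in successive blocking passes.

    passes:   functions mapping a record to a blocking key (None means "skip this pass").
    is_match: function (left_rec, right_rec) -> bool deciding a link within a block.
    Returns (pass_number, left_index, right_index) tuples; each record links at most once.
    """
    linked_left, linked_right = set(), set()
    links = []
    for pass_no, key in enumerate(passes, start=1):
        # Index the still-unlinked right-hand records by this pass's blocking key.
        blocks = defaultdict(list)
        for j, rec in enumerate(right):
            if j not in linked_right and key(rec) is not None:
                blocks[key(rec)].append(j)
        for i, rec in enumerate(left):
            if i in linked_left or key(rec) is None:
                continue
            for j in blocks.get(key(rec), []):
                if j not in linked_right and is_match(rec, right[j]):
                    links.append((pass_no, i, j))
                    linked_left.add(i)
                    linked_right.add(j)
                    break  # simplification: accept the first acceptable candidate
    return links


discharges = [
    {"ssn": "111", "sex": "F", "dob": "1920-03-01", "last": "SMITH"},
    {"ssn": None, "sex": "M", "dob": "1931-07-15", "last": "GARCIA"},
    {"ssn": None, "sex": "M", "dob": "1945-01-01", "last": "LEE"},
]
deaths = [
    {"ssn": "111", "sex": "F", "dob": "1920-03-01", "last": "SMITH"},
    {"ssn": "222", "sex": "M", "dob": "1931-07-15", "last": "GARCIA"},
]
passes = [lambda r: r["ssn"], lambda r: (r["sex"], r["dob"])]
links = sequential_blocking(discharges, deaths, passes, lambda a, b: a["last"] == b["last"])
print(links)  # [(1, 0, 0), (2, 1, 1)]
```

The discharge without an SSN can link only in a later, looser pass, which mirrors the pattern of most links coming from the first pass. A production probabilistic linkage would score all candidates within a block and keep the best pair rather than the first acceptable one.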
Discussion
As automated linkage of data sets using probabilistic methods has gained popularity, linkage efforts have become ever larger and more complex, making it more challenging than ever to monitor and evaluate linkage results. The current effort is an order of magnitude larger than previous OSHPD efforts to create the very low birth weight baby outcomes registry [27]. The current approach captures most deaths occurring among hospitalized individuals and provides estimates for deaths
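Probabilistic record linkage is commonly framed in the Fellegi-Sunter model, in which each compared field adds log2(m/u) to a pair's weight when it agrees and log2((1 - m)/(1 - u)) when it disagrees, where m is the probability of agreement among true matches and u among non-matches. This excerpt does not state that the OSHPD linkage used exactly this weighting, so the sketch below illustrates the general technique; the field names and the m/u values are assumptions chosen for illustration.

```python
import math


def field_weight(agree: bool, m: float, u: float) -> float:
    """Fellegi-Sunter log2 weight for one field comparison.

    m = P(field agrees | records are a true match)
    u = P(field agrees | records are not a match)
    """
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(rec_a: dict, rec_b: dict, params: dict) -> float:
    """Total weight over the fields in params ({field: (m, u)}).

    A field missing (None) in either record contributes 0, i.e. no evidence either way.
    """
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue
        total += field_weight(a == b, m, u)
    return total


# Illustrative m/u values only; in practice they are estimated from the data.
params = {"ssn": (0.95, 0.0001), "dob": (0.97, 0.001), "sex": (0.99, 0.5)}
discharge = {"ssn": None, "dob": "1931-07-15", "sex": "M"}
death = {"ssn": "123456789", "dob": "1931-07-15", "sex": "M"}
print(match_weight(discharge, death, params))
```

Pairs whose total weight exceeds an upper threshold are accepted as links, pairs below a lower threshold are rejected, and pairs in between are typically sent for clerical review. Because a missing SSN contributes no weight here, such pairs must clear the threshold on the remaining fields alone, which is one plausible reason records lacking SSNs link less often.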
Limitations
Missing Social Security numbers (SSNs) disproportionately affected linkage of records for individuals from more vulnerable populations; thus, the estimated out-of-hospital death linkages are likely to underestimate the true posthospitalization death rates. Because the linkage was applied sequentially across years, some linkages may have been misassigned. Strict consistency rules may also have rejected a number of otherwise correct record pairs.
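One simple way to gauge how many death linkages are missed among records without an SSN is indirect standardization: within each stratum, apply the death-linkage rate observed among records with an SSN to the records lacking one, and compare the expected count with the links actually found. This is a hypothetical illustration, not necessarily the method used in this study; the stratum names, counts, and the function `estimated_missed_links` are made up.

```python
def estimated_missed_links(strata: dict) -> float:
    """Estimate death linkages missed among records lacking an SSN.

    strata maps a stratum name (e.g. an age group) to counts:
      with_ssn, linked_with_ssn       -- records with an SSN, and how many linked to a death
      without_ssn, linked_without_ssn -- the same for records missing an SSN
    Assumption (hypothetical): within a stratum, records without an SSN die at the
    same rate as records with one, so any shortfall in links reflects missed linkages.
    """
    missed = 0.0
    for counts in strata.values():
        death_rate = counts["linked_with_ssn"] / counts["with_ssn"]
        expected = death_rate * counts["without_ssn"]
        # Clamp at zero: a stratum with more links than expected adds no missed links.
        missed += max(expected - counts["linked_without_ssn"], 0.0)
    return missed


# Made-up counts for illustration only.
strata = {
    "under 65": {"with_ssn": 1000, "linked_with_ssn": 20, "without_ssn": 200, "linked_without_ssn": 1},
    "65 and over": {"with_ssn": 1000, "linked_with_ssn": 100, "without_ssn": 100, "linked_without_ssn": 6},
}
print(estimated_missed_links(strata))  # roughly 7 (about 3 + 4)
```

If, as noted above, records lacking SSNs come disproportionately from more vulnerable populations with higher mortality, this equal-rate assumption would itself understate the number of missed linkages.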
The gold standard definition has assumptions that may not hold true. There
Conclusion
The linkage of hospital discharge and death records can be a significant resource for evaluating care and outcomes. Based upon comparison to a gold standard data set, identified linkages appear to be highly accurate, and the number of unidentified record linkages has decreased in more recent years of data. This work appears to be the first attempt to explicitly estimate the number of missing record linkages among unlinked or unlinkable records, and points to the relatively small number of
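Accuracy against a gold standard is often summarized by sensitivity (the share of gold-standard links that the procedure found) and positive predictive value (the share of the procedure's links that appear in the gold standard). The specific gold standard and metrics used in this study are not described in this excerpt, so the sketch below only illustrates these two common measures; the record identifiers are hypothetical.

```python
def linkage_accuracy(found: set, gold: set) -> tuple:
    """Compare linked pairs against a gold standard.

    found, gold: sets of (discharge_id, death_id) pairs.
    Returns (sensitivity, positive predictive value); NaN when a denominator is empty.
    """
    true_pos = len(found & gold)
    sensitivity = true_pos / len(gold) if gold else float("nan")
    ppv = true_pos / len(found) if found else float("nan")
    return sensitivity, ppv


gold = {("d1", "x1"), ("d2", "x2"), ("d3", "x3"), ("d4", "x4")}
found = {("d1", "x1"), ("d2", "x2"), ("d3", "x9")}  # one wrong link, two links missed
sens, ppv = linkage_accuracy(found, gold)
print(sens, ppv)  # sensitivity 0.5, PPV about 0.667
```

Both measures inherit any assumptions built into the gold standard itself, so a gold standard that misses true links would overstate sensitivity.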
Acknowledgements
This work was funded by the California Office of Statewide Health Planning and Development (OSHPD Contracts 99-0220 and 01-2365). Dr. Zingmond is supported by a UCLA Claude D. Pepper Older Americans Independence Center Development Award (AG10415) and the Irving and Mary Lazar Foundation.
References (27)
- et al. The California Hospital Outcomes Project: using administrative data to compare hospital performance. Jt Comm J Qual Improv (1995).
- et al. Adjusted hospital death rates: a potential screen for quality of medical care. Am J Public Health (1987).
- Public release of performance data: a progress report from the front. JAMA (2000).
- et al. Influence of cardiac-surgery performance reports on referral practices and access to care. A survey of cardiovascular specialists. N Engl J Med (1996).
- et al. Effect of definition of mortality on hospital profiles. Med Care (2002).
- et al. Relationships between in-hospital and 30-day standardized hospital mortality: implications for profiling hospitals. Health Serv Res (2000).
- et al. Screening inpatient quality using post-discharge events. Med Care (1999).
- et al. The hospital multistay rate as an indicator of quality of care. Health Serv Res (1999).
- Death Statistical Master File 1989 to 1998 Technical Documentation (2000).
- Probabilistic linkage of large public health data files. Stat Med (1995).