Original Article
A high positive predictive value algorithm using hospital administrative data identified incident cancer cases

https://doi.org/10.1016/j.jclinepi.2007.05.017

Abstract

Objective

We have developed and validated an algorithm based on Piedmont hospital discharge abstracts for ascertainment of incident cases of breast, colorectal, and lung cancer.

Study Design and Setting

The algorithm training and validation sets were based on data from 2000 and 2001, respectively. The validation was carried out at an individual level by linkage of cases identified by the algorithm with cases in the Piedmont Cancer Registry diagnosed in 2001.

Results

The sensitivity of the algorithm was higher for lung cancer (80.8%) than for breast (76.7%) and colorectal (72.4%) cancer. The positive predictive values were 78.7%, 87.9%, and 92.6% for lung, colorectal, and breast cancer, respectively. The high positive predictive values for colorectal and breast cancer were due to the model's ability to distinguish prevalent from incident cases and to the accuracy of surgery claims for case identification.

Conclusions

Given its moderate sensitivity, this algorithm is not intended to replace cancer registration, but it is a valuable tool for investigating other aspects of cancer surveillance. This method provides a valid study base for the timely monitoring of cancer practice and related outcomes, geographic and temporal variations, and costs.

Introduction

Hospital claims were designed primarily for administrative purposes, but they can be exploited as a rich, inexpensive source of current epidemiological information on large, heterogeneous patient populations. The adequacy of administrative data for answering questions of medical effectiveness is variable. They can easily be used to identify diagnoses, treatment, and outcome at a general level, whereas it might be more difficult to understand the mediating factors and decision-making variables that result in sending a patient on a specific therapeutic path. A major limitation of claims data is the questionable accuracy of diagnostic information.

Using these data to identify incident cancer cases is a major challenge, especially in geographic areas not covered by cancer registries. Identification of incident cases through claims data is based on the detection of cancer diagnosis codes, alone or combined with cancer-specific procedure codes. The reference standards used to evaluate identification performance are cancer registries and medical records. Evaluation is based either on a comparison of incidence rates or on individual-level cross-linkage of the claims and reference databases [1]. Apart from studies based on Medicare data, few have validated hospital-based medical data [2], [3].
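As a minimal sketch of code-based identification, the Python fragment below flags a discharge record for a cancer site when any diagnosis code matches a site prefix, and can optionally require a site-specific surgical procedure code as well. The ICD-9-CM prefixes, the Discharge record layout, and the function names are illustrative assumptions, not the criteria used in this study.

```python
from dataclasses import dataclass, field

# Illustrative ICD-9-CM prefixes (dots removed); assumptions, not the study's code lists.
DIAGNOSIS_PREFIXES = {
    "breast": ("174",),                     # malignant neoplasm of female breast
    "colorectal": ("153", "1540", "1541"),  # colon; rectosigmoid junction; rectum
    "lung": ("162",),                       # trachea, bronchus and lung
}
SURGERY_PREFIXES = {
    "breast": ("852", "854"),                    # breast tissue excision; mastectomy
    "colorectal": ("457", "458", "485", "486"),  # colectomy; rectal resection
    "lung": ("324", "325"),                      # lobectomy; pneumonectomy
}


@dataclass
class Discharge:
    patient_id: str
    diagnoses: list = field(default_factory=list)   # diagnosis codes, any position
    procedures: list = field(default_factory=list)  # procedure codes, any position


def _matches(codes, prefixes):
    """True if any code, with dots stripped, starts with one of the prefixes."""
    return any(code.replace(".", "").startswith(prefixes) for code in codes)


def flagged_sites(record, require_surgery=False):
    """Sites whose diagnosis codes appear in the record; if require_surgery is
    True, a site-specific procedure code must also be present."""
    sites = set()
    for site, dx_prefixes in DIAGNOSIS_PREFIXES.items():
        if not _matches(record.diagnoses, dx_prefixes):
            continue
        if require_surgery and not _matches(record.procedures, SURGERY_PREFIXES[site]):
            continue
        sites.add(site)
    return sites
```

Requiring a procedure code in addition to a diagnosis code typically raises PPV at the cost of sensitivity; the Abstract attributes part of the high PPV for breast and colorectal cancer to the accuracy of surgery claims.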

The Italian National Health Service, in contrast to the fragmented medical insurance system in the USA, provides universal coverage with standard care to all Italian citizens, and no social or economic selection bias limits access to available high-technology resources. Essential health services are provided free of charge or at a minimal charge; therefore, inequality in health care accessibility due to insurance status does not apply. The system is primarily government funded, and public hospitals account for the large majority of hospital beds (approximately 84 percent). Private hospitals also receive most of their funds through the National Health Service. Beginning in 1995, a series of health care reforms was implemented [4]. A major component of the reforms was a change in hospital financing to a prospective payment system based on Diagnosis Related Groups (DRGs). Explicit incentives for the increased use of outpatient care were also introduced to reduce a perceived overuse of acute inpatient care.

The hospital discharge abstracts database, consisting of routinely collected, standardized medical and administrative information, is a valuable resource for planning and research, reflecting hospital utilization and outcome experience of an entire geographic population.

Piedmont, a northwestern Italian region, has a population of about four million and covers an area of over 25,000 km². The population-based Piedmont Cancer Registry is the most comprehensive system for tracking cancer incidence and patient survival in Piedmont. The Registry covers only one-fourth of the Piedmont population, and it is not designed for analytical cancer epidemiology studies or assessments of health care quality that require information on exposures, comorbid conditions, treatment, and health care use.

The aim of this study was to develop and validate an algorithm for detecting newly diagnosed cases of breast cancer (in women) and of colorectal and lung cancer (in men and women) from Piedmont hospital discharge abstracts. The validity of the method, in terms of sensitivity and positive predictive value (PPV), was measured by individual-level comparison with data in the Piedmont Cancer Registry (gold standard).
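Both accuracy measures follow directly from an individual-level linkage. The sketch below treats each data source as a set of already-linked patient identifiers (hypothetical) and assumes both sets refer to the same population and diagnosis period; because the Registry covers only part of Piedmont, algorithm cases from outside its catchment area would have to be excluded before the comparison.

```python
def sensitivity_ppv(algorithm_ids, registry_ids):
    """Individual-level validation against a gold-standard registry.

    sensitivity = TP / (TP + FN): share of registry cases found by the algorithm
    PPV         = TP / (TP + FP): share of algorithm cases confirmed by the registry
    """
    tp = len(algorithm_ids & registry_ids)  # cases in both sources
    fn = len(registry_ids - algorithm_ids)  # registry cases the algorithm missed
    fp = len(algorithm_ids - registry_ids)  # algorithm cases absent from the registry
    sensitivity = tp / (tp + fn) if registry_ids else float("nan")
    ppv = tp / (tp + fp) if algorithm_ids else float("nan")
    return sensitivity, ppv


# Toy identifiers, not study data.
print(sensitivity_ppv({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"}))  # (0.6, 0.75)
```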

The second section of this paper describes the criteria in the algorithm for identification of cancers at the three sites and the data sources used. The third section illustrates the validation, in terms of sensitivity, PPV, and description of incorrect identifications. The last section comprises some concluding remarks.


Discharge abstracts data

In Italy, each inpatient case is registered on a national standard nosological form (the hospital discharge form established by a Ministry of Health decree) summarizing all the clinical information. This form is the basis for reimbursing hospital activity under the DRG system.

Piedmont hospital discharge abstracts from all regional hospitals, covering both hospital stays and day care, have been collected routinely since 1995 and include, with reasonable accuracy, demographic data on

Results

The record-linkage process involving hospital discharge abstracts and PCRT data was mainly automatic: only about 10% of the matched cases for each cancer site needed a manual check, and only a few of these were excluded as a result.
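A largely automatic linkage with a manual-review queue might be organized as follows. This is a sketch assuming a hypothetical shared key plus sex and date of birth as secondary fields; the linkage fields actually used are not stated here. Pairs that agree on every field are accepted automatically, while pairs that share a key but disagree elsewhere are set aside for a human check.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    key: str          # hypothetical linkage key, e.g. a pseudonymized identifier
    sex: str
    birth_date: date


def link(discharge_cases, registry_cases):
    """Deterministic linkage on `key`. Returns (accepted, needs_review, unmatched),
    where unmatched holds discharge cases with no registry record."""
    registry_by_key = {p.key: p for p in registry_cases}
    accepted, needs_review, unmatched = [], [], []
    for case in discharge_cases:
        match = registry_by_key.get(case.key)
        if match is None:
            unmatched.append(case)
        elif (case.sex, case.birth_date) == (match.sex, match.birth_date):
            accepted.append((case, match))      # full agreement: accept automatically
        else:
            needs_review.append((case, match))  # same key, conflicting fields
    return accepted, needs_review, unmatched
```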

When applied to the training cohort, the algorithm had moderate sensitivity and a good PPV (Table 2). If an error in the date of diagnosis of at least 1 month (i.e., PCRT cases diagnosed in December 1999 appearing in the 2000 hospital discharge abstracts) is
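Tolerating a small error in the date of diagnosis amounts to widening the matching window. In this minimal sketch, a registry case is counted against a given discharge year if its diagnosis date falls in that year or no more than tolerance_days before 1 January; the function name and the 31-day value are assumptions chosen to mirror the December 1999 example in the text.

```python
from datetime import date, timedelta


def in_matching_window(dx_date, year, tolerance_days=0):
    """True if a registry diagnosis date falls within `year`, or precedes
    1 January of that year by at most `tolerance_days` days."""
    start = date(year, 1, 1) - timedelta(days=tolerance_days)
    end = date(year, 12, 31)
    return start <= dx_date <= end


# A registry case diagnosed in December 1999 that appears in the 2000 abstracts:
print(in_matching_window(date(1999, 12, 15), 2000))                     # False
print(in_matching_window(date(1999, 12, 15), 2000, tolerance_days=31))  # True
```

Widening the window changes only how near-boundary cases are classified during validation; it does not change which patients the algorithm detects.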

Discussion

In this study, we analyzed the adequacy of algorithms based on Piedmont hospital discharge abstracts for ascertaining incident cases of three common cancers, with a preference for case identification with a low proportion of false positives.

Because it is of particular importance to recognize prevalent cases, especially for cancers with good prognoses, we assumed that patients with prior hospitalizations with the same cancer diagnosis had recurrent disease.
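The prevalent-case rule can be sketched as a first-admission filter: a patient counts as incident in the study year only if no earlier hospitalization carries the same cancer diagnosis. The tuple layout and function name below are hypothetical. The lookback can reach no further back than the start of routine collection of discharge abstracts in 1995, so patients hospitalized for the cancer only before then cannot be recognized as prevalent.

```python
from datetime import date


def incident_patients(admissions, site, year):
    """admissions: iterable of (patient_id, admission_date, site) tuples for
    stays already flagged with a cancer diagnosis. Returns the patients whose
    earliest flagged stay for `site` falls in `year`; anyone with an earlier
    stay for the same site is treated as a prevalent (recurrent) case."""
    first_stay = {}
    for patient_id, admission_date, stay_site in admissions:
        if stay_site != site:
            continue
        if patient_id not in first_stay or admission_date < first_stay[patient_id]:
            first_stay[patient_id] = admission_date
    return {pid for pid, first in first_stay.items() if first.year == year}


stays = [
    ("p1", date(1998, 3, 1), "breast"),   # earlier stay, so p1 is prevalent in 2001
    ("p1", date(2001, 5, 2), "breast"),
    ("p2", date(2001, 2, 10), "breast"),
    ("p2", date(2001, 6, 1), "breast"),
    ("p3", date(2001, 7, 7), "lung"),
]
print(incident_patients(stays, "breast", 2001))  # {'p2'}
```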

The validation confirmed that the

Acknowledgments

This study was partially supported by the Compagnia di San Paolo and the M.I.U.R./PRIN grant 2005068001. The authors thank Benedetto Terracini for useful comments.

References (16)

