Background
Cancers associated with pregnancy are reported to be increasing, and this has been attributed to increasing maternal age and women’s interaction with health services during pregnancy [1]. Women with cancer diagnosed during pregnancy or within 12 months of delivery (referred to as ‘pregnancy-associated cancer’) are at increased risk of maternal morbidities and adverse pregnancy outcomes [1,2]. Current information on incidence of cancer associated with pregnancy is central to evaluation and improvement of clinical care for women.
Population-based statutory cancer registries are a reliable source for identifying incident cancers in populations [3–5]. However, the extensive quality assurance processes implemented by cancer registries can delay the availability of cancer data [6]. Consequently, there is increasing interest in the use of routinely collected and easily accessible administrative data, such as hospital discharge data, for identifying incident cancers and assessing health service utilisation and quality of care for cancer patients [3–5,7]. One disadvantage of using databases of hospitalisation records is that individuals with multiple hospitalisations cannot always be identified. For example, several obstetric research studies in the United States (US) to date have relied solely on hospitalisation records, which means that 100 records may come from 100 individuals or from 10 individuals each admitted 10 times [8–10]. Identifying pregnancy-associated cancers from hospital data is further complicated by the inability to determine the duration of pregnancy in weeks of gestation, which results in an imprecise pregnancy exposure period [8].
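The difference between counting hospitalisation records and counting individuals can be illustrated with a minimal sketch. The records, person identifiers and diagnosis codes below are hypothetical and are not drawn from the study data; the sketch assumes that a person identifier is available, which, as noted above, is not always the case.

```python
# Hypothetical discharge records: (person_id, diagnosis_code).
# These values are invented for illustration only.
records = [
    ("A", "C50"), ("A", "C50"), ("A", "C50"),  # one woman admitted three times
    ("B", "C43"),                              # one woman admitted once
    ("C", "C50"), ("C", "C50"),                # one woman admitted twice
]


def count_cases(records, person_level):
    """Count cancer cases per hospitalisation record or per individual."""
    if person_level:
        # Collapse repeat admissions: each person is counted once.
        return len({person_id for person_id, _ in records})
    # Record-level count: every admission is treated as a separate case.
    return len(records)


record_level = count_cases(records, person_level=False)
individual_level = count_cases(records, person_level=True)
print(record_level, individual_level)
```

In this toy example the record-level count is twice the individual-level count, which is the kind of over-counting that can arise when repeat admissions cannot be linked to the same person.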
The quality of estimates of hospital-ascertained cancer incidence relies on complete and accurate ascertainment of recorded diagnoses. However, the validity of hospital data for identifying incident cancers associated with pregnancy has yet to be established. Several US studies have validated Medicaid or Medicare claims data (inpatient, outpatient and physician claims) for identification of incident cancers with cancer registries as the “gold standard”, with sensitivities ranging from 68% to 97% and positive predictive values ranging from 83% to 96% [11–16]. There have also been limited efforts to link private insurer claims to cancer registries, with a focus on cancer treatment information [17]. The generalisability of such findings to specific populations (e.g., obstetric populations) remains questionable, given that the majority of published studies are among elderly populations. Therefore, the aim of this study was to determine the validity of hospital diagnoses for identification of incident pregnancy-associated cancers, both overall and by cancer type, compared with incident diagnoses from a population-based statutory cancer registry.
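For readers less familiar with these measures, the following minimal sketch shows how sensitivity and positive predictive value (PPV) are conventionally calculated when a registry is treated as the gold standard. The function name, the identifiers and the example values are hypothetical and are not the study's data or analysis code.

```python
def validity(hospital_ids, registry_ids):
    """Compare hospital-identified cases with registry (gold standard) cases."""
    hospital, registry = set(hospital_ids), set(registry_ids)
    tp = len(hospital & registry)  # true positives: found in both sources
    fp = len(hospital - registry)  # false positives: hospital data only
    fn = len(registry - hospital)  # false negatives: registry only (missed)
    # Sensitivity: share of registry cases that the hospital data captured.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # PPV: share of hospital-identified cases confirmed by the registry.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "ppv": ppv}


# Invented identifiers for illustration only.
result = validity(hospital_ids=[1, 2, 3, 4], registry_ids=[2, 3, 4, 5, 6])
print(result)
```

Here three of the five registry cases appear in the hospital data (sensitivity 0.6), and three of the four hospital-identified cases are confirmed by the registry (PPV 0.75).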
Discussion
This record linkage-based validation study determined the validity of ICD-10-AM hospital diagnoses in estimating the incidence of pregnancy-associated cancer. We have demonstrated that using hospitalisations as the unit of analysis, rather than the individual, substantially over-estimated the incidence of pregnancy-associated cancer. In contrast, with individual-level hospital data the overall incidence was under-estimated. Specifically, the incidence of melanoma and breast cancer was under-estimated by approximately half. The low overall sensitivity reflected the predominance of melanoma, which was the major contributor to the false negatives (it was frequently missed in the hospital data). Other common cancers (breast, thyroid, gynaecological and lymphohaematopoietic cancers) achieved only moderate levels of validity (ascertainment 72.1–78.6% and PPV 56.4–87.3%).
There is no literature based on obstetric populations available for direct comparison of our reporting characteristics. Published studies have predominantly assessed the use of Medicare claims data compared with medical records or cancer registries for ascertainment of selected cancers (mainly breast cancer), with sensitivities and positive predictive values that vary depending on the definitions used, the study timeframe and the identification algorithms (first diagnostic code only, all diagnostic codes or a combination of diagnostic or surgical procedure codes) [11–16]. Studies using ICD-9 coded hospital discharge data to identify incident breast, colorectal and lung cancers with cancer registries as the “gold standard” achieved better reporting characteristics than in our obstetric population [3,16,25]. For example, sensitivity was 77%–85% and PPV 57%–91% for breast cancer, sensitivity 72% and PPV 60%–88% for colorectal cancer, and sensitivity 81% and PPV 59%–79% for lung cancer. A study comparing ICD-10 coded diagnoses in hospital discharge data with medical records as the “gold standard” reported much greater validity (breast cancer: sensitivity 96%, PPV 94%; colon cancer: sensitivity 93%, PPV 96%; lung cancer: sensitivity 97%, PPV 94%) than we report using ICD-10 for an obstetric population [26]. This is likely due to an older, non-specific hospitalised study population, which in general has a different pattern of hospital activity from an obstetric population. In NSW between 2001 and 2008, about 16% of women with singleton pregnancies and 64% of women with multifetal pregnancies were admitted to hospital at least once during 20–36 weeks of gestation [27,28].
The evaluation of false-positive and false-negative pregnancy-associated cancers was insightful in considering the poorer identification of cancer from the hospital records of pregnant women. In accordance with the literature, the common reasons for false positives were prevalent cancers or misclassification of cancer type [4]. We were unable to evaluate the underlying causes of false-negative cancers as information on the source of cancer diagnosis was not available. Others have speculated that false-negative (missed) cancers may be due to cancers notified from death certificates or other non-hospital-based institutions [4]. In our obstetric population the high proportion of in-situ/localised cancers suggests that outpatient cancer management may be an important reason for false negatives. Furthermore, cancer may not be the reason for hospitalisation of a pregnant woman, and diagnoses that are not relevant to the current admission are not required to be coded [29].
A strength of our study is the use of ICD-10 for coding hospital diagnoses. ICD-10 codes, based on surgical speciality, allow more detailed diagnosis coding than ICD-9 codes [30]. Importantly, there is complete registration of cancers in the NSW statutory data collection, which provided the gold standard for identification of incident pregnancy-associated cancers without the need for an independent validation source. The linkage to the cancer registry dating back to 1994 provided a unique opportunity to assess an extensive history of cancer for identification of prevalent cancers and to mitigate their impact on the positive predictive values [4,5]. Unlike other studies, which focused on specific cancers, our comparison of reporting characteristics was done overall and by cancer type, and accounted for women with multiple primary cancers in a pregnancy. Australia has the highest incidence rate of melanoma in the world [19], and the results excluding melanoma (with a higher overall reporting sensitivity) may be more generalisable to other countries. Several limitations of our study also deserve consideration. The number of pregnancy-associated cancers was somewhat under-estimated as early pregnancy loss (miscarriage or abortion) was not registered in the birth data. The hospital data represent only inpatient stays; they may be less complete in capturing cancers for which outpatient diagnosis and treatment are more common, e.g., melanoma. However, the data also provide a general representation of inpatient stays regardless of age or insurance status [16], and they are more accessible than primary health utilisation data. Finally, as the reporting characteristics are sensitive to the sample size, careful interpretation is needed for cancers with small sample sizes.
Conclusions
The timely availability of population level data is a key factor in surveillance [31]. In 2012, NSW cancer registry data were available for linkage to the end of 2008, while hospital data were available through June 2011 [20]. Unfortunately, we do not consider the validity of pregnancy-associated cancers as determined by our hospital data sufficient to be relied on for contemporary cancer incidence estimates.
Our study shows that the use of hospital data for identifying incident pregnancy-associated cancers achieved only moderate levels of validity. Although hospital data may provide another source of cancer identification for a cancer registry, rigorous assurance will still be needed to confirm cancers in order to obtain valid estimates of incidence.
Competing interests
The authors declare no conflicts of interest.
Authors’ contributions
YYL took responsibility for the integrity of the data, the design of the study, the accuracy of the data analysis and drafting of the manuscript. CLR participated in conception and study design, analysis and interpretation of data, and drafting of the manuscript. TD and JY were involved in study design, acquisition of data and critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript.