01.12.2019 | Research article | Issue 1/2019 | Open Access

BMC Medicine 1/2019

‘Caveat emptor’: the cautionary tale of endocarditis and the potential pitfalls of clinical coding data—an electronic health records study

Nicola Fawcett, Bernadette Young, Leon Peto, T. Phuong Quan, Richard Gillott, Jianhua Wu, Chris Middlemass, Sheila Weston, Derrick W. Crook, Tim E. A. Peto, Berit Muller-Pebody, Alan P. Johnson, A. Sarah Walker, Jonathan A. T. Sandoe
Important notes

Electronic supplementary material

The online version of this article (https://doi.org/10.1186/s12916-019-1390-x) contains supplementary material, which is available to authorized users.
A. Sarah Walker and Jonathan A. T. Sandoe contributed equally to this work.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Background

Diagnostic codes from electronic health records are widely used to assess patterns of disease. Infective endocarditis is an uncommon but serious infection with objective diagnostic criteria. Electronic health records have been used to explore how changing guidance on antibiotic prophylaxis for dental procedures has affected endocarditis incidence, but limited data exist on the accuracy of the diagnostic codes. Endocarditis was used as a clinically relevant case study to investigate the relationship between clinical cases and diagnostic codes, to understand discrepancies and to improve the design of future studies.


Methods

Electronic health record data from two UK tertiary care centres were linked with data from a prospectively collected clinical endocarditis service database (Leeds Teaching Hospital) or with retrospective clinical audit and microbiology laboratory blood culture results (Oxford University Hospitals Trust). The relationship between diagnostic codes for endocarditis and clinical cases confirmed by the objective Duke criteria was assessed, together with its impact on estimates of disease incidence and trends.
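The comparison described above can be illustrated with a minimal sketch. It assumes that coded admissions and confirmed clinical cases have already been linked to a shared identifier; the identifiers, the `Validation` class and the `validate_codes` function are hypothetical and are not taken from the study, and the toy data are invented purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Validation:
    """Counts needed to compare diagnostic codes against confirmed cases."""
    true_positives: int  # coded admissions that are also confirmed cases
    coded: int           # all admissions carrying the code
    confirmed: int       # all confirmed clinical cases

    @property
    def ppv(self) -> float:
        # Positive predictive value: share of coded admissions that are real cases.
        return self.true_positives / self.coded

    @property
    def sensitivity(self) -> float:
        # Sensitivity: share of real cases that the code picks up.
        return self.true_positives / self.confirmed


def validate_codes(coded_admissions: set[str], confirmed_cases: set[str]) -> Validation:
    """Compare two sets of linked identifiers (hypothetical linkage key)."""
    if not coded_admissions or not confirmed_cases:
        raise ValueError("both sets must be non-empty")
    matched = coded_admissions & confirmed_cases
    return Validation(len(matched), len(coded_admissions), len(confirmed_cases))


# Invented toy data: five coded admissions, four confirmed cases, two in common.
coded = {"A1", "A2", "A3", "A4", "A5"}
confirmed = {"A1", "A2", "B1", "B2"}

result = validate_codes(coded, confirmed)
print(f"PPV = {result.ppv:.0%}, sensitivity = {result.sensitivity:.0%}")
# prints "PPV = 40%, sensitivity = 50%"
```

A code can score well on one measure and poorly on the other, which is why both are reported side by side.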


Results

In Leeds 2006–2016, 738/1681 (44%) admissions containing any endocarditis code represented a definite/possible case, whilst 263/1001 (24%) definite/possible endocarditis cases had no endocarditis code assigned. In Oxford 2010–2016, 307/552 (56%) reviewed endocarditis-coded admissions represented a clinical case. Diagnostic codes used by most endocarditis studies had good positive predictive value (PPV) but low sensitivity (e.g. I33-primary: PPV 82%, sensitivity 43%); one (I38-secondary) had PPV under 6%. Estimating endocarditis incidence using raw admission data overestimated incidence trends twofold. Removing records with non-specific codes, very short stays and readmissions improved predictive ability. Estimating incidence of streptococcal endocarditis using secondary codes also overestimated increases in incidence over time. Reasons for discrepancies included changes in coding behaviour over time, and coding guidance allowing assignment of a code mentioning ‘endocarditis’ where endocarditis was never mentioned in the clinical notes.
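The curation step mentioned above (removing non-specific codes, very short stays and readmissions) might look roughly like the sketch below. The abstract does not give the thresholds or the exact rules, so the minimum stay, the readmission window, the choice of I38 as the non-specific code, and the `Admission` and `curate` names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# All three values are assumptions for illustration, not the study's definitions.
NON_SPECIFIC_CODES = {"I38"}              # assumed set of non-specific codes
MIN_STAY_DAYS = 2                         # assumed cut-off for a "very short" stay
READMISSION_WINDOW = timedelta(days=30)   # assumed readmission window


@dataclass(frozen=True)
class Admission:
    patient_id: str
    admitted: date
    discharged: date
    code: str


def curate(admissions: list[Admission]) -> list[Admission]:
    """Drop non-specific codes, very short stays and readmissions."""
    kept: list[Admission] = []
    episode_end: dict[str, date] = {}  # patient_id -> end of the latest kept episode
    for adm in sorted(admissions, key=lambda a: a.admitted):
        if adm.code in NON_SPECIFIC_CODES:
            continue
        if (adm.discharged - adm.admitted).days < MIN_STAY_DAYS:
            continue
        previous = episode_end.get(adm.patient_id)
        if previous is not None and adm.admitted - previous <= READMISSION_WINDOW:
            # Treat as part of the same episode and extend its end date.
            episode_end[adm.patient_id] = max(previous, adm.discharged)
            continue
        kept.append(adm)
        episode_end[adm.patient_id] = adm.discharged
    return kept


# Invented toy admissions.
admissions = [
    Admission("P1", date(2015, 1, 1), date(2015, 1, 20), "I33"),  # kept
    Admission("P1", date(2015, 2, 1), date(2015, 2, 10), "I33"),  # readmission
    Admission("P2", date(2015, 3, 1), date(2015, 3, 1), "I33"),   # very short stay
    Admission("P3", date(2015, 4, 1), date(2015, 4, 15), "I38"),  # non-specific code
    Admission("P1", date(2016, 1, 1), date(2016, 1, 10), "I33"),  # new episode, kept
]

kept = curate(admissions)
print(len(kept))  # prints 2
```

Counting episodes rather than raw admissions is one way to stop a single illness with several hospital stays from inflating an incidence estimate.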


Conclusions

Commonly used diagnostic codes in studies of endocarditis had good predictive ability. Other apparently plausible codes were poorly predictive. Use of diagnostic codes without examining sensitivity and predictive ability can give inaccurate estimates of incidence and trends. Similar considerations may apply to other diseases. Health record studies require validation of diagnostic codes and careful data curation to minimise the risk of serious errors.
Additional file 1:
Extended Methods.
Table S1. Summary of studies of endocarditis incidence or features using electronic health record data or microbiological data, source of information, codes used, methods of deduplication and comparisons of codes and cases.
Table S2. Summary of endocarditis codes used in the above studies.
Table S3. Secondary/supplementary organism codes used and reviewed.
Figure S1. Clinical reviews in the Leeds Endocarditis Service database, Duke status and diagnostic codes.
Figure S2. Admissions with an endocarditis diagnosis code and selection for review: Oxford.
Figure S3. Review of electronic prescription data in Oxford 2016 with matching to coded data.
Table S4. Reviews and Duke status in the Leeds Service Database and matching to endocarditis-coded admissions.
Table S5. Agreement between different combinations of endocarditis-coded admissions and confirmed clinical cases in Oxford.
Figure S4. Sensitivity/specificity and positive predictive values for different algorithms to identify Duke definite/possible endocarditis cases from diagnostic codes in Leeds and Oxford.
Figure S5. Estimated endocarditis incidence in Oxford based on diagnostic coding and administrative information.
Figure S6. Estimated endocarditis cases and causative organism from diagnostic codes compared to clinician cases, Leeds.
Figure S7. Estimated endocarditis cases and causative organism from diagnostic codes and microbiological cultures, Oxford.
Table S6. Coded organism vs clinician-recorded organism in Leeds Duke definite/possible cases.
Table S7. Coded organism vs microbiology blood culture organism in all admissions with a non-I38 endocarditis code in Oxford.
Figure S8. Coding depth and use of secondary/supplementary organism codes in Leeds and Oxford.
(DOCX 2110 kb)