Published in: World Journal of Urology 1/2014

1 February 2014 | Original Article

Extracting data from electronic medical records: validation of a natural language processing program to assess prostate biopsy results

Authors: Anil A. Thomas, Chengyi Zheng, Howard Jung, Allen Chang, Brian Kim, Joy Gelfond, Jeff Slezak, Kim Porter, Steven J. Jacobsen, Gary W. Chien

Abstract

Objective

The extraction of specific data from electronic medical records (EMR) remains tedious and is often performed manually. Natural language processing (NLP) programs have been developed to identify and extract information within clinical narrative text. We performed a study to assess the validity of an NLP program to accurately identify patients with prostate cancer and to retrieve pertinent pathologic information from their EMR.

Materials and methods

A retrospective review was performed of a prospectively collected database of patients from the Southern California Kaiser Permanente Medical Region who underwent prostate biopsies during a 2-week period. An NLP program was used to identify, from all pathology reports within this period, the patients whose prostate biopsies were positive for prostatic adenocarcinoma. The application then processed 100 consecutive patients with prostatic adenocarcinoma to extract 10 variables from their pathology reports. The information extracted by NLP was then compared with a blinded manual review.
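The comparison against manual review can be pictured as a cell-by-cell agreement check over patients and variables. The following is only a minimal sketch of that idea, not the study's actual procedure: the function name, the record layout, and the variable names `gleason_score` and `positive_cores` are hypothetical, since the abstract does not list the 10 variables.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def extraction_agreement(
    nlp_records: Dict[str, Record],
    manual_records: Dict[str, Record],
    variables: Iterable[str],
) -> float:
    """Return the fraction of (patient, variable) cells where the NLP
    value equals the manually reviewed value.

    Manual review is treated as the reference standard. A patient or
    variable missing from the NLP output counts as a mismatch unless the
    manual value is missing as well.
    """
    variables = list(variables)
    matches = 0
    total = 0
    for patient_id, manual in manual_records.items():
        nlp = nlp_records.get(patient_id, {})
        for var in variables:
            total += 1
            if nlp.get(var) == manual.get(var):
                matches += 1
    return matches / total if total else 0.0


# Toy data, invented for illustration only (not taken from the study).
manual = {
    "p1": {"gleason_score": "3+4", "positive_cores": 2},
    "p2": {"gleason_score": "4+4", "positive_cores": 5},
}
nlp = {
    "p1": {"gleason_score": "3+4", "positive_cores": 2},
    "p2": {"gleason_score": "4+3", "positive_cores": 5},
}

agreement = extraction_agreement(nlp, manual, ["gleason_score", "positive_cores"])
print(f"agreement = {agreement:.2f}")
```

In this toy case three of the four cells match. With 100 patients and 10 variables, the same calculation would run over 1,000 cells.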

Results

A consecutive series of 18,453 pathology reports was evaluated. NLP correctly detected 117 of 118 patients (99.1 %) with prostatic adenocarcinoma after transrectal ultrasound (TRUS)-guided prostate biopsy. For identifying patients with prostatic adenocarcinoma after biopsy, NLP had a positive predictive value of 99.1 %, a sensitivity of 99.1 %, and a specificity of 99.9 %. Overall, the NLP application accurately extracted 97.6 % of the variables from the pathology reports.
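For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and positive predictive value follow from confusion-matrix counts. Only the 117 detected and 1 missed patient come from the abstract. The single false positive is an assumption inferred from the reported positive predictive value, and the number of true negatives is not stated, so specificity is defined but not evaluated on study data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true cancer cases that the program flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-cancer cases that the program left unflagged."""
    return tn / (tn + fp)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of flagged cases that truly had cancer."""
    return tp / (tp + fp)


# From the abstract: 117 of 118 patients with adenocarcinoma were detected.
TP, FN = 117, 1
# Assumption (not stated in the abstract): one false positive, which is
# what a positive predictive value of about 99 % with 117 true positives
# would imply.
FP = 1

sens = sensitivity(TP, FN)
ppv = positive_predictive_value(TP, FP)
print(f"sensitivity = {sens:.4f}")
print(f"PPV         = {ppv:.4f}")
```

Both values come to 117/118, roughly 0.9915, in line with the reported 99.1 %.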

Conclusions

Natural language processing is a reliable and accurate method for identifying selected patients and extracting relevant data from an existing EMR to establish a prospective clinical database.
Metadata
Title
Extracting data from electronic medical records: validation of a natural language processing program to assess prostate biopsy results
Authors
Anil A. Thomas
Chengyi Zheng
Howard Jung
Allen Chang
Brian Kim
Joy Gelfond
Jeff Slezak
Kim Porter
Steven J. Jacobsen
Gary W. Chien
Publication date
1 February 2014
Publisher
Springer Berlin Heidelberg
Published in
World Journal of Urology / Issue 1/2014
Print ISSN: 0724-4983
Electronic ISSN: 1433-8726
DOI
https://doi.org/10.1007/s00345-013-1040-4