Published in: Journal of Digital Imaging 4/2009

1 August 2009

Development of a Google-Based Search Engine for Data Mining Radiology Reports

Authors: Joseph P. Erinjeri, Daniel Picus, Fred W. Prior, David A. Rubin, Paul Koppel

Published in: Journal of Imaging Informatics in Medicine, Issue 4/2009


Abstract

The aim of this study is to develop a secure, Google-based data-mining tool for radiology reports using free and open-source technologies and to explore its use within an academic radiology department. A Health Insurance Portability and Accountability Act (HIPAA)-compliant data repository, search engine, and user interface were created to support treatment, operations, and reviews preparatory to research. The Institutional Review Board waived review of the project, and informed consent was not required. A total of 2.9 million text reports, occupying 7.9 GB of disk space, were downloaded from our radiology information system to a file server. Extensible Markup Language (XML) representations of the reports were indexed using the Google Desktop Enterprise search engine software. A HyperText Markup Language (HTML) form allowed users to submit queries to Google Desktop; Google’s XML response was then parsed by a Practical Extraction and Report Language (Perl) script, which presented ranked results in a web browser window. The query, the reason for the search, the results, and the documents visited were logged to maintain HIPAA compliance. Indexing averaged approximately 25,000 reports per hour. A keyword search for a common term such as “pneumothorax” returned the ten most relevant of 705,550 total results in 1.36 s. A keyword search for a rare term such as “hemangioendothelioma” returned the ten most relevant of 167 total results in 0.23 s; retrieving all 167 results took 0.26 s. Data-mining tools for radiology reports will improve the productivity of academic radiologists in clinical, educational, research, and administrative tasks. Because radiologists already know Google’s interface, they can quickly perform useful searches.
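The retrieval workflow summarized above (reports wrapped in XML, a keyword index, ranked results, and an audit log recording each query, its stated reason, its results, and the documents opened) can be illustrated with a minimal Python sketch. It is not the authors’ implementation: Google Desktop Enterprise, its XML response format, and the original Perl script are not reproduced here. A toy inverted index ranked by raw term frequency stands in for the search engine, and the XML element names (`report`, `exam`, `body`), the `accession` attribute, the helper names, and the JSON audit-record fields are all hypothetical.

```python
import json
import time
import xml.etree.ElementTree as ET
from collections import defaultdict


def report_to_xml(accession: str, exam: str, text: str) -> str:
    """Wrap a plain-text report in a hypothetical XML envelope."""
    root = ET.Element("report", attrib={"accession": accession})
    ET.SubElement(root, "exam").text = exam
    ET.SubElement(root, "body").text = text
    return ET.tostring(root, encoding="unicode")


class ToyIndex:
    """Toy inverted index standing in for Google Desktop (not its real API)."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> set of accession numbers
        self.bodies = {}  # accession number -> original report text

    def add(self, xml_doc: str) -> None:
        root = ET.fromstring(xml_doc)
        accession = root.get("accession")
        body = root.findtext("body") or ""
        self.bodies[accession] = body
        # Lower-case and strip simple punctuation so "Pneumothorax." matches.
        for token in body.lower().split():
            self.postings[token.strip(".,;:()")].add(accession)

    def search(self, term: str, limit: int = 10) -> tuple[list[str], int]:
        term = term.lower()
        hits = self.postings.get(term, set())
        # Rank by how often the term occurs; ties broken by accession number.
        ranked = sorted(hits, key=lambda acc: (-self.bodies[acc].lower().count(term), acc))
        return ranked[:limit], len(hits)


def audited_search(index, user, term, reason, log):
    """Run a search and record query, reason, and results for auditing."""
    start = time.perf_counter()
    top, total = index.search(term)
    log.append(json.dumps({
        "event": "search", "user": user, "query": term, "reason": reason,
        "results": top, "total": total,
        "seconds": round(time.perf_counter() - start, 4),
    }))
    return top, total


def audited_open(index, user, accession, log):
    """Return a report's text and record that the user viewed it."""
    log.append(json.dumps({"event": "visit", "user": user, "accession": accession}))
    return index.bodies[accession]
```

The index is built once and queried many times, so `add` and `search` are kept separate; at the reported rate of about 25,000 reports per hour, indexing 2.9 million reports implies roughly 116 hours of indexing, whereas each query returns in about a second. Every user-facing path (searching and opening a report) goes through an `audited_*` helper so that no access bypasses the log.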
Metadata
Publisher
Springer-Verlag
Published in
Journal of Imaging Informatics in Medicine, Issue 4/2009
Print ISSN: 2948-2925
Electronic ISSN: 2948-2933
DOI
https://doi.org/10.1007/s10278-008-9110-7

