Published in: Journal of Medical Systems, Issue 5/2011

1 October 2011 | Original Paper

Characterizing Mammography Reports for Health Analytics

Authors: Carlos C. Rojas, Robert M. Patton, Barbara G. Beckerman

Abstract

As massive collections of digital health data become available, the opportunities for large-scale automated analysis increase. In particular, the widespread collection of detailed health information is expected to help realize a vision of evidence-based public health and patient-centric health care. Within such a framework for large-scale health analytics, we describe the transformation of a large data set of mostly unlabeled, free-text mammography reports into a searchable, accessible collection usable for analytics. We also describe several methods to characterize and analyze the data, including its temporal aspects, using information retrieval, supervised learning, and classical statistical techniques. We present experimental results that demonstrate the validity and usefulness of the approach: the results are consistent with the known features of the data, provide novel insights into it, and can be used in specific applications. Finally, based on the process of going from raw data to analytic results, we present the architecture of a generic system for health analytics from clinical notes.
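The abstract does not say how the free-text reports were made searchable. The Python below is a minimal sketch, assuming a plain TF-IDF-weighted inverted index (a textbook information-retrieval technique, not necessarily the authors' pipeline). The class name `ReportIndex`, the simple tokenizer, and the three sample report strings are hypothetical and serve only to illustrate the idea of turning unlabeled report text into a queryable collection.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics (no stemming, no negation handling)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ReportIndex:
    """Hypothetical minimal TF-IDF inverted index over free-text reports."""

    def __init__(self, reports: dict[str, str]) -> None:
        self.n_docs = len(reports)
        # term -> {report_id: raw term frequency in that report}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        for report_id, text in reports.items():
            for term, count in Counter(tokenize(text)).items():
                self.postings[term][report_id] = count

    def idf(self, term: str) -> float:
        """Inverse document frequency log(N / df); 0.0 for unseen terms."""
        df = len(self.postings.get(term, {}))
        return math.log(self.n_docs / df) if df else 0.0

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Rank reports by the summed tf * idf weight of the query terms they contain."""
        scores: dict[str, float] = defaultdict(float)
        for term in set(tokenize(query)):
            weight = self.idf(term)
            for report_id, tf in self.postings.get(term, {}).items():
                scores[report_id] += tf * weight
        # Highest score first; ties broken by report id for a stable order.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:top_k]


# Hypothetical sample reports, for illustration only.
reports = {
    "r1": "No suspicious masses or calcifications. Benign findings.",
    "r2": "Clustered microcalcifications in the left breast. Biopsy recommended.",
    "r3": "Stable benign calcifications compared with prior exam.",
}
index = ReportIndex(reports)
print(index.search("calcifications biopsy"))
```

Here the query ranks r2 first, because it alone contains the rarer term "biopsy", followed by r1 and r3 with equal scores. Even this toy shows why clinical text needs more than plain retrieval: exact token matching does not connect "microcalcifications" to "calcifications", and r1 matches "calcifications" although the report negates that finding.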
Footnotes
2. Breast Imaging Reporting and Data System, developed by the American College of Radiology.
3. This, of course, does not hold for every document and every human reader (within a given language), since specialized terminology is not universally accessible. It is, however, a reasonable assumption within a field, e.g., the health sciences.

Metadata
Title: Characterizing Mammography Reports for Health Analytics
Authors: Carlos C. Rojas, Robert M. Patton, Barbara G. Beckerman
Publication date: 1 October 2011
Publisher: Springer US
Published in: Journal of Medical Systems, Issue 5/2011
Print ISSN: 0148-5598
Electronic ISSN: 1573-689X
DOI: https://doi.org/10.1007/s10916-011-9685-2