Une procédure d’anonymisation à deux niveaux pour créer un corpus de comptes rendus hospitaliers

  • Conference paper
Risques, Technologies de l’Information pour les Pratiques Médicales

Part of the book series: Informatique et Santé (INFORMATIQUE, volume 17)

Abstract

De-identification is a growing need in medical informatics and has recently attracted renewed interest. A de-identifier must be tuned to local documents and their specificities, which requires language engineers to work on text that has not yet been de-identified. To mitigate the issues raised by this situation, we propose a de-identification method that proceeds in two steps. We report experiments on adapting an American de-identifier to French and on developing a new de-identifier for French patient reports. The latter, evaluated on a set of 23 randomly selected texts, obtains 85% recall and 91% precision.
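As a reading aid for the recall and precision figures above, here is a minimal sketch of how such scores are commonly computed for a de-identifier. It assumes exact-match comparison between gold and predicted annotations; the span representation, the function name `precision_recall`, and the toy data are hypothetical and are not taken from the paper, whose exact matching criteria are not stated in the abstract.

```python
from typing import Set, Tuple

# Hypothetical representation of one annotation:
# (start offset, end offset, category of identifying information).
Span = Tuple[int, int, str]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Return (precision, recall) under exact-match comparison of spans."""
    true_positives = len(gold & predicted)
    # Precision: share of predicted spans that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold spans that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for illustration only.
gold = {(0, 6, "NAME"), (10, 20, "DATE"), (30, 35, "CITY"), (40, 50, "PHONE")}
predicted = {(0, 6, "NAME"), (10, 20, "DATE"), (30, 35, "CITY"), (60, 65, "NAME")}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a de-identification setting, recall is usually the more critical figure, since every missed span leaves identifying information in the text.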

Author information

Corresponding author

Correspondence to Cyril Grouin.

Copyright information

© 2009 Springer-Verlag France

About this paper

Cite this paper

Grouin, C., Rosier, A., Dameron, O., Zweigenbaum, P. (2009). Une procédure d’anonymisation à deux niveaux pour créer un corpus de comptes rendus hospitaliers. In: Risques, Technologies de l’Information pour les Pratiques Médicales. Informatique et Santé, vol 17. Springer, Paris. https://doi.org/10.1007/978-2-287-99305-3_3

  • DOI: https://doi.org/10.1007/978-2-287-99305-3_3

  • Publisher Name: Springer, Paris

  • Print ISBN: 978-2-287-99304-6

  • Online ISBN: 978-2-287-99305-3
