A two-level anonymization procedure for building a corpus of hospital reports (original title: Une procédure d’anonymisation à deux niveaux pour créer un corpus de comptes rendus hospitaliers)

Grouin, Cyril; Rosier, Arnaud; Dameron, Olivier; Zweigenbaum, Pierre

doi:10.1007/978-2-287-99305-3_3

Part of the book series: Informatique et Santé (INFORMATIQUE, volume 17)

Abstract

De-identification is a growing need in medical informatics and has recently attracted renewed interest. A de-identifier must be tuned to local documents and their specificities, which requires language engineers to work on text that has not yet been de-identified. To limit the problems this situation raises, we propose a de-identification method that proceeds in two steps. We report experiments on adapting an American de-identifier to French and on developing a new de-identifier for French patient reports. The latter, evaluated on a set of 23 randomly selected texts, achieves 85% recall and 91% precision.

Author information

Authors and Affiliations

LIMSI-CNRS, BP133, F-91403, Orsay Cedex, France
Cyril Grouin & Pierre Zweigenbaum
INSERM U936, F-35000, Rennes, France
Arnaud Rosier & Olivier Dameron

Corresponding author

Correspondence to Cyril Grouin.

About this paper

Cite this paper

Grouin, C., Rosier, A., Dameron, O., Zweigenbaum, P. (2009). Une procédure d’anonymisation à deux niveaux pour créer un corpus de comptes rendus hospitaliers. In: Risques, Technologies de l’Information pour les Pratiques Médicales. Informatique et Santé, vol 17. Springer, Paris. https://doi.org/10.1007/978-2-287-99305-3_3

DOI: https://doi.org/10.1007/978-2-287-99305-3_3
Publisher Name: Springer, Paris
Print ISBN: 978-2-287-99304-6
Online ISBN: 978-2-287-99305-3
