Abstract
De-identification is a growing need in medical informatics and has recently attracted renewed interest. A de-identifier must be tuned to local documents and their specificities, which requires language engineers to work on text that has not yet been de-identified. To reduce the problems this situation raises, we propose a de-identification method that proceeds in two steps. We report experiments on adapting an American de-identifier to French and on developing a new de-identifier for French patient reports. The latter, evaluated on a set of 23 randomly selected texts, achieves 85% recall and 91% precision.
Copyright information
© 2009 Springer-Verlag France
About this paper
Cite this paper
Grouin, C., Rosier, A., Dameron, O., Zweigenbaum, P. (2009). Une procédure d’anonymisation à deux niveaux pour créer un corpus de comptes rendus hospitaliers. In: Risques, Technologies de l’Information pour les Pratiques Médicales. Informatique et Santé, vol 17. Springer, Paris. https://doi.org/10.1007/978-2-287-99305-3_3
DOI: https://doi.org/10.1007/978-2-287-99305-3_3
Publisher Name: Springer, Paris
Print ISBN: 978-2-287-99304-6
Online ISBN: 978-2-287-99305-3