29.01.2018 | Preclinical study

Machine learning to parse breast pathology reports in Chinese

Authors: Rong Tang, Lizhi Ouyang, Clara Li, Yue He, Molly Griffin, Alphonse Taghian, Barbara Smith, Adam Yala, Regina Barzilay, Kevin Hughes

Published in: Breast Cancer Research and Treatment | Issue 2/2018

Abstract

Introduction

Large structured databases of pathology findings are valuable for deriving new clinical insights. However, they are labor-intensive to create and generally require manual annotation. The bioinformatics community has done some work on automating this annotation via machine learning for English-language reports. Our contribution is an automated approach to constructing such structured databases from Chinese reports, which also sets the stage for extraction from other languages.

Methods

We collected 2104 de-identified Chinese benign and malignant breast pathology reports from Hunan Cancer Hospital. Physicians with native Chinese proficiency reviewed the reports and annotated a variety of binary and numerical pathologic entities. After excluding 78 cases with a bilateral lesion in the same report, 1216 cases were used as a training set for the algorithm, which was then refined on 405 development cases. The natural language processing algorithm was then tested on the remaining 405 cases. The model was used to extract 13 binary entities and 8 numerical entities.
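The case counts above can be checked and turned into a partition with a few lines of Python. This is only a minimal sketch: the counts come from the abstract, but the function name `split_cases`, the use of integer case ids, and the seeded random shuffle are assumptions made for illustration, since the abstract does not say how cases were assigned to each set.

```python
import random

# Counts as stated in the abstract.
TOTAL_REPORTS = 2104
EXCLUDED_BILATERAL = 78
TRAIN, DEV, TEST = 1216, 405, 405


def split_cases(case_ids, n_train, n_dev, n_test, seed=0):
    """Shuffle case ids and cut them into train/dev/test partitions.

    Hypothetical helper: a random split is an assumption, not the
    authors' documented procedure.
    """
    shuffled = list(case_ids)
    if n_train + n_dev + n_test != len(shuffled):
        raise ValueError("partition sizes must sum to the number of cases")
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]
    return train, dev, test


# 2104 collected reports minus 78 excluded bilateral cases.
usable = TOTAL_REPORTS - EXCLUDED_BILATERAL
train_ids, dev_ids, test_ids = split_cases(range(usable), TRAIN, DEV, TEST)
```

The size check inside `split_cases` confirms that the three sets account for every case that remains after the exclusion.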

Results

When compared to physicians with native Chinese proficiency, the model showed a per-entity accuracy from 91 to 100% for all common diagnoses on the test set. Overall accuracy was 98% for binary entities and 95% for numerical entities. In a per-report evaluation for binary entities with more than 100 training cases, 85% of all the testing reports were completely correct and 11% had an error in 1 out of 22 entities.
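The difference between the per-entity and per-report figures can be illustrated with a small sketch. The entity names (`carcinoma`, `dcis`, `tumor_size_cm`), the helper functions, and the four toy reports below are all hypothetical and made up for illustration; they are not the study's entities or data.

```python
def per_entity_accuracy(gold, pred):
    """For each entity, the fraction of reports where the prediction matches the annotation."""
    n = len(gold)
    return {
        entity: sum(g[entity] == p[entity] for g, p in zip(gold, pred)) / n
        for entity in gold[0]
    }


def error_count_distribution(gold, pred):
    """Map 'number of wrong entities in a report' to the fraction of reports."""
    counts = {}
    for g, p in zip(gold, pred):
        errors = sum(g[entity] != p[entity] for entity in g)
        counts[errors] = counts.get(errors, 0) + 1
    return {k: v / len(gold) for k, v in counts.items()}


# Toy annotations and predictions (invented values, illustration only).
gold = [
    {"carcinoma": 1, "dcis": 0, "tumor_size_cm": 2.5},
    {"carcinoma": 0, "dcis": 0, "tumor_size_cm": None},
    {"carcinoma": 1, "dcis": 1, "tumor_size_cm": 1.0},
    {"carcinoma": 0, "dcis": 1, "tumor_size_cm": None},
]
pred = [
    {"carcinoma": 1, "dcis": 0, "tumor_size_cm": 2.5},
    {"carcinoma": 0, "dcis": 0, "tumor_size_cm": None},
    {"carcinoma": 1, "dcis": 0, "tumor_size_cm": 1.0},  # one entity wrong
    {"carcinoma": 0, "dcis": 1, "tumor_size_cm": None},
]

entity_acc = per_entity_accuracy(gold, pred)
report_dist = error_count_distribution(gold, pred)
```

Per-entity accuracy scores each field across all reports, while the per-report view counts how many fields are wrong within a single report, so a report counts as completely correct only when every field matches.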

Conclusion

We have demonstrated that Chinese breast pathology reports can be automatically parsed into structured data using standard machine learning approaches. The results of our study demonstrate that techniques effective in parsing English reports can be scaled to other languages.


Title: Machine learning to parse breast pathology reports in Chinese
Authors: Rong Tang, Lizhi Ouyang, Clara Li, Yue He, Molly Griffin, Alphonse Taghian, Barbara Smith, Adam Yala, Regina Barzilay, Kevin Hughes
Publication date: 29.01.2018
Publisher: Springer US
Published in: Breast Cancer Research and Treatment | Issue 2/2018
Print ISSN: 0167-6806
Electronic ISSN: 1573-7217
DOI: https://doi.org/10.1007/s10549-018-4668-3
