Methods Inf Med 2016; 55(03): 266-275
DOI: 10.3414/ME15-01-0112
Original Articles
Schattauer GmbH

Valx: A System for Extracting and Structuring Numeric Lab Test Comparison Statements from Text[*]

Tianyong Hao (1, 2), Hongfang Liu (3), Chunhua Weng (1)
1   Department of Biomedical Informatics, Columbia University, New York, NY, USA
2   Key Lab of Language Engineering and Computing of Guangdong Province, Guangdong University of Foreign Studies, Guangzhou, China
3   Department of Health Sciences Research, Rochester, MN, USA

Publication History

received: 26 August 2015

accepted: 07 February 2016

Publication Date:
08 January 2018 (online)

Summary

Objectives: To develop an automated method for extracting and structuring numeric lab test comparison statements from text and evaluate the method using clinical trial eligibility criteria text.

Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes seven steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based verification of comparison statements. Our reference standard was the consensus-based annotation by three raters of all comparison statements for two variables, HbA1c and glucose, identified in all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov.
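As a rough illustration of how such a pipeline can fit together, the following Python sketch walks through simplified versions of steps 1 to 4 and 6. It is not the Valx implementation: the synonym table, operator phrases, regular expression, and function name `extract` are all hypothetical stand-ins (Valx draws on UMLS and web-derived knowledge instead), and context-based filtering (step 5) and rule-based verification (step 7) are omitted. The glucose conversion uses the common approximation 1 mmol/L ≈ 18 mg/dL.

```python
import re

# Hypothetical synonym table standing in for UMLS and web-derived knowledge.
VARIABLES = {"hba1c": "HbA1c", "a1c": "HbA1c",
             "glycosylated hemoglobin": "HbA1c", "glucose": "glucose"}
# Word-form operators (illustrative subset); longer phrases are listed first
# so that they are replaced before their shorter prefixes.
OPERATORS = {"greater than or equal to": ">=", "less than or equal to": "<=",
             "greater than": ">", "less than": "<"}
# Assumed target unit per variable and factors to convert into it.
TARGET_UNIT = {"HbA1c": "%", "glucose": "mg/dL"}
TO_TARGET = {("HbA1c", "%"): 1.0, ("glucose", "mg/dl"): 1.0,
             ("glucose", "mmol/l"): 18.0}  # approximate conversion factor

_names = "|".join(re.escape(n) for n in sorted(VARIABLES, key=len, reverse=True))
PATTERN = re.compile(
    r"(?P<var>" + _names + r")\D{0,20}?(?P<op>>=|<=|>|<|=)\s*"
    r"(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>%|mg/dl|mmol/l)?"
)

def extract(text):
    """Return (variable, operator, value, unit) tuples found in text."""
    # Step 1: preprocessing (lowercase, unify operator spellings).
    s = text.lower().replace("≥", ">=").replace("≤", "<=")
    for phrase, symbol in OPERATORS.items():
        s = s.replace(phrase, symbol)
    results = []
    # Steps 2-4: find variable, operator, number, and unit, and associate
    # a variable with the nearest following comparison.
    for m in PATTERN.finditer(s):
        var = VARIABLES[m.group("var")]
        factor = TO_TARGET.get((var, m.group("unit")))
        if factor is None:
            continue  # unknown or missing unit: skip rather than guess
        # Step 6: normalize the value to the target unit.
        value = round(float(m.group("num")) * factor, 2)
        results.append((var, m.group("op"), value, TARGET_UNIT[var]))
    return results

print(extract("HbA1c ≥ 7.0% and fasting glucose less than 7 mmol/L"))
```

A pattern-based association like this is deliberately narrow; it ignores ranges ("between 7 and 10"), negation, and comparisons where the value precedes the variable, which is where context filtering and verification rules would come in.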

Results: The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively.
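For readers who want to relate the three figures, the short Python sketch below assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the summary does not state the weighting, so this is an assumption. The function name `f_measure` is illustrative. Because the reported precision and recall are themselves rounded, a recomputed value can differ from the published one in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# HbA1c, Type 1 diabetes trials: precision 99.6%, recall 98.1%
print(round(f_measure(99.6, 98.1), 1))
# Glucose, Type 2 diabetes trials: precision 92.3%, recall 92.3%
print(round(f_measure(92.3, 92.3), 1))
```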

Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. The open-source Valx enables its further evaluation and continued improvement among the collaborative scientific community.

* Supplementary material published on our website http://dx.doi.org/10.3414/ME15-01-0112


 