Representing EHRs with Temporal Tree and Sequential Pattern Mining for Similarity Computing

  • Conference paper
Advanced Data Mining and Applications (ADMA 2020)

Abstract

The ability to rapidly identify, at scale, patients who are similar based on their electronic health records (EHRs) is fundamental to a number of clinical informatics applications, such as clinical decision support, cohort selection and treatment recommendation.

The effective representation of EHR data is paramount to effective computational similarity methods. Such a representation should take into account the complex properties of EHR data, including their temporal and multivariate nature. Of critical importance is the modelling of: (i) compound information – multiple medical events for a patient occur in order and may occur at the same time; (ii) clinical patterns – frequent sequential patterns that are associated with specific sequences of clinical events. To model these, in this paper we exploit the recently proposed Temporal Tree technique to capture compound information, and we further apply sequential pattern mining (SPM) with a gap constraint to discover more complex clinical patterns.

The effectiveness of the proposed EHR representation method is evaluated using a real EHR dataset, MIMIC-III, on two task types within an Intensive Care Unit setting: (i) similar patient retrieval; (ii) sepsis prediction and mortality prediction. The empirical results show that representing EHRs with Temporal Tree and SPM, used in conjunction with traditional similarity measures or more complex embedding methods, delivers significant improvements in effectiveness on the considered tasks.
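To make the idea of sequential pattern mining with a gap constraint more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patient is a plain ordered list of event labels and that the gap constraint limits how many events may be skipped between consecutive pattern items. The function names `occurs_with_gap` and `support`, the parameter `max_gap`, and the event labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def occurs_with_gap(pattern: Sequence[str], sequence: Sequence[str], max_gap: int) -> bool:
    """Check whether `pattern` occurs in `sequence` as an ordered subsequence
    in which at most `max_gap` events are skipped between consecutive
    matched items (assumed reading of the gap constraint)."""

    def search(p_idx: int, start: int, end: int) -> bool:
        # All pattern items matched.
        if p_idx == len(pattern):
            return True
        # Try every admissible position for the current pattern item.
        for pos in range(start, min(end, len(sequence))):
            if sequence[pos] == pattern[p_idx]:
                # The next item must appear within max_gap skipped events.
                if search(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first item may match anywhere in the sequence.
    return search(0, 0, len(sequence))


def support(pattern: Sequence[str], sequences: List[Sequence[str]], max_gap: int) -> float:
    """Fraction of sequences that contain `pattern` under the gap constraint."""
    if not sequences:
        return 0.0
    hits = sum(occurs_with_gap(pattern, seq, max_gap) for seq in sequences)
    return hits / len(sequences)


# Hypothetical toy event sequences, one list per patient.
toy_patients = [
    ["fever", "lactate_high", "antibiotic"],
    ["fever", "x_ray", "ecg", "antibiotic"],
    ["antibiotic", "fever"],
]
toy_pattern = ["fever", "antibiotic"]
```

With these toy sequences, `support(toy_pattern, toy_patients, max_gap=1)` counts only the first patient, while raising `max_gap` to 2 also admits the second. A pattern would be called frequent when its support reaches a chosen minimum threshold; that threshold, like the rest of this sketch, is an assumption and not a value taken from the paper.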


Notes

  1. Assume one hour is the temporal time unit used for representation.

  2. Each patient may have multiple diagnoses; we consider only the first diagnosis when filtering the data to create the subset for evaluation. The primary icd9_code values used are "41401", "0389" and "51881".
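The filtering rule in note 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' preprocessing code: it assumes diagnoses are already available as a mapping from a patient identifier to that patient's ICD-9 codes in diagnosis order. The names `PRIMARY_CODES` and `select_by_first_diagnosis` and the toy patient identifiers are hypothetical.

```python
from typing import Dict, List, Set

# The three primary ICD-9 codes listed in note 2.
PRIMARY_CODES: Set[str] = {"41401", "0389", "51881"}


def select_by_first_diagnosis(
    diagnoses: Dict[str, List[str]],
    codes: Set[str] = PRIMARY_CODES,
) -> Dict[str, str]:
    """Keep patients whose first listed diagnosis is one of `codes`.

    `diagnoses` maps a patient id to ICD-9 codes in diagnosis order
    (assumed layout). Returns patient id -> matching first code.
    """
    return {
        patient_id: dx_codes[0]
        for patient_id, dx_codes in diagnoses.items()
        if dx_codes and dx_codes[0] in codes
    }


# Hypothetical toy input: only the first code of each patient is inspected.
toy_diagnoses = {
    "p1": ["41401", "4019"],
    "p2": ["4019", "0389"],   # 0389 is not the first diagnosis, so p2 is dropped
    "p3": ["51881"],
    "p4": [],                 # no diagnoses recorded
}
selected = select_by_first_diagnosis(toy_diagnoses)
```

ICD-9 codes are kept as strings so that leading zeros, as in "0389", are preserved.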

Author information

Corresponding authors

Correspondence to Suresh Pokharel, Guido Zuccon or Yu Li.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Pokharel, S., Zuccon, G., Li, Y. (2020). Representing EHRs with Temporal Tree and Sequential Pattern Mining for Similarity Computing. In: Yang, X., Wang, CD., Islam, M.S., Zhang, Z. (eds) Advanced Data Mining and Applications. ADMA 2020. Lecture Notes in Computer Science, vol 12447. Springer, Cham. https://doi.org/10.1007/978-3-030-65390-3_18

  • DOI: https://doi.org/10.1007/978-3-030-65390-3_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65389-7

  • Online ISBN: 978-3-030-65390-3

  • eBook Packages: Computer Science (R0)