DOI: 10.1145/3394486.3403055 · KDD Conference Proceedings · Research Article

Attention and Memory-Augmented Networks for Dual-View Sequential Learning

Published: 20 August 2020

ABSTRACT

In recent years, sequential learning has attracted great interest due to advances in deep learning, with applications in time-series forecasting, natural language processing, and speech recognition. Recurrent neural networks (RNNs) have achieved superior performance in single-view and synchronous multi-view sequential learning compared with traditional machine learning models. However, such methods remain less explored in asynchronous multi-view sequential learning, where the unaligned nature of the sequences makes it challenging to learn inter-view interactions. We develop AMANet (Attention and Memory-Augmented Networks), an architecture that integrates attention and memory to solve the asynchronous multi-view learning problem in general; in this paper, we focus on experiments with dual-view sequences. Self-attention and inter-attention capture intra-view and inter-view interactions, respectively. A history attention memory stores the historical information of a specific object, serving as local knowledge storage, while a dynamic external memory stores global knowledge for each view. We evaluate our model on three tasks: medication recommendation from a patient's medical records, diagnosis-related group (DRG) classification from a hospital record, and invoice fraud detection from a company's taxation behaviors. The results demonstrate that our model outperforms all baselines and other state-of-the-art models on all tasks. Moreover, an ablation study indicates that the inter-attention mechanism plays a key role in the model, boosting predictive power by effectively capturing inter-view interactions from asynchronous views.
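
The core idea in the abstract is the pairing of self-attention (intra-view) with inter-attention (inter-view), which lets each view query the other directly and so sidesteps the need to time-align the two sequences. The sketch below illustrates this dual-attention pattern in PyTorch. It is a minimal, hypothetical reconstruction from the abstract, not the authors' implementation: the module names, dimensions, and concatenation-based fusion are assumptions, and the history attention memory and dynamic external memory components are omitted.

```python
# Minimal sketch of dual-view self-attention + inter-attention (assumed
# structure inferred from the abstract; not the authors' AMANet code).
import torch
import torch.nn as nn

class DualViewAttention(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Self-attention captures intra-view interactions within each sequence.
        self.self_attn_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn_b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Inter-attention lets each view attend to the other, so the two
        # sequences never need to be synchronized or of equal length.
        self.inter_attn_a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.inter_attn_b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor):
        # x_a: (batch, len_a, d_model), x_b: (batch, len_b, d_model);
        # len_a and len_b may differ because the views are asynchronous.
        h_a, _ = self.self_attn_a(x_a, x_a, x_a)    # intra-view, view A
        h_b, _ = self.self_attn_b(x_b, x_b, x_b)    # intra-view, view B
        z_a, _ = self.inter_attn_a(h_a, h_b, h_b)   # view A queries view B
        z_b, _ = self.inter_attn_b(h_b, h_a, h_a)   # view B queries view A
        # Fuse self- and inter-attended representations per view (assumed).
        return torch.cat([h_a, z_a], dim=-1), torch.cat([h_b, z_b], dim=-1)

if __name__ == "__main__":
    model = DualViewAttention()
    x_a = torch.randn(2, 10, 64)  # e.g. an embedded diagnosis-code sequence
    x_b = torch.randn(2, 7, 64)   # e.g. a procedure sequence of different length
    out_a, out_b = model(x_a, x_b)
    print(out_a.shape, out_b.shape)  # (2, 10, 128) and (2, 7, 128)
```

Because each view queries the other through attention weights rather than through position-wise alignment, the two inputs may have different lengths and time bases, which is exactly the asynchronous setting the paper targets.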

Supplemental Material

Video: 3394486.3403055.mp4 (126.1 MB)

