ABSTRACT
In recent years, sequential learning has attracted great interest, driven by advances in deep learning, with applications in time-series forecasting, natural language processing, and speech recognition. Recurrent neural networks (RNNs) have achieved superior performance in single-view and synchronous multi-view sequential learning compared to traditional machine learning models. However, the method remains less explored in asynchronous multi-view sequential learning, where the unaligned nature of multiple sequences makes it challenging to learn the inter-view interactions. We develop AMANet (Attention and Memory-Augmented Networks), an architecture that integrates both attention and memory to solve the asynchronous multi-view sequential learning problem in general; in this paper we focus on experiments with dual-view sequences. Self-attention and inter-attention are employed to capture intra-view and inter-view interactions, respectively. A history attention memory stores the historical information of a specific object and serves as local knowledge storage, while a dynamic external memory stores global knowledge for each view. We evaluate our model on three tasks: medication recommendation from a patient's medical records, diagnosis-related group (DRG) classification from a hospital record, and invoice fraud detection from a company's taxation behaviors. The results demonstrate that our model outperforms all baselines and other state-of-the-art models on all tasks. Moreover, an ablation study indicates that the inter-attention mechanism plays a key role in the model and boosts its predictive power by effectively capturing the inter-view interactions between asynchronous views.
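The core idea the abstract describes — self-attention within each view and inter-attention across two unaligned views — can be illustrated with plain scaled dot-product attention. This is a minimal NumPy sketch, not the paper's implementation: the view sizes, embedding dimension, and random event vectors are illustrative assumptions, and the history/external memory components are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query row forms a weighted
    # average of the value rows, weighted by query-key similarity.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (len_q, len_k) alignment scores
    return softmax(scores, axis=-1) @ V  # (len_q, d) attended representation

rng = np.random.default_rng(0)
d = 8
view_a = rng.normal(size=(5, d))  # e.g. 5 events from one view (illustrative)
view_b = rng.normal(size=(3, d))  # e.g. 3 events from the other, unaligned view

# Intra-view interaction: a view attends to its own sequence (self-attention).
self_a = attention(view_a, view_a, view_a)

# Inter-view interaction: view A's events query view B's events, so no
# step-by-step alignment between the two sequences is required.
inter_ab = attention(view_a, view_b, view_b)

print(self_a.shape, inter_ab.shape)  # (5, 8) (5, 8)
```

Because inter-attention aligns events by learned similarity rather than by position, the two sequences may have different lengths and timestamps, which is what makes this style of interaction suited to asynchronous views.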
Attention and Memory-Augmented Networks for Dual-View Sequential Learning