ABSTRACT
In recent years, sequential learning has attracted great interest, driven by advances in deep learning, with applications in time-series forecasting, natural language processing, and speech recognition. Recurrent neural networks (RNNs) have achieved superior performance in single-view and synchronous multi-view sequential learning compared to traditional machine learning models. However, the method remains less explored in asynchronous multi-view sequential learning, where the unaligned nature of multiple sequences makes it challenging to learn the inter-view interactions. We develop AMANet (Attention and Memory-Augmented Networks), an architecture that integrates both attention and memory to solve the asynchronous multi-view sequential learning problem in general; in this paper we focus on experiments with dual-view sequences. Self-attention and inter-attention are employed to capture intra-view and inter-view interactions, respectively. A history attention memory stores the historical information of a specific object and serves as local knowledge storage, while a dynamic external memory stores global knowledge for each view. We evaluate our model on three tasks: medication recommendation from a patient's medical records, diagnosis-related group (DRG) classification from a hospital record, and invoice fraud detection from a company's taxation behaviors. The results demonstrate that our model outperforms all baselines and other state-of-the-art models on all tasks. Moreover, an ablation study indicates that the inter-attention mechanism plays a key role in the model and boosts its predictive power by effectively capturing the inter-view interactions between asynchronous views.
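The core idea the abstract describes — self-attention within each view and inter-attention across two unaligned views — can be illustrated with plain scaled dot-product attention. This is a minimal NumPy sketch, not the paper's implementation: the view sizes, embedding dimension, and random event vectors are illustrative assumptions, and the history/external memory components are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query row forms a weighted
    # average of the value rows, weighted by query-key similarity.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (len_q, len_k) alignment scores
    return softmax(scores, axis=-1) @ V  # (len_q, d) attended representation

rng = np.random.default_rng(0)
d = 8
view_a = rng.normal(size=(5, d))  # e.g. 5 events from one view (illustrative)
view_b = rng.normal(size=(3, d))  # e.g. 3 events from the other, unaligned view

# Intra-view interaction: a view attends to its own sequence (self-attention).
self_a = attention(view_a, view_a, view_a)

# Inter-view interaction: view A's events query view B's events, so no
# step-by-step alignment between the two sequences is required.
inter_ab = attention(view_a, view_b, view_b)

print(self_a.shape, inter_ab.shape)  # (5, 8) (5, 8)
```

Because inter-attention aligns events by learned similarity rather than by position, the two sequences may have different lengths and timestamps, which is what makes this style of interaction suited to asynchronous views.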
Attention and Memory-Augmented Networks for Dual-View Sequential Learning