Long Expressive Memory for Sequence Modeling
Date
2022
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, can efficiently process sequential tasks with very long-term dependencies, and is sufficiently expressive to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.
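The abstract describes LEM as a time-discretization of a system of multiscale ODEs with learned, state-dependent time steps. A minimal NumPy sketch of such a multiscale gated update is given below; the class name, weight layout, and choice of sigmoid/tanh nonlinearities are illustrative assumptions here, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LEMCell:
    """Sketch of a Long Expressive Memory-style cell: a discretized
    system of two coupled hidden states (y, z) whose per-unit time
    steps dt1, dt2 are themselves learned functions of the state.
    Layout and naming are assumptions for illustration."""

    def __init__(self, input_dim, hidden_dim, dt=1.0, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_dim)
        # Four weight blocks: two produce the multiscale time steps,
        # two drive the z- and y-state updates.
        self.W = rng.uniform(-s, s, (4, hidden_dim, hidden_dim))
        self.V = rng.uniform(-s, s, (4, hidden_dim, input_dim))
        self.b = np.zeros((4, hidden_dim))
        self.dt = dt

    def step(self, u, y, z):
        # Learned, input- and state-dependent time scales in (0, dt).
        dt1 = self.dt * sigmoid(self.W[0] @ y + self.V[0] @ u + self.b[0])
        dt2 = self.dt * sigmoid(self.W[1] @ y + self.V[1] @ u + self.b[1])
        # Convex-combination updates: each state moves only a learned
        # fraction of the way toward its bounded target per step.
        z = (1 - dt1) * z + dt1 * np.tanh(self.W[2] @ y + self.V[2] @ u + self.b[2])
        y = (1 - dt2) * y + dt2 * np.tanh(self.W[3] @ z + self.V[3] @ u + self.b[3])
        return y, z

    def run(self, inputs):
        """Process a sequence of input vectors; return the final y-state."""
        y = z = np.zeros(self.W.shape[1])
        for u in inputs:
            y, z = self.step(u, y, z)
        return y
```

Because each update is a convex combination of the previous state and a tanh-bounded target, the hidden states stay in [-1, 1] regardless of sequence length, which is one intuition for why such discretizations keep gradients well behaved.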
Publication status
published
Book title
The Tenth International Conference on Learning Representations (ICLR 2022)
Publisher
OpenReview
Event
10th International Conference on Learning Representations (ICLR 2022)
Organisational unit
03851 - Mishra, Siddhartha / Mishra, Siddhartha
Notes
Spotlight presentation held on April 27, 2022.
Funding
770880 - Computation and analysis of statistical solutions of fluid flow (EC)
Related publications and datasets
Is new version of: