Metadata only
Date
2021Type
- Conference Paper
ETH Bibliography
yes
Altmetrics
Abstract
The design of recurrent neural networks (RNNs) to accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture which is based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), memory efficient and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments are presented to demonstrate that the proposed RNN provides state of the art performance on a variety of learning tasks with (very) long-time dependencies. Show more
Publication status
publishedExternal links
Book title
Proceedings of the 38th International Conference on Machine LearningJournal / series
Proceedings of Machine Learning ResearchVolume
Pages / Article No.
Publisher
PMLREvent
Organisational unit
03851 - Mishra, Siddhartha / Mishra, Siddhartha
Funding
770880 - Computation and analysis of statistical solutions of fluid flow (EC)
More
Show all metadata
ETH Bibliography
yes
Altmetrics