Show simple item record

dc.contributor.author: Orvieto, Antonio
dc.contributor.author: Smith, Samuel L.
dc.contributor.author: Gu, Albert
dc.contributor.author: Fernando, Anushan
dc.contributor.author: Gulcehre, Caglar
dc.contributor.author: Pascanu, Razvan
dc.contributor.author: De, Soham
dc.contributor.editor: Krause, Andreas
dc.contributor.editor: Brunskill, Emma
dc.contributor.editor: Cho, Kyunghyun
dc.contributor.editor: Engelhardt, Barbara
dc.contributor.editor: Sabato, Sivan
dc.contributor.editor: Scarlett, Jonathan
dc.date.accessioned: 2024-01-17T11:11:22Z
dc.date.available: 2024-01-16T11:56:35Z
dc.date.available: 2024-01-17T11:11:22Z
dc.date.issued: 2023
dc.identifier.issn: 2640-3498
dc.identifier.uri: http://hdl.handle.net/20.500.11850/653177
dc.description.abstract (en_US): Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important differences that make it unclear where their performance boost over RNNs comes from. We show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring careful normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, and introduce an RNN block called the Linear Recurrent Unit (or LRU) that matches both their performance on the Long Range Arena benchmark and their computational efficiency.
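The abstract's recipe — a linear, diagonal recurrence with a careful parameterization and a normalized forward pass — can be sketched in a few lines. The exact LRU parameterization is given in the paper; the specific forms of `lam` and `gamma` below (eigenvalues constrained inside the unit disk via a double exponential, input scaled by `sqrt(1 - |lam|^2)`) are illustrative assumptions, and the sequential loop stands in for the parallel scan used during training:

```python
import numpy as np

def lru_scan(u, nu_log, theta_log, B, C, D):
    """Sketch of a diagonal linear recurrence in the spirit of the LRU.

    u: (T, H) real input sequence; state size is N.
    nu_log, theta_log: (N,) real parameters for the complex eigenvalues.
    B: (N, H) complex input matrix; C: (H, N) complex output matrix;
    D: (H, H) real skip connection.
    """
    # Eigenvalues lam_j = exp(-exp(nu_log_j) + i * exp(theta_log_j))
    # always satisfy |lam_j| < 1, so the recurrence is stable.
    lam = np.exp(-np.exp(nu_log) + 1j * np.exp(theta_log))
    # Normalize the input contribution so the hidden state's scale
    # does not depend on how close |lam_j| is to 1.
    gamma = np.sqrt(1.0 - np.abs(lam) ** 2)
    x = np.zeros(lam.shape, dtype=np.complex128)
    ys = []
    for u_t in u:  # sequential form; training uses a parallel scan
        x = lam * x + gamma * (B @ u_t)      # diagonal linear recurrence
        ys.append((C @ x).real + D @ u_t)    # real readout plus skip path
    return np.stack(ys)
```

Because the recurrence is linear and elementwise in the state, the same computation can be expressed as an associative scan and parallelized over the sequence length, which is what gives the block its SSM-like training speed.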
dc.language.iso: en (en_US)
dc.publisher: PMLR (en_US)
dc.title: Resurrecting Recurrent Neural Networks for Long Sequences (en_US)
dc.type: Conference Paper
ethz.book.title: Proceedings of the 40th International Conference on Machine Learning (en_US)
ethz.journal.title: Proceedings of Machine Learning Research
ethz.journal.volume: 202 (en_US)
ethz.pages.start: 26670 (en_US)
ethz.pages.end: 26698 (en_US)
ethz.event: 40th International Conference on Machine Learning (ICML 2023) (en_US)
ethz.event.location: Honolulu, HI, USA (en_US)
ethz.event.date: July 23-29, 2023 (en_US)
ethz.publication.place: Cambridge, MA (en_US)
ethz.publication.status: published (en_US)
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas (en_US)
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02661 - Institut für Maschinelles Lernen / Institute for Machine Learning::09462 - Hofmann, Thomas / Hofmann, Thomas (en_US)
ethz.identifier.url: https://proceedings.mlr.press/v202/orvieto23a.html
ethz.date.deposited: 2024-01-16T11:56:36Z
ethz.source: FORM
ethz.eth: yes (en_US)
ethz.availability: Metadata only (en_US)
ethz.rosetta.installDate: 2024-01-17T11:11:23Z
ethz.rosetta.lastUpdated: 2024-01-17T11:11:23Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Resurrecting%20Recurrent%20Neural%20Networks%20for%20Long%20Sequences&rft.jtitle=Proceedings%20of%20Machine%20Learning%20Research&rft.date=2023&rft.volume=202&rft.spage=26670&rft.epage=26698&rft.issn=2640-3498&rft.au=Orvieto,%20Antonio&Smith,%20Samuel%20L.&Gu,%20Albert&Fernando,%20Anushan&Gulcehre,%20Caglar&rft.genre=proceeding&rft.btitle=Proceedings%20of%20the%2040th%20International%20Conference%20on%20Machine%20Learning