Open access
Date
2022-07
Type
Journal Article
ETH Bibliography
yes
Abstract
In this work we introduce KL-TRANSFORMER, a generic, scalable, data-driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only enables learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear in the sequence length. We show that KL-TRANSFORMERs achieve performance comparable to existing efficient Transformer architectures, both in terms of accuracy and computational efficiency. Our study also demonstrates that the choice of kernel has a substantial impact on performance, and that kernel-learning variants are competitive alternatives to fixed-kernel Transformers on both long- and short-sequence tasks.
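To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract, not the authors' implementation: the softmax attention kernel is replaced by a dot product of random Fourier feature maps whose frequency matrix is a learnable parameter, so training the frequencies amounts to learning the kernel's spectral distribution (via Bochner's theorem), and factoring the computation through the feature maps gives linear rather than quadratic cost in sequence length. The names LearnedSpectralFeatures and linear_kernel_attention, and all hyperparameters, are illustrative assumptions.

import torch
import torch.nn as nn

class LearnedSpectralFeatures(nn.Module):
    # Random Fourier feature map phi(x) whose spectral samples W are
    # trained end-to-end, so the implied shift-invariant kernel
    # k(q, k) ~ phi(q) . phi(k) is learned from data.
    # (Illustrative sketch; the paper's exact parameterisation may differ.)
    def __init__(self, dim: int, num_features: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        proj = x @ self.W  # (batch, seq, num_features)
        # cos/sin pairs make phi(q) . phi(k) a Monte Carlo estimate of
        # E_w[cos(w . (q - k))], the kernel given by Bochner's theorem.
        return torch.cat([proj.cos(), proj.sin()], dim=-1) / (self.W.shape[1] ** 0.5)

def linear_kernel_attention(q, k, v, feature_map, eps=1e-6):
    # Softmax-free attention: contracting phi(K) with V before touching
    # phi(Q) costs O(n * m * d) instead of the O(n^2 * d) of full attention.
    q_f, k_f = feature_map(q), feature_map(k)      # (batch, n, m)
    kv = torch.einsum('bnm,bnd->bmd', k_f, v)      # sum over keys first
    # Row-wise normaliser; practical implementations typically use positive
    # feature maps so this denominator cannot vanish.
    z = 1.0 / (torch.einsum('bnm,bm->bn', q_f, k_f.sum(dim=1)) + eps)
    return torch.einsum('bnm,bmd,bn->bnd', q_f, kv, z)

# Example: attention over a sequence of 1024 tokens in O(n) time.
feat = LearnedSpectralFeatures(dim=64, num_features=128)
q = k = v = torch.randn(2, 1024, 64)
out = linear_kernel_attention(q, k, v, feat)  # (2, 1024, 64)

Because the key-value contraction is computed once and reused for every query, time and memory grow linearly with sequence length, which is the complexity reduction the abstract refers to.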
Permanent link
https://doi.org/10.3929/ethz-b-000592500
Publication status
published
Journal / series
Transactions on Machine Learning Research
Publisher
OpenReview
Organisational unit
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya