Evolving Attention with Residual Convolutions


Date

2021-07

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

The Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. Attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However, they are learned independently in each layer, without explicit interactions, and sometimes fail to capture reasonable patterns. In this paper, we propose a novel and generic mechanism based on evolving attention to improve the performance of transformers. On one hand, the attention maps in different layers share common knowledge, so the maps in preceding layers can instruct the learning of attention in succeeding layers through residual connections. On the other hand, low-level and high-level attentions vary in their levels of abstraction, so we adopt additional convolutional layers to capture the evolutionary process of attention maps. The proposed evolving attention mechanism achieves significant performance improvements over various state-of-the-art models on multiple tasks, including image classification, natural language understanding and machine translation.
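
The abstract describes the mechanism only in words: attention maps are passed from layer to layer through residual connections and refined by convolutions. The PyTorch snippet below is a minimal sketch of one plausible reading of that idea; the class name `EvolvingAttention`, the mixing weight `alpha`, and the choice of a 3x3 convolution applied to pre-softmax logits are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvolvingAttention(nn.Module):
    """Hypothetical sketch of one evolving-attention layer.

    The previous layer's attention logits are refined by a 2D convolution
    (heads treated as channels) and mixed with the current layer's
    query-key logits through a residual connection before the softmax.
    """

    def __init__(self, d_model: int, n_heads: int, alpha: float = 0.5):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.alpha = alpha  # assumed mixing weight for the residual attention path
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # 3x3 convolution over the (seq_len x seq_len) attention-map "image",
        # with one channel per head, as suggested by the abstract.
        self.conv = nn.Conv2d(n_heads, n_heads, kernel_size=3, padding=1)

    def forward(self, x, prev_logits=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)

        # Current layer's raw attention logits: (B, heads, T, T).
        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        if prev_logits is not None:
            # Residual connection: convolve the previous layer's attention
            # logits and mix them with the current layer's logits.
            logits = (1 - self.alpha) * logits + self.alpha * self.conv(prev_logits)

        attn = F.softmax(logits, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), logits  # logits are handed on to the next layer
```

In a stack of such layers, each layer would receive the previous layer's `logits` as `prev_logits`, so attention patterns are inherited and gradually refined rather than re-learned from scratch in every layer.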

Publication status

published

Book title

Proceedings of the 38th International Conference on Machine Learning

Journal / series

Proceedings of Machine Learning Research

Volume

139

Pages / Article No.

10971 - 10980

Publisher

PMLR

Event

38th International Conference on Machine Learning (ICML 2021)

Organisational unit

09588 - Zhang, Ce (former)
