Telling BERT's Full Story: from Local Attention to Global Aggregation

Pascual, Damian; Brunner, Gino; Wattenhofer, Roger

doi:10.3929/ethz-b-000496002

Download

Full text (published version) (PDF, 2.397Mb)

Open access

Author

Pascual, Damian

Brunner, Gino

Wattenhofer, Roger

Date

2021-04

Type

Conference Paper

ETH Bibliography

yes

Rights / license

Creative Commons Attribution 4.0 International

Abstract

We take a deep look into the behaviour of self-attention heads in the transformer architecture. In light of recent work discouraging the use of attention distributions for explaining a model’s behaviour, we show that attention distributions can nevertheless provide insights into the local behaviour …

Permanent link

https://doi.org/10.3929/ethz-b-000496002

Publication status

published

External links

https://aclanthology.org/2021.eacl-main.9

Editor

Merlo, Paola

Tiedemann, Jörg

Tsarfaty, Reut

Book title

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume