dc.contributor.author
Brunner, Gino
dc.contributor.author
Liu, Yang
dc.contributor.author
Pascual, Damian
dc.contributor.author
Richter, Oliver
dc.contributor.author
Ciaramita, Massimiliano
dc.contributor.author
Wattenhofer, Roger
dc.date.accessioned
2020-08-28T12:50:00Z
dc.date.available
2020-08-25T10:24:40Z
dc.date.available
2020-08-28T12:50:00Z
dc.date.issued
2020
dc.identifier.uri
http://hdl.handle.net/20.500.11850/432547
dc.description.abstract
In this paper we delve deep into the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that, for sequences longer than the attention head dimension, attention weights are not identifiable. We propose effective attention as a complementary tool for improving explanatory interpretations based on attention. Furthermore, we show that input tokens retain their identity to a large degree across the model. We also find evidence suggesting that identity information is mainly encoded in the angle of the embeddings and gradually decreases with depth. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to better understand and further investigate Transformer models.
en_US
dc.language.iso
en
en_US
dc.publisher
International Conference on Learning Representations
en_US
dc.subject
Attention
en_US
dc.subject
Generation
en_US
dc.subject
Interpretability
en_US
dc.subject
NLP
en_US
dc.subject
Self attention
en_US
dc.subject
Transformer
en_US
dc.title
On Identifiability in Transformers
en_US
dc.type
Conference Paper
ethz.size
35 p.
en_US
ethz.event
8th International Conference on Learning Representations (ICLR 2020) (virtual)
en_US
ethz.event.location
Addis Ababa, Ethiopia
en_US
ethz.event.date
April 26-30, 2020
en_US
ethz.notes
Due to the coronavirus (COVID-19) pandemic, the conference was conducted virtually.
en_US
ethz.publication.place
s.l.
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03604 - Wattenhofer, Roger / Wattenhofer, Roger
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03604 - Wattenhofer, Roger / Wattenhofer, Roger
en_US
ethz.identifier.url
https://iclr.cc/virtual_2020/poster_BJg1f6EFDB.html#details
ethz.date.deposited
2020-08-25T10:24:59Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2020-08-28T12:50:11Z
ethz.rosetta.lastUpdated
2020-08-28T12:50:11Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=On%20Identifiability%20in%20Transformers&rft.date=2020&rft.au=Brunner,%20Gino&Liu,%20Yang&Pascual,%20Damian&Richter,%20Oliver&Ciaramita,%20Massimiliano&rft.genre=proceeding&rft.btitle=On%20Identifiability%20in%20Transformers
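
The abstract's non-identifiability claim rests on a rank argument: in a single attention head, the output is the product of the row-stochastic attention matrix A (d_s x d_s) and the value matrix V (d_s x d_v). When the sequence length d_s exceeds the head dimension d_v, V has a non-trivial left null space, so distinct attention matrices can produce exactly the same head output. The NumPy sketch below is an editor-added illustration with hypothetical dimensions, not code from the paper; it ignores the simplex constraints on attention rows (which the paper treats explicitly) and only demonstrates that the map from attention weights to head outputs is many-to-one.

import numpy as np

rng = np.random.default_rng(0)
d_s, d_v = 10, 4                          # sequence length > head dimension (hypothetical sizes)

V = rng.normal(size=(d_s, d_v))           # value vectors, one row per token
A = rng.random(size=(d_s, d_s))
A /= A.sum(axis=1, keepdims=True)         # row-stochastic attention weights

# Basis of the left null space of V: vectors n with n @ V = 0.
_, _, vt = np.linalg.svd(V.T)
null_basis = vt[np.linalg.matrix_rank(V):]    # d_s - d_v null directions

# Perturb every row of A along one null direction; the head output A @ V is unchanged.
# Note: A_alt is generally no longer a valid probability distribution per row.
delta = np.outer(rng.normal(size=d_s), null_basis[0])
A_alt = A + 0.1 * delta

print(np.allclose(A @ V, A_alt @ V))      # True: different attention weights, same output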

Files in this item

There are no files associated with this item.
