
Intrinsic Probing through Dimension Selection


Date

2020-11

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks. Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it. In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted. To enable intrinsic probing, we propose a novel framework based on a decomposable multivariate Gaussian probe that allows us to determine whether the linguistic information in word embeddings is dispersed or focal. We then probe fastText and BERT for various morphosyntactic attributes across 36 languages. We find that most attributes are reliably encoded by only a few neurons, with fastText concentrating its linguistic structure more than BERT.
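
As a rough illustration of why a Gaussian probe is "decomposable", the sketch below fits one full-covariance Gaussian per attribute value and then scores any subset of embedding dimensions by slicing the fitted means and covariances, since marginalising a Gaussian only requires dropping rows and columns. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fit_gaussians`, `log_posterior`, `greedy_select`), the ridge term `reg`, and the greedy criterion (mean log-posterior of the correct label on the supplied data) are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_gaussians(X, y, reg=1e-3):
    """Fit one full-covariance Gaussian per class; `reg` is an assumed ridge term."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        params[c] = (mean, cov, len(Xc) / len(X))
    return params

def log_posterior(params, X, dims):
    """Class log-posteriors computed from only the dimensions in `dims`.

    The marginal of a Gaussian over a subset of dimensions is obtained by
    slicing its mean and covariance, so no refitting is needed per subset.
    """
    dims = list(dims)
    classes = sorted(params)
    scores = []
    for c in classes:
        mean, cov, prior = params[c]
        sub_cov = cov[np.ix_(dims, dims)]
        diff = X[:, dims] - mean[dims]
        _, logdet = np.linalg.slogdet(sub_cov)
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(sub_cov), diff)
        log_lik = -0.5 * (maha + logdet + len(dims) * np.log(2 * np.pi))
        scores.append(np.log(prior) + log_lik)
    scores = np.stack(scores, axis=1)
    # Normalise over classes so each row is a log-probability distribution.
    scores -= np.logaddexp.reduce(scores, axis=1, keepdims=True)
    return classes, scores

def greedy_select(params, X, y, k):
    """Greedily pick k dimensions that maximise the mean correct-label log-posterior."""
    classes = sorted(params)
    y_idx = np.searchsorted(classes, y)
    rows = np.arange(len(y))
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def score(d):
            _, lp = log_posterior(params, X, selected + [d])
            return lp[rows, y_idx].mean()
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

If a handful of selected dimensions already yields posteriors close to those from the full vector, the attribute would count as focal in the abstract's sense; if many dimensions are needed, it would count as dispersed.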

Publication status

published

Book title

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Pages / Article No.

197 - 216

Publisher

Association for Computational Linguistics

Event

Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

Organisational unit

09682 - Cotterell, Ryan / Cotterell, Ryan

Notes

Due to the coronavirus (COVID-19) pandemic, the conference was held virtually.
