A surprisal–duration trade-off across and within the world’s languages



Date

2021-11

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

While there exist scores of natural languages, each with its unique features and idiosyncrasies, they all share a unifying theme: enabling human communication. We may thus reasonably predict that human cognition shapes how these languages evolve and are used. Assuming that the capacity to process information is roughly constant across human populations, we expect a surprisal–duration trade-off to arise both across and within languages. We analyse this trade-off using a corpus of 600 languages and, after controlling for several potential confounds, we find strong supporting evidence in both settings. Specifically, we find that, on average, phones are produced faster in languages where they are less surprising, and vice versa. Further, we confirm that more surprising phones are longer, on average, in 319 languages out of the 600. We thus conclude that there is strong evidence of a surprisal–duration trade-off in operation, both across and within the world’s languages.

Publication status

published

Book title

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Pages / Article No.

949 - 962

Publisher

Association for Computational Linguistics

Event

Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)

Organisational unit

09682 - Cotterell, Ryan / Cotterell, Ryan
