Estimating the Entropy of Linguistic Distributions
dc.contributor.author
Arora, Aryaman
dc.contributor.author
Meister, Clara
dc.contributor.author
Cotterell, Ryan
dc.date.accessioned
2022-08-29T06:32:14Z
dc.date.available
2022-08-20T03:24:13Z
dc.date.available
2022-08-22T08:54:53Z
dc.date.available
2022-08-29T06:32:14Z
dc.date.issued
2022
dc.identifier.isbn
978-1-955917-22-3
en_US
dc.identifier.other
10.18653/v1/2022.acl-short.20
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/565114
dc.identifier.doi
10.3929/ethz-b-000565114
dc.description.abstract
Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language. However, entropy must typically be estimated from observed data because researchers do not have access to the underlying probability distribution that gives rise to these data. While entropy estimation is a well-studied problem in other fields, there is not yet a comprehensive exploration of the efficacy of entropy estimators for use with linguistic data. In this work, we fill this void, studying the empirical effectiveness of different entropy estimators for linguistic distributions. In a replication of two recent information-theoretic linguistic studies, we find evidence that the reported effect size is over-estimated due to over-reliance on poor entropy estimators. We conclude with concrete recommendations for entropy estimation depending on distribution type and data availability.
en_US
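The abstract turns on the idea that entropy has to be estimated from a finite sample, and that some estimators are poor. This record does not name the estimators the paper compares. The sketch below therefore uses two standard textbook estimators chosen only for illustration: the plug-in (maximum-likelihood) estimate and the Miller-Madow bias correction. The function names `plugin_entropy` and `miller_madow_entropy` are hypothetical, and nothing here should be read as the paper's implementation or as its recommendations.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate, in nats.

    Illustrative textbook estimator, not taken from the paper: it
    replaces each true probability with its relative frequency.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2N).

    K is the number of distinct categories actually observed and
    N is the sample size. Also an illustrative textbook choice.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    k = len(counts)
    return plugin_entropy(samples) + (k - 1) / (2 * n)


# Hypothetical demo: 200 draws from a uniform distribution over 1000 types.
# The true entropy is log(1000); with at most 200 distinct types observed,
# the plug-in estimate cannot exceed log(200), so it is biased downward.
random.seed(0)
vocab_size = 1000
sample = [random.randrange(vocab_size) for _ in range(200)]
print("true:        ", math.log(vocab_size))
print("plug-in:     ", plugin_entropy(sample))
print("Miller-Madow:", miller_madow_entropy(sample))
```

The demo uses a uniform distribution only because its true entropy is known in closed form. Linguistic distributions are typically heavy-tailed, which is the setting the abstract says deserves its own study.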
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Association for Computational Linguistics
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-sa/3.0/
dc.title
Estimating the Entropy of Linguistic Distributions
en_US
dc.type
Conference Paper
dc.rights.license
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported
ethz.book.title
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
en_US
ethz.journal.volume
2
en_US
ethz.pages.start
175
en_US
ethz.pages.end
195
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.event
60th Annual Meeting of the Association for Computational Linguistics (ACL 2022)
ethz.event.location
Dublin, Ireland
ethz.event.date
May 22-27, 2022
ethz.identifier.wos
ethz.publication.place
Stroudsburg, PA
ethz.publication.status
published
en_US
ethz.date.deposited
2022-08-20T03:25:07Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2022-08-22T08:55:02Z
ethz.rosetta.lastUpdated
2024-02-02T17:56:37Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Estimating%20the%20Entropy%20of%20Linguistic%20Distributions&rft.date=2022&rft.volume=2&rft.spage=175&rft.epage=195&rft.au=Arora,%20Aryaman&rft.au=Meister,%20Clara&rft.au=Cotterell,%20Ryan&rft.isbn=978-1-955917-22-3&rft.genre=proceeding&rft_id=info:doi/10.18653/v1/2022.acl-short.20&rft.btitle=Proceedings%20of%20the%2060th%20Annual%20Meeting%20of%20the%20Association%20for%20Computational%20Linguistics