Disambiguatory Signals are Stronger in Word-initial Positions
(2021) Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower). This has led to the conjecture—as in Wedel et al. (2019b), but common elsewhere—that languages have evolved to provide more information earlier in words than later. ...
Conference Paper

Applying the Transformer to Character-level Transduction
(2021) Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks. Yet for character-level transduction tasks, e.g. morphological inflection generation and historical text normalization, there are few works that outperform recurrent models using the transformer. In an empirical study, we uncover that, in contrast to recurrent sequence-to-sequence models, the batch size ...
Conference Paper

A Cognitive Regularizer for Language Modeling
(2021) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
The uniform information density (UID) hypothesis, which posits that speakers behaving optimally tend to distribute information uniformly across a linguistic signal, has gained traction in psycholinguistics as an explanation for certain syntactic, morphological, and prosodic choices. In this work, we explore whether the UID hypothesis can be operationalized as an inductive bias for statistical language modeling. Specifically, we augment ...
Conference Paper

Determinantal Beam Search
(2021) Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Beam search is a go-to strategy for decoding neural sequence models. The algorithm can naturally be viewed as a subset optimization problem, albeit one where the corresponding set function does not reflect interactions between candidates. Empirically, this leads to sets often exhibiting high overlap, e.g., strings may differ by only a single word. Yet in use-cases that call for multiple solutions, a diverse or representative set is often ...
Conference Paper

Finding Concept-specific Biases in Form–Meaning Associations
(2021) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed (Blasi et al., 2016) that the word for “tongue” is more likely than chance to contain the phone [l]. By controlling for the influence of language family and geographic proximity within a very ...
Conference Paper

Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing
(2021) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Analysing whether neural language models encode linguistic information has become popular in NLP. One method of doing so, which is frequently cited to support the claim that models like BERT encode syntax, is called probing; probes are small supervised models trained to extract linguistic information from another model’s output. If a probe is able to predict a particular structure, it is argued that the model whose output it is trained ...
Conference Paper

What About the Precedent: An Information-Theoretic Analysis of Common Law
(2021) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
In common law, the outcome of a new case is determined mostly by precedent cases, rather than by existing statutes. However, how exactly does the precedent influence the outcome of a new case? Answering this question is crucial for guaranteeing fair and consistent judicial decision-making. We are the first to approach this question computationally by comparing two longstanding jurisprudential views: Halsbury’s, who believes that the ...
Conference Paper

How (Non-)Optimal is the Lexicon?
(2021) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The mapping of lexical meanings to wordforms is a major feature of natural languages. While usage pressures might assign short words to frequent meanings (Zipf’s law of abbreviation), the need for a productive and open-ended vocabulary, local constraints on sequences of symbols, and various other factors all shape the lexicons of the world’s languages. Despite their importance in shaping lexical structure, the relative contributions of ...
Conference Paper

A Non-Linear Structural Probe
(2021) Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Probes are models devised to investigate the encoding of knowledge—e.g. syntactic structure—in contextual representations. Probes are often designed for simplicity, which has led to restrictions on probe design that may not allow for the full exploitation of the structure of encoded information; one such restriction is linearity. We examine the case of a structural probe (Hewitt and Manning, 2019), which aims to investigate the encoding ...
Conference Paper

Modeling the Unigram Distribution
(2021) Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
The unigram distribution is the non-contextual probability of finding a specific word form in a corpus. While of central importance to the study of language, it is commonly approximated by each word’s sample frequency in the corpus. This approach, being highly dependent on sample size, assigns zero probability to any out-of-vocabulary (oov) word form. As a result, it produces negatively biased probabilities for any oov word form, while ...
Conference Paper