Algorithms for Decoding Probabilistic Language Generators: Beam Search and Locally Typical Sampling
OPEN ACCESS
Date
2024
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Natural language generation (NLG) systems overwhelmingly rely on probabilistic models to approximate task-specific distributions over natural language strings. The majority of these models are auto-regressive and locally normalized, producing probability distributions over the next token given prior context. At inference time, the user must decide how to use such a distribution to generate natural language strings.
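To make the locally normalized, auto-regressive setup concrete, here is a minimal sketch of chain-rule scoring. The model interface `step_fn` and the toy distribution are hypothetical stand-ins, not part of the thesis: a real neural LM would produce the next-token distribution via a softmax over its vocabulary.

```python
import math

def sequence_log_prob(tokens, step_fn):
    """Score a full string under a locally normalized, auto-regressive model.

    Sketch only: `step_fn(prefix)` is a hypothetical stand-in for the model's
    next-token distribution, returned here as a plain dict. The chain rule
    gives log p(y) = sum over t of log p(y_t | y_<t).
    """
    log_p = 0.0
    for t in range(1, len(tokens)):
        log_p += math.log(step_fn(tokens[:t])[tokens[t]])
    return log_p

# Toy next-token distribution, for illustration only.
def toy_model(prefix):
    if prefix[-1] == "<s>":
        return {"a": 0.6, "</s>": 0.4}
    return {"a": 0.5, "</s>": 0.5}

# p("<s> a </s>") = 0.6 * 0.5 = 0.3 under the toy model.
score = sequence_log_prob(["<s>", "a", "</s>"], toy_model)
```

Decoding is then the question of which string to emit given access only to these per-token conditionals.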
Beam search is a widely used approximation algorithm for finding the highest-probability string under such distributions. It has been the go-to tool for decoding probabilistic models in numerous generation tasks, e.g., machine translation, abstractive summarization, and constrained decoding. Yet it can exhibit notable variability in output quality, computational inefficiency, and a lack of diversity. This thesis first aims to better understand beam search's success. We identify an inductive bias inherent in beam search, leading us to propose that its success is due to its implicit enforcement of uniform information density---a property linked to psycholinguistic theories---in generated text. We then address three limitations of standard beam search: its inefficiency, its tendency to produce sets with low diversity, and its deterministic nature. To address the first limitation, we introduce a more efficient variant of beam search, which frames the algorithm as an agenda-based process and employs best-first prioritization; this approach reduces computational cost by eliminating unnecessary path exploration. We next show how each generation step in beam search can be formulated as a subdeterminant maximization problem, and how this framing allows us to optimize for set-level characteristics (such as diversity) in a principled fashion. We further develop a stochastic generalization of beam search, which facilitates the generation of diverse samples and enables the construction of statistically consistent estimators for expectations under the model. We provide empirical evidence for the effectiveness of these new techniques in improving the efficiency, diversity, and adaptability of beam search as a decoding algorithm for NLG tasks.
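The pruned breadth-first procedure described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `step_fn` and the toy bigram model are hypothetical, and real decoders add refinements such as length normalization.

```python
import math

def beam_search(step_fn, beam_width, max_len, bos="<s>", eos="</s>"):
    """Minimal beam search over a locally normalized model (sketch).

    `step_fn(prefix)` is a hypothetical stand-in for a neural LM: it maps a
    token prefix to a dict of next-token probabilities. Hypotheses are scored
    by cumulative log-probability; finished hypotheses are carried forward
    unchanged so they keep competing for a beam slot.
    """
    beams = [([bos], 0.0)]  # (token sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                candidates.append((seq, score))  # already finished
            else:
                for tok, p in step_fn(seq).items():
                    candidates.append((seq + [tok], score + math.log(p)))
        # Prune to the top-k hypotheses -- the defining beam search step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return max(beams, key=lambda c: c[1])

# A toy bigram "model", used purely for illustration.
def toy_step(prefix):
    last = prefix[-1]
    if last == "<s>":
        return {"a": 0.6, "b": 0.4}
    if last == "a":
        return {"a": 0.1, "b": 0.5, "</s>": 0.4}
    return {"a": 0.3, "</s>": 0.7}

best_seq, best_logp = beam_search(toy_step, beam_width=2, max_len=10)
# On this toy model the beam recovers the true mode, "<s> b </s>" (p = 0.28).
```

Note the sources of the limitations discussed above: every surviving hypothesis is expanded at every step (inefficiency), the top-k pruning keeps near-duplicate continuations (low diversity), and the procedure has no randomness (determinism).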
In the last part of this thesis, we use our insights about the properties of effective decoding strategies to propose a new decoding algorithm---one that is designed to produce text that mimics information content patterns in human communication. We observe that this algorithm leads to high-quality text, consistently reducing the degenerate repetitions that probabilistic language generators are known to occasionally produce under other decoding strategies. The methods proposed herein offer valuable tools for researchers and practitioners in their effort to create better probabilistic language generators.
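The algorithm referred to here is the locally typical sampling of the title. A minimal sketch of the per-step filter, under the assumption that it truncates to the tokens whose surprisal is closest to the conditional entropy (the threshold `tau` is a hypothetical parameter name):

```python
import math

def typical_filter(probs, tau=0.95):
    """Sketch of a locally typical sampling filter.

    `probs` maps tokens to next-token probabilities. Tokens are ranked by how
    close their surprisal, -log p, is to the distribution's entropy, and are
    kept in that order until their total mass reaches `tau`. Sampling from
    the renormalized remainder favors tokens whose information content is
    close to the *expected* information content.
    """
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    # Rank by local typicality: distance of surprisal from the entropy.
    ranked = sorted(probs.items(), key=lambda kv: abs(-math.log(kv[1]) - entropy))
    kept, mass = {}, 0.0
    for tok, p in ranked:
        kept[tok] = p
        mass += p
        if mass >= tau:
            break
    return {tok: p / mass for tok, p in kept.items()}

dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
filtered = typical_filter(dist, tau=0.5)
# "b" has the surprisal closest to the entropy, so it is kept first, then
# "a": the highest-probability token is not automatically the most typical.
```

Unlike pure mode-seeking decoding, this rules out both very high-surprisal tokens and, notably, very low-surprisal ones, which is what curbs the degenerate repetition loops mentioned above.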
Publication status
published
Contributors
Examiner: Hofmann, Thomas
Examiner: Krause, Andreas
Examiner: Beinborn, Lisa
Examiner: Sennrich, Rico
Publisher
ETH Zurich
Subject
Natural Language Generation; Natural Language Processing; Language Models
Organisational unit
09462 - Hofmann, Thomas / Hofmann, Thomas