Algorithms for Decoding Probabilistic Language Generators: Beam Search and Locally Typical Sampling


Date

2024

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

Natural language generation (NLG) systems overwhelmingly rely on probabilistic models to approximate task-specific distributions over natural language strings. The majority of these models are auto-regressive and locally normalized, producing probability distributions over the next token given prior context. At inference time, the user must decide how to use such a distribution to generate natural language strings. Beam search is a widely used approximation algorithm for finding the highest-probability string according to such distributions. It has been the go-to tool for decoding probabilistic models in numerous generation tasks, e.g., machine translation, abstractive summarization, and constrained decoding. Yet it can exhibit notable variability in output quality, computational inefficiency, and a lack of diversity. This thesis first aims to better understand beam search's success. We identify an inductive bias inherent in beam search, leading us to propose that its success is due to its implicit enforcement of uniform information density---a property linked to psycholinguistic theories---in generated text. We then address three limitations of standard beam search: its inefficiency, its tendency to produce sets with low diversity, and its deterministic nature. To address the first limitation, we introduce a more efficient variant of beam search, which frames the algorithm as an agenda-based process and employs best-first prioritization; this approach reduces computational cost by eliminating unnecessary path exploration. We next show how each generation step in beam search can be formulated as a subdeterminant maximization problem, and how this framing allows us to optimize for set-level characteristics (such as diversity) in a principled fashion.
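For orientation, the standard beam search procedure discussed above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the `next_token_logprobs` callback (returning next-token log-probabilities given a prefix) is a hypothetical stand-in for a locally normalized auto-regressive model, and `bos`/`eos` are assumed sentence-boundary markers.

```python
import math

def beam_search(next_token_logprobs, bos, eos, beam_size=4, max_len=20):
    """Approximate search for the highest-probability string under a
    locally normalized model, keeping the top-k partial hypotheses
    (the "beam") at each generation step."""
    beams = [([bos], 0.0)]  # (token sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:
                finished.append((seq, score))
                continue
            # Expand each unfinished hypothesis by one token.
            for tok, lp in next_token_logprobs(tuple(seq)).items():
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        # Prune: keep only the k highest-scoring partial hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos)
    return max(finished or beams, key=lambda c: c[1])

# Toy model: "<s>" is followed by "a" (p=0.6) or "b" (p=0.4), and either
# continuation ends immediately with "</s>".
toy = {
    ("<s>",): {"a": math.log(0.6), "b": math.log(0.4)},
    ("<s>", "a"): {"</s>": 0.0},
    ("<s>", "b"): {"</s>": 0.0},
}
best, logp = beam_search(lambda prefix: toy[prefix], "<s>", "</s>", beam_size=2)
```

Note that the pruning step is exactly where the inductive bias discussed in the abstract enters: hypotheses whose cumulative log-probability falls behind are discarded, even if they would later recover.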
We further develop a stochastic generalization of beam search, which facilitates the generation of diverse samples and enables the construction of statistically consistent estimators for expectations under the model. We provide empirical evidence for the effectiveness of these new techniques in improving the efficiency, diversity, and adaptability of beam search as a decoding algorithm for NLG tasks. In the last part of this thesis, we use our insights about the properties of effective decoding strategies to propose a new decoding algorithm---one that is designed to produce text that mimics information content patterns in human communication. We observe that this algorithm leads to high-quality text, consistently reducing the degenerate repetitions that probabilistic language generators are known to occasionally produce under other decoding strategies. The methods proposed herein offer valuable tools for researchers and practitioners in their effort to create better probabilistic language generators.
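The decoding algorithm named in the title, locally typical sampling, restricts sampling at each step to tokens whose information content is close to the distribution's conditional entropy. The sketch below shows only the per-step filtering rule, under our own simplifying assumptions: the next-token distribution arrives as a plain `probs` dict with strictly positive probabilities, and the mass-budget parameter name `tau` is a notational choice for this illustration.

```python
import math

def locally_typical_filter(probs, tau=0.95):
    """One step of locally typical sampling: keep the tokens whose
    surprisal (-log p) is closest to the distribution's entropy, up to
    cumulative probability mass `tau`, then renormalize. A token would
    then be sampled from the returned distribution."""
    entropy = -sum(p * math.log(p) for p in probs.values())
    # Rank tokens by |surprisal - entropy|, i.e., by how "typical"
    # their information content is.
    ranked = sorted(probs.items(),
                    key=lambda kv: abs(-math.log(kv[1]) - entropy))
    kept, mass = {}, 0.0
    for tok, p in ranked:
        kept[tok] = p
        mass += p
        if mass >= tau:
            break
    z = sum(kept.values())
    return {tok: p / z for tok, p in kept.items()}

# With a skewed toy distribution and a small mass budget, only the most
# typical token survives the filter.
filtered = locally_typical_filter({"a": 0.7, "b": 0.2, "c": 0.1}, tau=0.2)
```

Unlike truncation by raw probability (e.g., top-k), this rule can in principle exclude a very high-probability token whose surprisal sits far below the entropy, which is how the method steers generated text toward human-like information content patterns.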

Publication status

published

Contributors

Examiner : Hofmann, Thomas
Examiner : Krause, Andreas
Examiner : Beinborn, Lisa
Examiner : Sennrich, Rico

Publisher

ETH Zurich

Subject

Natural Language Generation; Natural Language Processing; Language Models

Organisational unit

09462 - Hofmann, Thomas
