Metadata only
Date
2024
Type
Conference Paper
ETH Bibliography
yes
Abstract
To ensure that text generated by large language models (LLMs) is in an expected format, constrained decoding proposes to enforce strict formal language constraints during generation. However, as we show in this work, not only do such methods incur performance overhead during generation, but many of them also significantly impair task accuracy if they do not correctly align the underlying LLM sub-word vocabularies with external constraints. To address this, we present a novel decoding algorithm, DOMINO, that can enforce constraints in a fully subword-aligned fashion, while leveraging pre-computation and speculative decoding to achieve virtually no overhead and in some cases even an almost 2x speedup over unconstrained decoding, thereby outperforming existing approaches by a wide margin. We release DOMINO as open source on GitHub.
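The core idea of constrained decoding described above — at each step, only allowing tokens that keep the output inside the target formal language — can be illustrated with a minimal toy sketch. Everything here (the tiny vocabulary, the prefix checker, the greedy pick of the first allowed token) is hypothetical and for illustration only; DOMINO's actual contribution, aligning subword vocabularies with the constraint and using pre-computation plus speculative decoding, is not modeled here.

```python
# Toy sketch of constrained decoding via token masking.
# A real system would mask the logits of an LLM; here we simply
# pick the first vocabulary token that keeps the output a valid
# prefix of the target language.

# Hypothetical tiny subword vocabulary.
VOCAB = ["{", "}", '"key"', ":", '"value"', ",", "hello"]

# Toy "formal language" with a single member; a real constraint
# would be a regex or grammar compiled to a prefix automaton.
TARGET = '{"key":"value"}'

def json_like_prefix(s: str) -> bool:
    """True iff s is a valid prefix of the toy language."""
    return TARGET.startswith(s)

def allowed_tokens(prefix: str, is_valid_prefix) -> list:
    """Mask the vocabulary: keep only tokens that extend a valid prefix."""
    return [t for t in VOCAB if is_valid_prefix(prefix + t)]

def constrained_decode(is_valid_prefix, max_steps: int = 10) -> str:
    """Greedy decoding under the constraint (stand-in for LLM sampling)."""
    out = ""
    for _ in range(max_steps):
        choices = allowed_tokens(out, is_valid_prefix)
        if not choices:  # no token can extend the output further
            break
        out += choices[0]
    return out

print(constrained_decode(json_like_prefix))  # -> {"key":"value"}
```

Note that the masking step is where the vocabulary-alignment problem the abstract mentions arises: with real subword vocabularies, several different token sequences can spell the same valid string, and naive masking can rule out the tokenization the LLM itself prefers.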
Publication status
published
Book title
Proceedings of the 41st International Conference on Machine Learning
Journal / series
Proceedings of Machine Learning Research
Publisher
PMLR
Related publications and datasets
Is variant form of: https://openreview.net/forum?id=pXaEYzrFae