Entropy-based Sampling for Abstractive Multi-document Summarization in Low-resource Settings
Date
2023
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Research in Multi-document Summarization (MDS) mostly focuses on the English language and depends on large MDS datasets that are not available for other languages. Some of these approaches concatenate the source documents, resulting in overlong model inputs. Existing transformer architectures are unable to process such long inputs entirely, omitting documents in the summarization process. Other solutions address this issue by implementing multi-stage approaches that also require changes in the model architecture. In this paper, we introduce various sampling approaches based on information entropy that allow us to perform MDS in a single stage. These approaches also consider all source documents without using MDS training data or changing the model's architecture. In addition, we build an MDS test set of German news articles to assess the performance of our methods on abstractive multi-document summaries. Experimental results show that our entropy-based approaches outperform the previous state of the art on German MDS, while still remaining primarily abstractive. We release our code and MDS test set to encourage further research in German abstractive MDS.
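The abstract does not spell out how the entropy-based sampling works, so the following is only a minimal sketch of one plausible reading: at a decoding step, a next-token distribution is obtained separately for each source document, the Shannon entropy of each distribution is computed, and the most confident (lowest-entropy) one supplies the next token. The function names, the toy distributions, and the lowest-entropy selection rule are all assumptions made for illustration and are not taken from the paper or its released code.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def pick_lowest_entropy(distributions):
    """Return (index, entropy) of the distribution with the lowest entropy.

    Hypothetical selection rule: the abstract only says the sampling is
    "based on information entropy", not which rule is applied.
    """
    entropies = [shannon_entropy(d) for d in distributions]
    best = min(range(len(entropies)), key=entropies.__getitem__)
    return best, entropies[best]


# Toy next-token distributions over a 4-token vocabulary,
# one per source document (made-up numbers, not experimental data).
per_document = [
    [0.25, 0.25, 0.25, 0.25],  # uniform: maximally uncertain
    [0.70, 0.10, 0.10, 0.10],  # peaked: most confident
    [0.40, 0.40, 0.10, 0.10],
]

doc_index, entropy = pick_lowest_entropy(per_document)
chosen = per_document[doc_index]
# Greedy choice of the next token from the selected document's distribution.
next_token = max(range(len(chosen)), key=chosen.__getitem__)
```

Because every document contributes a distribution at each step, no document is dropped for exceeding the input length, which matches the single-stage, all-documents property the abstract describes. Other entropy-based rules (for example, weighting distributions by inverse entropy) would fit the same description equally well.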
Publication status
published
Book title
Proceedings of the 16th International Natural Language Generation Conference
Pages / Article No.
123 - 133
Publisher
Association for Computational Linguistics
Event
16th International Natural Language Generation Conference (INLG 2023)
Organisational unit
02154 - Media Technology Center (MTC) / Media Technology Center (MTC)