Doctoral Thesis
Rights / license: In Copyright - Non-Commercial Use Permitted
Automatic summarization aims to reduce an input document to a compressed version that captures only its salient parts. It is a topic of growing importance in today's age of information overflow. There are two main types of automatic summarization. Extractive summarization only selects salient sentences from the input, while abstractive summarization generates a summary without explicitly re-using whole sentences, resulting in summaries that are often more fluent. State-of-the-art approaches to abstractive summarization are data-driven, relying on the availability of large collections of articles paired with summaries. The pairs are typically constructed manually, a task that is costly and time-consuming. Furthermore, when targeting a slightly different domain or summary format, a new parallel dataset is often required. This heavy reliance on parallel resources limits the potential impact of abstractive summarization systems in society. In this thesis, we consider the problem of abstractive summarization from two different perspectives: high-resource and low-resource summarization. In the first part, we compare different methods for data-driven summarization, focusing specifically on the problem of generating long, abstractive summaries, such as an abstract for a scientific journal article. We discuss the difficulties that come with abstractive generation of long summaries and propose methods for alleviating them. In the second part of this thesis, we develop low-resource methods for abstractive text rewriting, first focusing on individual sentences and then on whole summaries. Our methods do not rely on parallel data, but instead utilize raw non-parallel text collections. Overall, this work takes a step towards data-driven abstractive summarization for the generation of long summaries, without having to rely on vast amounts of parallel, manually curated data.
Subject: Natural Language Processing; Artificial Intelligence; Machine Learning
Organisational unit: 03774 - Hahnloser, Richard H.R. / Hahnloser, Richard H.R.