SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German
OPEN ACCESS
Date
2021-03-21
Publication Type
Working Paper
ETH Bibliography
yes
Abstract
Swiss German is a dialect continuum whose natively acquired dialects differ significantly from the formal variety of the language. These dialects are used mostly for spoken communication and have no standard orthography. As a result, annotated datasets are scarce, which makes many NLP methods infeasible to apply. In this paper, we introduce the first annotated parallel corpus of spoken Swiss German, covering 8 major dialects plus a Standard German reference. Our goal has been to create and make available a basic dataset for data-driven NLP applications in Swiss German. We describe our data collection procedure in detail and validate the quality of the corpus through experiments with recent neural speech synthesis models.
Publication status
published
Pages / Article No.
2103.11401
Publisher
Cornell University
Subject
Computation and Language
Organisational unit
03420 - Gross, Markus (emeritus)
02154 - Media Technology Center (MTC) / Media Technology Center (MTC)