Detecting emphasised spoken words by considering them prosodic outliers and taking advantage of HMM-based TTS framework
METADATA ONLY
Author / Producer
Date
2015-07
Publication Type
Report
ETH Bibliography
yes
Data
Rights / License
Abstract
This work investigates a new approach to detecting emphasised spoken words based on one-class classification, which avoids a major difficulty: collecting a large amount of well-annotated training data containing emphasis. The key idea is to train rich context-dependent phone models on ordinary, neutrally read speech using the HMM-based speech synthesis framework, and then to treat emphasised words as prosodic outliers with respect to these “neutral” phone models, which is how they are detected. Experiments were conducted on German speech data without any simplifying assumption (for example, no assumption that each utterance contains exactly one emphasised word). Under many conditions this universally applicable approach outperformed purely random guessing, even though emphasised words made up only a small portion (6.28%) of the test set. However, no single set of configuration parameters was found to be optimal for all test speakers. For better performance, it is presumably necessary to collect a small amount of development data containing emphasis from a target speaker and then optimise the proposed detector in a “speaker-adaptive” manner.
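The one-class idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the report's method: the report uses rich context-dependent HMM phone models, whereas this toy example assumes a single diagonal Gaussian fitted to neutral speech only. The feature layout (F0 deviation, duration, energy), the function names, the example words, and the threshold of 9.0 are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

def fit_neutral_model(neutral_features):
    """Fit a per-dimension (mean, std) model on neutral speech only.

    No emphasised examples are needed, which is the point of
    one-class classification.
    """
    dims = list(zip(*neutral_features))
    # Guard against zero variance so the z-score stays finite.
    return [(mean(d), max(pstdev(d), 1e-6)) for d in dims]

def outlier_score(model, features):
    """Mean squared z-score of a word's features under the neutral model."""
    z2 = [((x - mu) / sigma) ** 2 for (mu, sigma), x in zip(model, features)]
    return sum(z2) / len(z2)

def detect_emphasis(model, words, threshold):
    """Return the words whose prosody is an outlier w.r.t. the neutral model."""
    return [w for w, f in words if outlier_score(model, f) > threshold]

# Hypothetical neutral training data: (F0 deviation, duration in s, energy in dB).
neutral = [
    (0.00, 0.20, 60.0),
    (0.10, 0.22, 61.0),
    (-0.10, 0.18, 59.0),
    (0.05, 0.21, 60.5),
    (-0.05, 0.19, 59.5),
]
model = fit_neutral_model(neutral)

# Hypothetical test utterance: one word close to neutral, one far from it.
words = [
    ("das", (0.02, 0.205, 60.2)),
    ("wirklich", (0.40, 0.35, 66.0)),
]
flagged = detect_emphasis(model, words, threshold=9.0)
print(flagged)
```

The threshold plays the role of the configuration parameters mentioned in the abstract: since no single setting was found to suit all test speakers, in practice it would presumably be tuned on a small amount of development data from the target speaker.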
Permanent link
Publication status
published
External links
Editor
Book title
Journal / series
Volume
362
Pages / Article No.
Publisher
ETH Zurich, Computer Engineering and Networks Laboratory
Subject
Emphasis detection; Prosodic outlier; Rich context modelling; HMM-based speech synthesis
Organisational unit
03429 - Thiele, Lothar (emeritus)