Detecting emphasised spoken words by considering them prosodic outliers and taking advantage of HMM-based TTS framework
Abstract
This work investigates a fresh approach to detecting emphasised spoken words based on one-class classification, which avoids a major difficulty: collecting a large amount of well-annotated training data containing emphasis. The key idea is that rich context-dependent phone models are first trained on common, neutrally read speech data with the help of the HMM-based speech synthesis framework; emphasised words are then treated as prosodic outliers with respect to these “neutral” phone models and are detected as such. Experiments were conducted on German speech data without any simplifying assumption (e.g. that each utterance contains exactly one emphasised word). Under many conditions this universally applicable approach outperformed totally random guessing, even though emphasised words constituted only a small portion (6.28%) of the test set. However, no single optimal set of configuration parameters that applied to all test speakers was observed. For better performance, it is presumably necessary to collect a small amount of development data containing emphasis from a target speaker and then optimise the proposed detector in a “speaker-adaptive” manner.
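The one-class idea in the abstract can be illustrated with a minimal sketch. This is not the report's method: it replaces the rich context-dependent HMM phone models with a single diagonal Gaussian fitted to neutral data, and the word-level features (duration in seconds, mean F0 in Hz), the function names, and the 95% threshold quantile are all hypothetical choices made only for illustration. What it shares with the described approach is that only neutral speech is used for training, and a word is flagged when it is unlikely under the neutral model.

```python
import math
from statistics import mean, pstdev


def fit_neutral_model(neutral_features):
    """Fit one independent Gaussian per feature dimension on neutral data only."""
    dims = list(zip(*neutral_features))
    # Floor the standard deviation to avoid division by zero.
    return [(mean(d), max(pstdev(d), 1e-6)) for d in dims]


def outlier_score(model, features):
    """Negative log-likelihood under the neutral model; higher means less neutral."""
    score = 0.0
    for (mu, sigma), x in zip(model, features):
        score += 0.5 * math.log(2 * math.pi * sigma ** 2)
        score += (x - mu) ** 2 / (2 * sigma ** 2)
    return score


def pick_threshold(model, neutral_features, quantile=0.95):
    """Choose the threshold as a quantile of the scores of the neutral data."""
    scores = sorted(outlier_score(model, f) for f in neutral_features)
    index = min(int(quantile * len(scores)), len(scores) - 1)
    return scores[index]


def detect_emphasis(model, threshold, words):
    """Return the labels of words whose score exceeds the threshold."""
    return [label for label, feats in words if outlier_score(model, feats) > threshold]


# Hypothetical neutral training data: (duration in s, mean F0 in Hz) per word.
neutral = [(0.20, 120.0), (0.22, 118.0), (0.19, 125.0),
           (0.21, 122.0), (0.23, 119.0), (0.20, 121.0)]
model = fit_neutral_model(neutral)
threshold = pick_threshold(model, neutral)

# Hypothetical test words; no emphasis labels are needed to run the detector.
test_words = [("das", (0.21, 120.0)), ("sehr", (0.45, 180.0))]
flagged = detect_emphasis(model, threshold, test_words)
```

In this sketch the threshold is the only tunable parameter. The abstract's observation that no single configuration suited all test speakers suggests that, in practice, such a parameter would be tuned on a small amount of development data from the target speaker.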
Publication status
published
Journal / Series
TIK Report
Volume
Publisher
ETH Zurich, Computer Engineering and Networks Laboratory
Subject
Emphasis detection; Prosodic outlier; Rich context modelling; HMM-based speech synthesis
Organisational unit
03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
Related publications and data
Is previous version of: http://hdl.handle.net/20.500.11850/116354
ETH Bibliography
yes