Detecting emphasised spoken words by considering them prosodic outliers and taking advantage of HMM-based TTS framework
dc.contributor.author
Liang, Hui
dc.date.accessioned
2022-08-15T11:24:36Z
dc.date.available
2017-06-11T21:09:53Z
dc.date.available
2022-08-15T11:24:36Z
dc.date.issued
2015-07
dc.identifier.uri
http://hdl.handle.net/20.500.11850/106838
dc.description.abstract
This work investigates a new approach to detecting emphasised spoken words that adopts one-class classification, thereby avoiding a major difficulty: collecting a large amount of well-annotated training data containing emphasis. In brief, rich context-dependent phone models are first trained on ordinary, neutrally read speech using the HMM-based speech synthesis framework; emphasised words are then detected as prosodic outliers with respect to these “neutral” phone models. Experiments were conducted on German speech data without simplifying assumptions (e.g. that each utterance contains only one emphasised word). Under many conditions this universally applicable approach outperformed random guessing, even though emphasised words made up only a small portion (6.28%) of the test set. However, no single set of configuration parameters was found to be optimal for all test speakers. Better performance will presumably require collecting a small amount of development data containing emphasis from the target speaker and then optimising the detector in a “speaker-adaptive” manner.
en_US
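The detection idea in the abstract (models trained only on neutral speech, emphasised words flagged as prosodic outliers) can be illustrated with a minimal sketch. It rests on assumptions not taken from the report: each phone's prosodic features (e.g. log F0, log duration, energy) are assumed to come with a mean and variance predicted by the neutral context-dependent model, and a word is scored by the average squared z-score of its phones under a diagonal Gaussian. `PhoneObservation`, `phone_score`, `word_score`, `detect_emphasis` and the threshold value are hypothetical names chosen for illustration; the report's actual scoring function is not described in this record.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhoneObservation:
    """One phone's observed prosodic features plus the neutral model's prediction.

    Hypothetical structure (not from the report): each sequence holds one value
    per prosodic feature, e.g. log F0, log duration, energy; `var` must be > 0.
    """
    observed: Sequence[float]
    mean: Sequence[float]
    var: Sequence[float]


def phone_score(p: PhoneObservation) -> float:
    """Mean squared z-score of the phone's features under a diagonal Gaussian."""
    total = sum((x - mu) ** 2 / v for x, mu, v in zip(p.observed, p.mean, p.var))
    return total / len(p.observed)


def word_score(phones: List[PhoneObservation]) -> float:
    """Average phone score over all phones of one word."""
    return sum(phone_score(p) for p in phones) / len(phones)


def detect_emphasis(words: List[List[PhoneObservation]], threshold: float) -> List[bool]:
    """Flag each word whose outlier score exceeds `threshold`.

    Only the neutral model is needed; no emphasised training examples are used,
    which is the one-class idea. The threshold is an assumed tuning parameter.
    """
    return [word_score(w) > threshold for w in words]


if __name__ == "__main__":
    # Neutral word: observed features equal the neutral prediction, so the score is 0.
    neutral = [PhoneObservation([5.0, 0.1], [5.0, 0.1], [0.04, 0.01])]
    # Emphasised word: each feature lies two standard deviations above the mean (score about 4).
    emphasised = [PhoneObservation([5.4, 0.3], [5.0, 0.1], [0.04, 0.01])]
    print(detect_emphasis([neutral, emphasised], threshold=2.0))  # [False, True]
```

The threshold here stands in for the kind of configuration parameter that, according to the abstract, may have no single optimum across speakers and may need speaker-adaptive tuning on a small development set containing emphasis.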
dc.language.iso
en
en_US
dc.publisher
ETH Zurich, Computer Engineering and Networks Laboratory
en_US
dc.subject
Emphasis detection
en_US
dc.subject
Prosodic outlier
en_US
dc.subject
Rich context modelling
en_US
dc.subject
HMM-based speech synthesis
en_US
dc.title
Detecting emphasised spoken words by considering them prosodic outliers and taking advantage of HMM-based TTS framework
en_US
dc.type
Report
ethz.journal.title
TIK Report
ethz.journal.volume
362
en_US
ethz.size
6 p.
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
ethz.relation.isPreviousVersionOf
20.500.11850/116354
ethz.date.deposited
2017-06-11T21:10:47Z
ethz.source
ECIT
ethz.identifier.importid
imp593653af3dee513568
ethz.ecitpid
pub:167253
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2017-07-13T15:27:58Z
ethz.rosetta.lastUpdated
2023-02-07T05:18:45Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Detecting%20emphasised%20spoken%20words%20by%20considering%20them%20prosodic%20outliers%20and%20taking%20advantage%20of%20HMM-based%20TTS%20framework&rft.jtitle=TIK%20Report&rft.date=2015-07&rft.volume=362&rft.au=Liang,%20Hui&rft.genre=report&
Files in this item
There are no files associated with this item.