On-device audio-visual multi-person wake word spotting

Li, Yidi; Wang, Guoquan; Chen, Zhan; Tang, Hao; Liu, Hong

doi:10.1049/cit2.12189

Download

Full text (published version) (PDF, 2.683Mb)

Open access

Date

2023-12

Type

Journal Article

ETH Bibliography

yes

Rights / license

Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International

Abstract

Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information of lip motion patterns to supplement acoustic speech to improve overall detection performance. However, most audio-visual wake word spotting models are only suitable for simple single-speaker scenario …

Permanent link

https://doi.org/10.3929/ethz-b-000603879

Publication status

published

External links

https://doi.org/10.1049/cit2.12189

Journal / series

CAAI Transactions on Intelligence Technology

Volume

8(4)

Pages / Article No.

1578 - 1589

Publisher

Institution of Engineering and Technology

Subject

audio-visual fusion; human-computer interfacing; speech processing
