PWESuite: Phonetic Word Embeddings and Tasks They Facilitate


Date

2023-04-05

Publication Type

Working Paper

ETH Bibliography

yes

Abstract

Word embeddings that map words into a fixed-dimensional vector space are the backbone of modern NLP. Most word embedding methods encode semantic information. However, phonetic information, which is important for some tasks, is often overlooked. In this work, we develop several novel methods that leverage articulatory features to build phonetically informed word embeddings, and present a set of phonetic word embeddings to encourage their community development, evaluation, and use. While several methods for learning phonetic word embeddings already exist, there is a lack of consistency in evaluating their effectiveness. Thus, we also propose several ways to evaluate both intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and extrinsic performance, such as rhyme and cognate detection and sound analogies. We hope that our suite of tasks will promote reproducibility and provide direction for future research on phonetic word embeddings.

Publication status

published

Pages / Article No.

2304.02541

Publisher

Cornell University

Edition / version

v1

Subject

Computation and Language (cs.CL); FOS: Computer and information sciences

Organisational unit

09684 - Sachan, Mrinmaya / Sachan, Mrinmaya
