PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
OPEN ACCESS
Author / Producer
Date
2023-04-05
Publication Type
Working Paper
ETH Bibliography
yes
Data
Rights / License
Abstract
Word embeddings that map words into a fixed-dimensional vector space are the backbone of modern NLP. Most word embedding methods encode semantic information. However, phonetic information, which is important for some tasks, is often overlooked. In this work, we develop several novel methods that leverage articulatory features to build phonetically informed word embeddings, and present a set of phonetic word embeddings to encourage their community development, evaluation, and use. While several methods for learning phonetic word embeddings already exist, there is a lack of consistency in evaluating their effectiveness. Thus, we also propose several ways to evaluate both intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and extrinsic performance, such as rhyme and cognate detection and sound analogies. We hope that our suite of tasks will promote reproducibility and provide direction for future research on phonetic word embeddings.
Permanent link
Publication status
published
Pages / Article No.
2304.02541
Publisher
Cornell University
Event
Edition / version
v1
Subject
Computation and Language (cs.CL); FOS: Computer and information sciences
Organisational unit
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya
Related publications and datasets
Is supplemented by: https://github.com/zouharvi/pwesuite
Is supplemented by: https://huggingface.co/datasets/zouharvi/pwesuite-eval