Metadata only
Date
2021
Type
- Conference Paper
ETH Bibliography
yes
Abstract
Word embeddings have gained increasing popularity in recent years due to the Word2vec library and its extension fastText, which uses subword information. In this paper, we aim at improving the execution speed of fastText training on homogeneous multi- and manycore CPUs while maintaining accuracy. We present a novel open-source implementation that flexibly incorporates various algorithmic variants including negative sample sharing, batched updates, and a byte-pair encoding-based alternative for subword units. We build these novel variants over a fastText implementation that we carefully optimized for the architecture, memory hierarchy, and parallelism of current manycore CPUs. Our experiments on three languages demonstrate a 3–20× speed-up in training time at competitive semantic and syntactic accuracy.
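For context on the subword information mentioned in the abstract: standard fastText represents a word by its character n-grams with boundary markers (the mechanism the paper's byte-pair-encoding variant offers an alternative to). A minimal sketch of that n-gram extraction, not taken from the paper's implementation:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams with boundary markers '<' and '>',
    in the style of fastText's subword features (default minn=3, maxn=6)."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(w) - n + 1):
            grams.append(w[i:i + n])
    return grams

# 3-grams of "where", with boundary markers:
print(char_ngrams("where", 3, 3))
# → ['<wh', 'whe', 'her', 'ere', 're>']
```

The boundary markers let the model distinguish a prefix or suffix from the same characters word-internally (e.g. `<wh` vs. `whe`).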
Publication status
published
Book title
2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Pages / Article No.
Publisher
IEEE
Event
Subject
Machine learning; Natural language processing; Parallel computing; Performance; Word2vec; fastText
Organisational unit
03893 - Püschel, Markus / Püschel, Markus