Communication Collectives for the Cerebras Wafer-Scale Engine
OPEN ACCESS
Date
2023-11-07
Publication Type
Bachelor Thesis
ETH Bibliography
yes
Abstract
The Cerebras Wafer-Scale Engine (WSE) is a powerful architecture that was initially used for machine learning training but now serves a wider variety of workloads. To achieve the best possible performance on this hardware, efficient reduce and allreduce communication collectives are crucial. We therefore provide the first systematic investigation of the reduce operation on the Cerebras WSE using the Cerebras SDK. Our implementations are up to 5.1x faster than the current library implementation. We show that with at most three different implementations we can stay within 1.38x of an optimal reduction tree. We extend these methods to an allreduce that outperforms classical patterns such as ring and butterfly by up to 2x. Finally, we show how these algorithms can be used to speed up GEMV.
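For readers unfamiliar with the baseline the abstract refers to, the classical ring allreduce combines a reduce-scatter with an allgather. The following Python sketch simulates that pattern over a list of per-PE buffers; it is an illustration of the baseline only (the function name and structure are our own), not the thesis's Cerebras SDK (CSL) implementation.

def ring_allreduce(buffers):
    """In-place sum-allreduce over per-PE buffers arranged in a ring.

    buffers[r] is the local vector of PE r; all vectors have the same
    length, assumed divisible by the number of PEs in this sketch.
    """
    p = len(buffers)
    n = len(buffers[0])
    assert n % p == 0, "sketch assumes the vector splits evenly into chunks"
    size = n // p

    # Phase 1: reduce-scatter. In step s, PE r sends chunk (r - s) mod p to
    # its right neighbour, which accumulates it. After p - 1 steps, PE r
    # holds the fully reduced chunk (r + 1) mod p.
    for s in range(p - 1):
        for r in range(p):
            c = (r - s) % p                  # chunk PE r forwards this step
            lo, hi = c * size, (c + 1) * size
            right = (r + 1) % p
            for i in range(lo, hi):
                buffers[right][i] += buffers[r][i]

    # Phase 2: allgather. In step s, PE r forwards its newest complete chunk
    # (r + 1 - s) mod p; the receiver overwrites its stale copy. After
    # another p - 1 steps, every PE holds the full reduced vector.
    for s in range(p - 1):
        for r in range(p):
            c = (r + 1 - s) % p
            lo, hi = c * size, (c + 1) * size
            right = (r + 1) % p
            buffers[right][lo:hi] = buffers[r][lo:hi]

# Quick check: four simulated PEs, eight elements each.
if __name__ == "__main__":
    import random
    data = [[random.randint(0, 9) for _ in range(8)] for _ in range(4)]
    expected = [sum(col) for col in zip(*data)]
    ring_allreduce(data)
    assert all(buf == expected for buf in data)
    print("allreduce result:", data[0])

Each PE transfers 2(p - 1) chunks in this pattern, which makes the ring bandwidth-efficient but costs many communication steps; this is the kind of baseline the thesis reports outperforming by up to 2x on the WSE.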
Publication status
published
Examiner: Gianinazzi, Lukas
Examiner: Iff, Patrick
Examiner: Hoefler, Torsten
Publisher
ETH Zurich
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten