Open access
Author
Date
2023-11-07
Type
- Bachelor Thesis
ETH Bibliography
yes
Abstract
The Cerebras Wafer-Scale Engine (WSE) is a powerful architecture that was initially used for machine learning training but now serves a wider variety of workloads. Efficient reduce and allreduce communication collectives are crucial for achieving the best possible performance on this hardware. We therefore provide the first systematic investigation of the reduce operation on the Cerebras WSE using the Cerebras SDK. The provided implementations are up to 5.1x faster than the current library implementation. We show that, using at most three different implementations, we can achieve performance no more than 1.38x slower than an optimal reduction tree. We extend these methods to an allreduce, which outperforms classical patterns such as ring or butterfly by up to 2x. Finally, we show how these algorithms can be used to speed up GEMV.
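To illustrate why a fast reduce matters for GEMV, the following is a minimal host-side NumPy sketch, not the thesis's Cerebras SDK/CSL code: it models a column-partitioned GEMV where each of p simulated processing elements computes a partial product over its column block, and a binary-tree reduction sums the partials. The function names tree_reduce and gemv_column_partitioned are hypothetical, chosen for illustration only.

```python
# Illustrative toy model only; the actual thesis implementations run on
# the Cerebras WSE via the Cerebras SDK, not on a host with NumPy.
import numpy as np

def tree_reduce(chunks):
    """Binary-tree reduction: sums p partial vectors in ceil(log2 p) rounds."""
    vals = [c.copy() for c in chunks]
    while len(vals) > 1:
        paired = []
        for i in range(0, len(vals) - 1, 2):
            paired.append(vals[i] + vals[i + 1])  # pairwise sums in one round
        if len(vals) % 2:
            paired.append(vals[-1])               # odd element carries over
        vals = paired
    return vals[0]

def gemv_column_partitioned(A, x, p):
    """GEMV with columns split across p simulated PEs: each PE computes a
    partial y over its column block, then a reduce sums the partials."""
    col_blocks = np.array_split(np.arange(A.shape[1]), p)
    partials = [A[:, cols] @ x[cols] for cols in col_blocks]
    return tree_reduce(partials)

A = np.random.rand(8, 8)
x = np.random.rand(8)
assert np.allclose(gemv_column_partitioned(A, x, p=4), A @ x)
```

In this toy model the reduction is the only communication step, which is why the choice of reduction pattern (tree, ring, butterfly, etc.) directly bounds the achievable GEMV performance.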
Permanent link
https://doi.org/10.3929/ethz-b-000644033
Publication status
published
Publisher
ETH Zurich
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten