Communication Collectives for the Cerebras Wafer-Scale Engine


Date

2023-11-07

Publication Type

Bachelor Thesis

ETH Bibliography

yes

Abstract

The Cerebras Wafer-Scale Engine (WSE) is a powerful architecture that was initially used for machine learning training but now serves a much broader range of workloads. Achieving the best possible performance on this hardware requires efficient reduce and allreduce communication collectives. We therefore provide the first systematic investigation of the reduce operation on the Cerebras WSE using the Cerebras SDK. Our implementations are up to 5.1x faster than the current library implementation. We show that, using at most three different implementations, we can achieve performance at most 1.38x slower than an optimal reduction tree. We extend these methods to an allreduce that outperforms classical patterns such as ring or butterfly by up to 2x. Finally, we show how these algorithms can be used to speed up GEMV.
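
The thesis's collectives are written for the Cerebras SDK, and none of that code appears in this record. As a purely illustrative sketch, the following Python/NumPy simulation shows the two ideas the abstract combines: a column-partitioned GEMV whose per-PE partial results are summed with an allreduce, here the classical ring pattern that the abstract uses as a baseline. The helper name ring_allreduce and all parameters are hypothetical and not taken from the thesis.

import numpy as np

def ring_allreduce(partials):
    # Hypothetical helper, not from the thesis: sums equal-length vectors
    # with the classic two-phase ring (reduce-scatter, then allgather),
    # simulated rank by rank in a single process.
    p = len(partials)
    # Each simulated rank holds its own copy of the vector, split into p chunks.
    bufs = [np.array_split(v.astype(float), p) for v in partials]

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) % p to
    # rank (r + 1) % p, which accumulates it. After p - 1 steps, rank r owns
    # the fully reduced chunk (r + 1) % p.
    for s in range(p - 1):
        sends = [(r, (r - s) % p, bufs[r][(r - s) % p].copy()) for r in range(p)]
        for r, c, data in sends:
            bufs[(r + 1) % p][c] += data

    # Phase 2: allgather. The owned chunks travel around the ring until every
    # rank holds every fully reduced chunk.
    for s in range(p - 1):
        sends = [(r, (r + 1 - s) % p, bufs[r][(r + 1 - s) % p].copy()) for r in range(p)]
        for r, c, data in sends:
            bufs[(r + 1) % p][c] = data

    return [np.concatenate(b) for b in bufs]

# Column-partitioned GEMV: each simulated PE computes A[:, cols] @ x[cols];
# the partial results are then summed with the allreduce above.
rng = np.random.default_rng(0)
P, m, n = 4, 8, 12
A, x = rng.standard_normal((m, n)), rng.standard_normal(n)
col_blocks = np.array_split(np.arange(n), P)
partials = [A[:, cols] @ x[cols] for cols in col_blocks]
results = ring_allreduce(partials)
assert all(np.allclose(y, A @ x) for y in results)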

Publication status

published

Contributors

Examiner: Gianinazzi, Lukas
Examiner: Iff, Patrick
Examiner: Hoefler, Torsten

Publisher

ETH Zurich

Organisational unit

03950 - Hoefler, Torsten / Hoefler, Torsten
