StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems
Abstract
Spatial computing devices have been shown to significantly accelerate stencil computations, but have so far relied on unrolling the iterative dimension of a single stencil operation to increase temporal locality. This work considers the general case of mapping directed acyclic graphs of heterogeneous stencil computations to spatial computing systems, assuming large input programs without an iterative component. StencilFlow maximizes temporal locality and ensures deadlock freedom in this setting, providing end-to-end analysis and mapping from a high-level program description to distributed hardware. We evaluate our generated architectures on a Stratix 10 FPGA testbed, yielding 1.31 TOp/s and 4.18 TOp/s on single-device and multi-device, respectively, demonstrating the highest performance recorded for stencil programs on FPGAs to date. We then leverage the framework to study a complex stencil program from a production weather simulation application. Our work enables productively targeting distributed spatial computing systems with large stencil programs, and offers insight into architecture characteristics required for their efficient execution in practice.
Publication status
published
Book title
2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
Publisher
IEEE
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten
Funding
185778 - Empowering Computational Science using Data-Centric Programming (SNF)
Notes
Due to the Coronavirus (COVID-19) the conference was conducted virtually.