Open access
Author
Date
2021
Type
- Doctoral Thesis
ETH Bibliography
yes
Abstract
For decades, the computational performance of processors has grown at a faster rate than the available memory bandwidth. As a result, most transistors in modern processors are spent on managing data movement via caches and registers. Spatial computing architectures can omit general purpose caches, registers, and control logic by implementing application-specific dataflow, where computations are laid out spatially. Programmable spatial architectures, such as FPGAs, can implement application-specific dataflow, but the steep learning curve of hardware programming prevents widespread adoption in high-performance computing (HPC). In this dissertation, we address this programmability gap. High-level synthesis (HLS) has increased productivity when designing FPGA architectures, but traditional software optimizations are insufficient to implement high-performance hardware architectures. To alleviate this, we present a set of key transformations for HLS that target scalable architectures for HPC applications, identify classes of transformations and their effects in hardware, and boost the productivity of HLS developers with hlslib, an open source project of productivity tools. Using these techniques, we present a model-based, end-to-end example of optimizing matrix multiplication for FPGAs, which yields competitive performance in practice and is published as an open source project. Venturing beyond HLS, we propose a new way to develop, optimize, and compile FPGA programs. The Data-Centric parallel programming (DaCe) framework allows applications to be defined by their dataflow and control flow through the Stateful DataFlow multiGraph (SDFG) representation, exposing a plethora of optimization opportunities. We unify general, domain-specific, and platform-specific optimizations in this flow, and present the FPGA backends of DaCe, which emit efficient HLS code for both Xilinx and Intel devices.
Building on this infrastructure, we present StencilFlow, an end-to-end framework that maps general directed acyclic graphs of heterogeneous stencil operators to distributed FPGA architectures, maximizing temporal locality and ensuring deadlock freedom. We show the highest performance recorded for stencil programs for either FPGA vendor to date, and study a complex stencil program from a production weather simulation application. With the toolbox of transformations, open source software, and programming abstractions provided in this dissertation, we contribute to the productivity of HLS developers, performance engineers, domain scientists, and compiler engineers alike, bridging the gap for bringing spatial computing systems into the mainstream of HPC.
Permanent link
https://doi.org/10.3929/ethz-b-000536674
Publication status
published
Contributors
Examiner: Hoefler, Torsten
Examiner: Alonso, Gustavo
Examiner: Blott, Michaela
Examiner: Kinsner, Michael
Publisher
ETH Zurich
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten
Funding
678880 - DAPP: Data-Centric Parallel Programming (EC)