Productive FPGA Programming for High-Performance Computing

de Fine Licht, Johannes

doi:10.3929/ethz-b-000536674

Open access

Author

de Fine Licht, Johannes

Date

2021

Type

Doctoral Thesis

ETH Bibliography

yes

Rights / license

In Copyright - Non-Commercial Use Permitted

Abstract

For decades, the computational performance of processors has grown at a faster rate than the available memory bandwidth. As a result, most transistors in modern processors are spent on managing data movement via caches and registers. Spatial computing architectures can omit general purpose caches, registers, and control logic by implementing application-specific dataflow, where computations are laid out spatially. Programmable spatial architectures, such as FPGAs, can implement application-specific dataflow, but the steep learning curve of hardware programming prevents widespread adoption in high-performance computing (HPC). In this dissertation, we address this programmability gap.

High-level synthesis (HLS) has increased productivity when designing FPGA architectures, but traditional software optimizations are insufficient to implement high-performance hardware architectures. To alleviate this, we present a set of key transformations for HLS, targeting scalable architectures for HPC applications, identifying classes of transformations and their effect in hardware, and boost the productivity of HLS developers with the hlslib open source project of productivity tools. Using these techniques, we present a model-based, end-to-end example of optimizing matrix multiplication for FPGAs, which yields competitive performance in practice and is published as an open source project.

Venturing beyond HLS, we propose a new way to develop, optimize, and compile FPGA programs. The Data-Centric parallel programming (DaCe) framework allows applications to be defined by their dataflow and control flow through the Stateful DataFlow multiGraph (SDFG) representation, exposing a plethora of optimization opportunities. We unify general, domain-specific, and platform-specific optimizations in this flow, and present the FPGA backends of DaCe, emitting efficient HLS code for both Xilinx and Intel devices.

Building on this infrastructure, we present StencilFlow, an end-to-end framework that maps general directed acyclic graphs of heterogeneous stencil operators to distributed FPGA architectures, maximizing temporal locality and ensuring deadlock freedom. We show the highest performance recorded for stencil programs for either FPGA vendor to date, and study a complex stencil program from a production weather simulation application.

With the toolbox of transformations, open source software, and programming abstractions provided in this dissertation, we contribute to the productivity of HLS developers, performance engineers, domain scientists, and compiler engineers alike, bridging the gap for bringing spatial computing systems into the mainstream of HPC.

Permanent link

https://doi.org/10.3929/ethz-b-000536674

Publication status

published

Contributors

Examiner: Hoefler, Torsten
Examiner: Alonso, Gustavo
Examiner: Blott, Michaela
Examiner: Kinsner, Michael

Publisher

ETH Zurich

Organisational unit

03950 - Hoefler, Torsten / Hoefler, Torsten

Funding

678880 - DAPP: Data-Centric Parallel Programming (EC)
