A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted beforehand. However, prior work has shown that dynamic parallelism …

To address this issue, we propose a compiler framework for optimizing the use of dynamic parallelism in applications with nested parallelism. The framework features three key optimizations: thresholding, coarsening, and aggregation. Thresholding involves launching a grid dynamically only if the num…
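The thresholding idea can be pictured with a minimal host-side C++ sketch. It is not the paper's compiler transformation. It assumes the launch decision compares the amount of nested work found by each parent thread against a fixed cutoff, and that work below the cutoff is processed serially by the parent. The names `kLaunchThreshold`, `DispatchStats`, and `dispatch_with_threshold`, as well as the cutoff value, are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cutoff below which a child launch is assumed not to pay off.
// The value is illustrative only; it is not taken from the paper.
constexpr std::size_t kLaunchThreshold = 128;

// Counters describing how the nested work was dispatched.
struct DispatchStats {
    std::size_t child_launches = 0;  // parents that would launch a child grid
    std::size_t serial_items = 0;    // nested items handled by the parent itself
};

// Host-side model of thresholding: nested_work[i] is the number of nested
// work items discovered by parent thread i.
DispatchStats dispatch_with_threshold(const std::vector<std::size_t>& nested_work,
                                      std::size_t threshold = kLaunchThreshold) {
    DispatchStats stats;
    for (std::size_t items : nested_work) {
        if (items >= threshold) {
            // Enough parallelism: in CUDA this is where a child grid
            // would be launched from the parent thread.
            ++stats.child_launches;
        } else {
            // Too little work: the parent loops over the items serially.
            stats.serial_items += items;
        }
    }
    return stats;
}
```

With the inputs `{4, 200, 128, 0, 1000}` and the default cutoff, three parents would launch a child grid and four items would be handled serially. Whether a real cutoff should be fixed, tuned per application, or derived by the compiler is left open here.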

Thresholding is sometimes applied manually by programmers in the context of dynamic parallelism. We automate it in the compiler and discuss the challenges associated with doing so. Coarsening is sometimes applied as an optimization in other contexts. We propose to apply coarsening in the context of …

Our evaluation shows that our compiler framework improves the performance of applications with nested parallelism by a geometric mean of 43.0x over applications that use dynamic parallelism, 8.7x over applications that do not use dynamic parallelism, and 3.6x over applications that use dynamic par…

Publication status

published

External links

https://doi.org/10.1109/CGO53902.2022.9741284

Book title

2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

Pages / Article No.

1 - 13

Publisher

IEEE

Event

IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2022), Online, April 2-6, 2022

Organisational unit

09483 - Mutlu, Onur / Mutlu, Onur


ETH Bibliography

yes
