dc.contributor.author
Olabi, Mhd Ghaith
dc.contributor.author
Gómez Luna, Juan
dc.contributor.author
Mutlu, Onur
dc.contributor.author
Hwu, Wen-Mei
dc.contributor.author
El Hajj, Izzat
dc.date.accessioned
2022-08-02T15:38:31Z
dc.date.available
2022-07-31T03:03:38Z
dc.date.available
2022-08-02T15:38:31Z
dc.date.issued
2022
dc.identifier.isbn
978-1-6654-0584-3
en_US
dc.identifier.isbn
978-1-6654-0585-0
en_US
dc.identifier.other
10.1109/CGO53902.2022.9741284
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/560988
dc.description.abstract
Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted beforehand. However, prior works have shown that dynamic parallelism may impose a high performance penalty when a large number of small grids are launched. The large number of launches results in high launch latency due to congestion, and the small grid sizes result in hardware underutilization.
en_US
dc.description.abstract
To address this issue, we propose a compiler framework for optimizing the use of dynamic parallelism in applications with nested parallelism. The framework features three key optimizations: thresholding, coarsening, and aggregation. Thresholding involves launching a grid dynamically only if the number of child threads exceeds some threshold, and serializing the child threads in the parent thread otherwise. Coarsening involves executing the work of multiple thread blocks by a single coarsened block to amortize the common work across them. Aggregation involves combining multiple child grids into a single aggregated grid.
en_US
dc.description.abstract
Thresholding is sometimes applied manually by programmers in the context of dynamic parallelism. We automate it in the compiler and discuss the challenges associated with doing so. Coarsening is sometimes applied as an optimization in other contexts. We propose to apply coarsening in the context of dynamic parallelism and automate it in the compiler as well. Aggregation has been automated in the compiler by prior work. We enhance aggregation by proposing a new aggregation technique that uses multi-block granularity. We also integrate these three optimizations into an open-source compiler framework to simplify the process of optimizing dynamic parallelism code.
en_US
dc.description.abstract
Our evaluation shows that our compiler framework improves the performance of applications with nested parallelism by a geometric mean of 43.0x over applications that use dynamic parallelism, 8.7x over applications that do not use dynamic parallelism, and 3.6x over applications that use dynamic parallelism with aggregation alone as proposed in prior work.
en_US
dc.language.iso
en
en_US
dc.publisher
IEEE
en_US
dc.title
A Compiler Framework for Optimizing Dynamic Parallelism on GPUs
en_US
dc.type
Conference Paper
dc.date.published
2022-03-29
ethz.book.title
2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
en_US
ethz.pages.start
1
en_US
ethz.pages.end
13
en_US
ethz.event
IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2022)
en_US
ethz.event.location
Online
en_US
ethz.event.date
April 2-6, 2022
en_US
ethz.identifier.wos
ethz.publication.place
Piscataway, NJ
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::09483 - Mutlu, Onur / Mutlu, Onur
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::09483 - Mutlu, Onur / Mutlu, Onur
ethz.date.deposited
2022-07-31T03:04:00Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2022-08-02T15:38:38Z
ethz.rosetta.lastUpdated
2023-02-07T04:59:32Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=A%20Compiler%20Framework%20for%20Optimizing%20Dynamic%20Parallelism%20on%20GPUs&rft.date=2022&rft.spage=1&rft.epage=13&rft.au=Olabi,%20Mhd%20Ghaith&G%C3%B3mez%20Luna,%20Juan&Mutlu,%20Onur&Hwu,%20Wen-Mei&El%20Hajj,%20Izzat&rft.isbn=978-1-6654-0584-3&978-1-6654-0585-0&rft.genre=proceeding&rft_id=info:doi/10.1109/CGO53902.2022.9741284&rft.btitle=2022%20IEEE/ACM%20International%20Symposium%20on%20Code%20Generation%20and%20Optimization%20(CGO)