Efficiently Processing Large Relational Joins on GPUs
dc.contributor.author
Wu, Bowen
dc.contributor.author
Koutsoukos, Dimitrios
dc.contributor.author
Alonso, Gustavo
dc.date.accessioned
2024-03-13T10:52:49Z
dc.date.available
2024-01-15T13:02:41Z
dc.date.available
2024-03-13T10:52:04Z
dc.date.available
2024-03-13T10:52:49Z
dc.date.issued
2023-12-01
dc.identifier.other
10.48550/ARXIV.2312.00720
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/652744
dc.identifier.doi
10.3929/ethz-b-000652744
dc.description.abstract
With the growing interest in Machine Learning (ML), Graphics Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we explore and analyze the implementation of relational joins on GPUs from an end-to-end perspective, meaning that we take result materialization into account. We conduct a comprehensive performance study of state-of-the-art GPU-based join algorithms over diverse synthetic workloads and TPC-H/TPC-DS benchmarks. Without being restricted to the conventional setting where each input relation has only one key and one non-key, with all attributes being 4 bytes long, we investigate the effect of various factors (e.g., input sizes, number of non-key columns, skewness, data types, match ratios, and number of joins) on the end-to-end throughput. Furthermore, we propose a technique called "Gather-from-Transformed-Relations" (GFTR) to reduce the long-ignored yet high materialization cost in GPU-based joins. The experimental evaluation shows significant performance improvements from GFTR, with throughput gains of up to 2.3 times over previous work. The insights gained from the performance study not only advance the understanding of GPU-based joins but also introduce a structured approach to selecting the most efficient GPU join algorithm based on the input relation characteristics.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Cornell University
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
Databases (cs.DB)
en_US
dc.subject
FOS: Computer and information sciences
en_US
dc.title
Efficiently Processing Large Relational Joins on GPUs
en_US
dc.type
Working Paper
dc.rights.license
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
ethz.journal.title
arXiv
ethz.pages.start
2312.00720
en_US
ethz.size
14 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.arxiv
2312.00720
ethz.publication.place
Ithaca, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::03506 - Alonso, Gustavo / Alonso, Gustavo
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::03506 - Alonso, Gustavo / Alonso, Gustavo
en_US
ethz.date.deposited
2024-01-15T13:02:41Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2024-03-13T10:52:05Z
ethz.rosetta.lastUpdated
2024-03-13T10:52:05Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Efficiently%20Processing%20Large%20Relational%20Joins%20on%20GPUs&rft.jtitle=arXiv&rft.date=2023-12-01&rft.spage=2312.00720&rft.au=Wu,%20Bowen&Koutsoukos,%20Dimitrios&Alonso,%20Gustavo&rft.genre=preprint&rft_id=info:doi/10.48550/ARXIV.2312.00720&