Efficiently Processing Large Relational Joins on GPUs


Date

2023-12-01

Publication Type

Working Paper

ETH Bibliography

yes

Citations

Altmetric

Data

Abstract

With the growing interest in Machine Learning (ML), Graphic Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, with growing interest in employing them in a database context. In this paper, we explore and analyze the implementation of relational joins on GPUs from an end-to-end perspective, meaning that we take result materialization into account. We conduct a comprehensive performance study of state-of-the-art GPU-based join algorithms over diverse synthetic workloads and TPC-H/TPC-DS benchmarks. Without being restricted to the conventional setting where each input relation has only one key and one non-key with all attributes being 4-bytes long, we investigate the effect of various factors (e.g., input sizes, number of non-key columns, skewness, data types, match ratios, and number of joins) on the end-to-end throughput. Furthermore, we propose a technique called "Gather-from-Transformed-Relations" (GFTR) to reduce the long-ignored yet high materialization cost in GPU-based joins. The experimental evaluation shows significant performance improvements from GFTR, with throughput gains of up to 2.3 times over previous work. The insights gained from the performance study not only advance the understanding of GPU-based joins but also introduce a structured approach to selecting the most efficient GPU join algorithm based on the input relation characteristics.

Publication status

published

Editor

Book title

Journal / series

Volume

Pages / Article No.

2312.0072

Publisher

Cornell University

Event

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Databases (cs.DB); FOS: Computer and information sciences

Organisational unit

03506 - Alonso, Gustavo / Alonso, Gustavo check_circle

Notes

Funding

Related publications and datasets