Main-Memory Hash Joins on Multi-Core CPUs
Tuning to the Underlying Hardware
OPEN ACCESS
Date
2012-11
Publication Type
Report
ETH Bibliography
yes
Abstract
The architectural changes introduced with multicore CPUs have triggered a redesign of main-memory join algorithms. In the last few years, two diverging views have appeared. One approach advocates careful tailoring of the algorithm to the architectural parameters (cache sizes, TLB, and memory bandwidth). The other approach argues that modern hardware is good enough at hiding cache and TLB miss latencies and that, consequently, the careful tailoring can be omitted without sacrificing performance. In this paper we demonstrate, through experimental analysis of different algorithms and architectures, that hardware still matters: join algorithms that are hardware-conscious perform better than hardware-oblivious approaches. The analysis and comparisons in the paper show that many of the claims regarding the behavior of join algorithms that have appeared in the literature are due to selection effects (relative table sizes, tuple sizes, the underlying architecture, using sorted data, etc.) and are not supported by experiments run under different parameter settings. Through the analysis, we shed light on how modern hardware affects the implementation of data operators and provide the fastest implementation of radix join to date, reaching close to 200 million tuples per second.
Publication status
published
Volume
779
Publisher
ETH Zurich, Department of Computer Science
Subject
MULTIPLE DATA STREAM ARCHITECTURES + MULTIPROCESSORS (COMPUTER SYSTEMS); STORAGE MANAGEMENT + MEMORY MANAGEMENT (OPERATING SYSTEMS)
Organisational unit
03506 - Alonso, Gustavo
02150 - Dep. of Computer Science