Open access
Author
Date
2017-12-21
Type
Master Thesis
ETH Bibliography
yes
Abstract
In this thesis, we present a novel, extensible RDMA database operator interface, where RDMA stands not only for Remote Direct Memory Access but also for Reusable, Distributed MAin-memory database operator interface.
The interface is designed and implemented for distributed query pipelines that scale algorithms to many thousands of cores. It provides the usability of SQL combined with the expressiveness and extensibility of Spark, and is intended to eventually achieve the performance of hand-tuned algorithms written in C++. We implement a distributed radix hash join algorithm and Query 1 of the TPC-H benchmark with the building blocks provided by the operator abstraction. The former serves to compare performance against a hand-tuned, highly optimized implementation, while the latter demonstrates the expressiveness of the interface.
Although the radix hash join implemented on top of the operator interface performs worse than the hand-tuned implementation, it exhibits similar sub-linear scaling behaviour. In the discussion section, we show how the two implementations differ and argue that the operator interface simply needs more fine-tuning to match or even surpass the performance of the hand-tuned version.
Permanent link
https://doi.org/10.3929/ethz-b-000223924
Publication status
published
Publisher
Systems Group, Department of Computer Science, ETH Zurich
Subject
operator interface; extensible; reusable; RDMA; main-memory; main-memory database; micro-operator
Organisational unit
03506 - Alonso, Gustavo / Alonso, Gustavo