Producing building blocks for data analytics
OPEN ACCESS
Date
2019-09
Publication Type
Master Thesis
ETH Bibliography
yes
Abstract
The ever-increasing diversity of data analytics and AI applications has led to tremendous growth in the number of tools developed over the past few years. The developers of these tools usually do not spend much time thinking about which building blocks lie at their core. As a result, they sometimes have to write many slightly different versions of the same code fragments. Instead, they could reduce their implementation effort by designing reusable and recomposable building blocks, which they could then simply orchestrate in different orders across execution plans. In this thesis, we study the level of granularity of these building blocks. We start with a state-of-the-art, high-performance distributed hash join, which we split into smaller operators, each with a single functionality. We explore different levels of granularity and study their impact on reusability and performance. Our proposed granularity level yields operators that are reusable and have almost no performance overhead. We present a variety of use cases in modern ML and data analytics scenarios where these operators can be applied. When rebuilt from the same operators, the original join algorithm achieves similar performance and is even faster in some cases.
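As an illustration only, the following Python sketch shows what splitting a hash join into single-purpose operators and recomposing them can look like. The operator names `partition`, `build`, and `probe`, the `(key, payload)` row layout, and the two plans are hypothetical choices made for this example; they are not the operators from the thesis, which target a high-performance distributed setting rather than in-memory Python lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout for this sketch: (join key, payload).
Row = Tuple[int, str]


def partition(rows: Iterable[Row], num_partitions: int) -> List[List[Row]]:
    """Operator 1: hash-partition rows by key (its only job)."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        # Equal keys always land in the same partition index.
        parts[hash(row[0]) % num_partitions].append(row)
    return parts


def build(rows: Iterable[Row]) -> Dict[int, List[str]]:
    """Operator 2: build a hash table mapping key -> list of payloads."""
    table: Dict[int, List[str]] = defaultdict(list)
    for key, payload in rows:
        table[key].append(payload)
    return table


def probe(table: Dict[int, List[str]], rows: Iterable[Row]) -> List[Tuple[int, str, str]]:
    """Operator 3: probe the hash table and emit matching (key, build, probe) triples."""
    out: List[Tuple[int, str, str]] = []
    for key, payload in rows:
        # .get() does not insert missing keys, even on a defaultdict.
        for build_payload in table.get(key, ()):
            out.append((key, build_payload, payload))
    return out


def hash_join(r: List[Row], s: List[Row], num_partitions: int = 4) -> List[Tuple[int, str, str]]:
    """Plan A: a partitioned hash join composed from the three operators."""
    result: List[Tuple[int, str, str]] = []
    for r_part, s_part in zip(partition(r, num_partitions), partition(s, num_partitions)):
        result.extend(probe(build(r_part), s_part))
    return result


def group_count(rows: List[Row], num_partitions: int = 4) -> Dict[int, int]:
    """Plan B: reuse partition + build for a per-key count (a simple group-by)."""
    counts: Dict[int, int] = {}
    for part in partition(rows, num_partitions):
        for key, payloads in build(part).items():
            counts[key] = len(payloads)
    return counts


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "c")]
    s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(hash_join(r, s)))
    print(group_count(s))
```

Both plans reuse the same `partition` and `build` operators; only the orchestration differs. This mirrors, at toy scale, the kind of reuse the abstract argues for, without claiming anything about the thesis's actual operator boundaries or performance.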
Publication status
published
Publisher
ETH Zurich
Organisational unit
03506 - Alonso, Gustavo / Alonso, Gustavo