Distributed Join Result Materialization over High-Performance Networks


Date

2017-09-27

Publication Type

Master Thesis

ETH Bibliography

yes

Abstract

Distributed joins over a network have been researched for decades, usually with a focus on adapting the join to the network that connects the nodes holding the relations. Most research has gone into optimizing the join itself, i.e., the identification of matching tuples; however, the effective materialization of the join result is equally important. The main performance issue that existing materialization strategies address is that the network performs significantly worse than the local processing nodes, i.e., the transfer speed between nodes is the limiting factor. The conclusion drawn from this is that a materialization approach should reduce the amount of transmitted data by spending CPU time on the creation of optimal transfer schedules. In this thesis, we explore how this materialization approach changes when a high-performance network is considered. We propose a late-materialization approach with two different strategies for the exchange of data. We focus on optimizing CPU time and interleave communication and computation during the exchange of data. We then perform experiments over a wide range of parameters. The results show that, despite the interleaving of communication and computation, the implementation is network-bound; we therefore conclude that even in high-performance networks, the data transfer has to be optimized.

Publication status

published

Contributors

Examiner : Müller, Ingo
Examiner : Barthels, Claude
Examiner : Alonso, Gustavo

Volume

176

Publisher

Systems Group, Department of Computer Science, ETH Zurich

Organisational unit

03506 - Alonso, Gustavo / Alonso, Gustavo
