Notice
This is not the latest version of this item. The latest version can be found at: https://www.research-collection.ethz.ch/handle/20.500.11850/284907
Open access
Author
Date
2017-09-27
Type
Master Thesis
ETH Bibliography
yes
Abstract
Distributed joins over a network have been researched for decades, usually with a focus on adapting the join to the network that connects the nodes holding the relations. Most research has gone into optimizing the join itself, i.e. the identification of matching tuples; however, the efficient materialization of the join result is equally important. The main performance issue identified in work on materialization strategies is that the network performs significantly worse than the local processing nodes, i.e. the transfer speed between nodes is the limiting factor. The conclusion drawn from this is that a materialization approach should reduce the amount of transmitted data by spending CPU time on the creation of optimal transfer schedules. In this thesis, we explore how this materialization approach changes when a high-performance network is considered. We propose a late-materialization approach with two different strategies for the exchange of data. We focus on optimizing CPU time and interleave communication and computation during the data exchange. We then perform experiments over a wide range of parameters. The results show that, despite the interleaving of communication and computation, the implementation is network bound, from which we conclude that the data transfer has to be optimized even in high-performance networks.
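To illustrate the general idea of late materialization named in the abstract, the following is a minimal single-process sketch and is not taken from the thesis. It assumes each relation is split into a key column and a payload column, with the list index acting as the row id; the function name `late_materialized_join` and all parameter names are hypothetical. The join phase works on keys and row ids only, and payloads are read afterwards, only for rows that actually matched. In a distributed setting, the payload lookups in the second phase would correspond to the data exchanged between nodes.

```python
from collections import defaultdict


def late_materialized_join(r_keys, s_keys, r_payload, s_payload):
    """Join on keys first, then fetch payloads only for matching rows.

    r_keys, s_keys: lists of join keys; the list index is the row id.
    r_payload, s_payload: payload values indexed by row id
    (hypothetical stand-ins for data held on other nodes).
    """
    # Phase 1 (join): build a hash table on R's keys and probe it with
    # S's keys. Only (r_row_id, s_row_id) pairs are produced here.
    index = defaultdict(list)
    for rid, key in enumerate(r_keys):
        index[key].append(rid)
    matches = [
        (rid, sid)
        for sid, key in enumerate(s_keys)
        for rid in index.get(key, [])
    ]

    # Phase 2 (materialization): touch payloads only for matched row ids.
    return [(r_keys[rid], r_payload[rid], s_payload[sid]) for rid, sid in matches]


result = late_materialized_join(
    [1, 2, 3],             # keys of R
    [2, 3, 3, 4],          # keys of S
    ["a", "b", "c"],       # payloads of R
    ["w", "x", "y", "z"],  # payloads of S
)
```

In this toy input, the payloads `"a"` and `"z"` are never read because their keys have no join partner, which is the saving that late materialization aims for.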
Permanent link
https://doi.org/10.3929/ethz-b-000284907
Publication status
published
Volume
Publisher
Systems Group, Department of Computer Science, ETH Zurich
Organisational unit
03506 - Alonso, Gustavo / Alonso, Gustavo