Efficient Data-parallel Computing on Small Heterogeneous Clusters
OPEN ACCESS
Date
2012-04
Publication Type
Report
ETH Bibliography
yes
Abstract
Cluster-based data-parallel frameworks such as MapReduce, Hadoop, and Dryad are increasingly popular for a large class of compute-intensive tasks. Such systems are designed for large-scale clusters and employ several techniques to reduce job run time in the presence of failures, slow machines, and other effects. In this paper, we apply Dryad to smaller-scale, “ad-hoc” clusters such as those formed by aggregating the servers and workstations in a small office. We first show that, while Dryad’s greedy scheduling algorithm performs well at scale, it is significantly further from optimal in a small cluster (5 to 10 machines) whose nodes have widely differing performance characteristics. We further show that, in such cases, performance models of dataflow operators can be constructed that predict the run times of vertex processes accurately enough for a more intelligent planner to achieve significant performance gains across a variety of jobs, and we show how to construct such models efficiently. Our system extends the DryadLINQ data-parallel language compiler with a planner/optimizer implemented using constraint programming, which exploits our operator models to significantly improve the performance of parallel jobs on ad-hoc clusters.
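The gap between greedy and model-driven placement described in the abstract can be illustrated with a toy example. The sketch below is not Dryad's scheduler and not the report's constraint-programming planner: `predicted_runtime` is a hypothetical operator model (work divided by machine speed), `greedy_makespan` is a simple "first free machine" rule, and `planned_makespan` substitutes brute-force search for a real constraint solver. The task sizes and machine speeds are invented for illustration.

```python
from itertools import product


def predicted_runtime(work, speed):
    """Hypothetical operator model: run time = work units / machine speed."""
    return work / speed


def greedy_makespan(tasks, speeds):
    """Give each task, in order, to the machine that becomes free first,
    without considering how fast that machine is."""
    free_at = [0.0] * len(speeds)
    for work in tasks:
        m = min(range(len(speeds)), key=lambda i: free_at[i])
        free_at[m] += predicted_runtime(work, speeds[m])
    return max(free_at)


def planned_makespan(tasks, speeds):
    """Try every task-to-machine assignment and keep the shortest makespan.
    Brute force stands in for a constraint solver; feasible only for tiny inputs."""
    best = float("inf")
    for assignment in product(range(len(speeds)), repeat=len(tasks)):
        load = [0.0] * len(speeds)
        for work, m in zip(tasks, assignment):
            load[m] += predicted_runtime(work, speeds[m])
        best = min(best, max(load))
    return best


# Assumed example: four equal tasks, one fast server and two slow workstations.
tasks = [8.0, 8.0, 8.0, 8.0]
speeds = [8.0, 1.0, 1.0]

print(greedy_makespan(tasks, speeds))   # 8.0: two tasks land on slow machines
print(planned_makespan(tasks, speeds))  # 4.0: all tasks queue on the fast machine
```

With only three machines, a single poor placement doubles the job's run time in this example; in a large cluster the same mistake would affect a much smaller fraction of the work, which is consistent with the abstract's observation that greedy scheduling performs well at scale.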
Publication status
published
Volume
756
Publisher
ETH Zurich, Department of Computer Science
Subject
DISTRIBUTED APPLICATIONS + CLOUD COMPUTING + GRID COMPUTING (COMPUTER SYSTEMS); PARALLEL PROCESSING + CONCURRENCY (OPERATING SYSTEMS); CONCURRENT PROGRAMMING + DISTRIBUTED PROGRAMMING + PARALLEL PROGRAMMING (PROGRAMMING METHODS)
Organisational unit
03757 - Roscoe, Timothy / Roscoe, Timothy
02150 - Dep. Informatik / Dep. of Computer Science