Efficient Data-parallel Computing on Small Heterogeneous Clusters


Date

2012-04

Publication Type

Report

ETH Bibliography

yes

Abstract

Cluster-based data-parallel frameworks such as MapReduce, Hadoop, and Dryad are increasingly popular for a large class of compute-intensive tasks. Such systems are designed for large-scale clusters and employ several techniques to decrease the run time of jobs in the presence of failures, slow machines, and other effects. In this paper, we apply Dryad to smaller-scale, “ad-hoc” clusters such as those formed by aggregating the servers and workstations in a small office. We first show that, while Dryad’s greedy scheduling algorithm performs well at scale, it is significantly further from optimal in a small (5-10 machine) cluster environment where nodes have widely differing performance characteristics. We further show that in such cases, performance models of dataflow operators can be constructed that predict the run times of vertex processes accurately enough for a more intelligent planner to achieve significant performance gains for a variety of jobs, and we show how to construct such models efficiently. Our system enhances the DryadLINQ data-parallel language compiler with a planner/optimizer implemented using constraint programming, which exploits our operator models to significantly improve the performance of parallel jobs on ad-hoc clusters.
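
The contrast the abstract draws between greedy scheduling and model-driven planning can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the report: the function names (`predicted_runtime`, `greedy_makespan`, `planned_makespan`), the linear model `work / speed`, and the example cluster are hypothetical, and an exhaustive search stands in for the constraint-programming planner the report describes.

```python
import heapq
from itertools import product


def predicted_runtime(work, speed):
    """Hypothetical performance model: run time scales as work / machine speed."""
    return work / speed


def greedy_makespan(tasks, speeds):
    """Dispatch each task, in queue order, to whichever machine is idle first.

    The scheduler ignores machine speed, which is the point of the comparison.
    """
    idle = [(0.0, m) for m in range(len(speeds))]  # (time machine becomes free, index)
    heapq.heapify(idle)
    makespan = 0.0
    for work in tasks:
        free_at, m = heapq.heappop(idle)
        done = free_at + predicted_runtime(work, speeds[m])
        makespan = max(makespan, done)
        heapq.heappush(idle, (done, m))
    return makespan


def planned_makespan(tasks, speeds):
    """Try every task-to-machine assignment and keep the one with the lowest
    predicted makespan. Exhaustive search is a stand-in for a real planner and
    is only feasible for very small inputs.
    """
    best_span, best_assignment = float("inf"), None
    for assignment in product(range(len(speeds)), repeat=len(tasks)):
        load = [0.0] * len(speeds)
        for work, m in zip(tasks, assignment):
            load[m] += predicted_runtime(work, speeds[m])
        span = max(load)
        if span < best_span:
            best_span, best_assignment = span, assignment
    return best_span, best_assignment


# Assumed example: one fast server and two slow workstations, four equal tasks.
speeds = [8.0, 1.0, 1.0]
tasks = [8.0, 8.0, 8.0, 8.0]

print(greedy_makespan(tasks, speeds))   # 8.0: two tasks land on slow machines
print(planned_makespan(tasks, speeds))  # (4.0, (0, 0, 0, 0)): all tasks on the fast server
```

In this assumed setup the greedy dispatcher hands work to the slow workstations simply because they are idle, while the plan built from predicted run times keeps all four tasks on the fast server and halves the makespan. On a large, homogeneous cluster the two approaches would differ far less.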

Publication status

published

Volume

756

Publisher

ETH Zurich, Department of Computer Science

Subject

DISTRIBUTED APPLICATIONS + CLOUD COMPUTING + GRID COMPUTING (COMPUTER SYSTEMS); PARALLEL PROCESSING + CONCURRENCY (OPERATING SYSTEMS); CONCURRENT PROGRAMMING + DISTRIBUTED PROGRAMMING + PARALLEL PROGRAMMING (PROGRAMMING METHODS)

Organisational unit

03757 - Roscoe, Timothy / Roscoe, Timothy
02150 - Dep. Informatik / Dep. of Computer Science
