Off-the-shelf Data Analytics on Serverless


Date

2024-01

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Serverless computing has captured the interest of researchers and practitioners alike and is often considered the next step in the evolution of the cloud. Existing research, however, indicates that it is ill-suited to data analytics because of the limitations of commercial platforms. This has led researchers to design data analytics systems that work around the limitations of serverless platforms, to propose alternative serverless platforms, or both. In this paper, we demonstrate that there is a third option: transparently providing the functionality needed to run off-the-shelf distributed data processing systems on top of existing serverless platforms (e.g., AWS Lambda). We discuss how this can be done and present initial experimental results from the TPC-H benchmark for unmodified Apache Spark and Apache Drill running on AWS Lambda. These results enable research in serverless data analytics that goes beyond patching the shortcomings of existing commercial solutions and can serve as the basis for turning serverless into a general-purpose computing platform.

Publication status

published

Book title

Proceedings of the 14th Conference on Innovative Data Systems Research, CIDR 2024

Publisher

CIDR

Event

14th Annual Conference on Innovative Data Systems Research (CIDR 2024)

Organisational unit

03506 - Alonso, Gustavo / Alonso, Gustavo
09683 - Klimovic, Ana / Klimovic, Ana