Off-the-shelf Data Analytics on Serverless
OPEN ACCESS
Date
2024-01
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Serverless has captured the interest of researchers and practitioners alike and is often considered the next step in the evolution of the cloud. Existing research, however, indicates it is ill-suited to data analytics due to the limitations of commercial platforms. This has led researchers to either design data analytics systems that work around the limitations of serverless platforms, suggest alternative serverless platforms, or both. In this paper we demonstrate that there is a third option: to provide the functionality needed to run off-the-shelf distributed data processing systems on top of existing serverless platforms (e.g., AWS Lambda) in a transparent manner. We discuss how this can be done and present initial experimental results for the TPC-H benchmark on unmodified Apache Spark and Apache Drill running on AWS Lambda. The results enable research in serverless data analytics that goes beyond patching the shortcomings of existing commercial solutions and can be the basis for turning serverless into a general-purpose computing platform.
Publication status
published
Book title
Proceedings of the 14th Conference on Innovative Data Systems Research, CIDR 2024
Publisher
CIDR
Event
14th Annual Conference on Innovative Data Systems Research (CIDR 2024)
Organisational unit
03506 - Alonso, Gustavo / Alonso, Gustavo
09683 - Klimovic, Ana / Klimovic, Ana
Related publications and datasets
Is part of: https://www.cidrdb.org/