Michael Wawrzoniak
Publications 1 - 8 of 8
- Serverless Datacenter Applications
  Item type: Doctoral Thesis
  Wawrzoniak, Michael (2024)
- Rethinking Serverless Computing: from the Programming Model to the Platform Design
  Item type: Conference Paper
  CEUR Workshop Proceedings ~ Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023)
  Alonso, Gustavo; Klimovic, Ana; Kuchler, Tom; et al. (2023)
  Serverless computing offers a number of advantages over conventional, Virtual Machine (VM) based deployments on the cloud, e.g., greater elasticity, simplicity of use and management, finer-granularity billing, and rapid deployment and startup times. Naturally, there is growing interest in exploring how to run applications in this new environment, and data analytics is no exception. Unfortunately, current serverless platforms are limited along several dimensions, which makes things quite difficult from the perspective of data analytics. In this paper we explore what serverless has to offer today, what is missing, and what can be done to make serverless a better computing platform in general and for data analytics in particular.
- Boxer: FaaSt Ephemeral Elasticity for Off-the-Shelf Cloud Applications
  Item type: Working Paper
  arXiv
  Wawrzoniak, Michael; Fraga Barcelos Paulus Bruno, Rodrigo; Klimovic, Ana; et al. (2024)
  Elasticity is a key property of cloud computing. However, elasticity is offered today at the granularity of virtual machines, which take tens of seconds to start. This is insufficient to react to load spikes and sudden failures in latency-sensitive applications, leading users to resort to expensive overprovisioning. Function-as-a-Service (FaaS) provides significantly higher elasticity than VMs, but comes coupled with an event-triggered programming model and a constrained execution environment that make it unsuitable for off-the-shelf applications. Previous work tries to overcome these obstacles but often requires re-architecting the applications. In this paper, we show how off-the-shelf applications can transparently benefit from ephemeral elasticity with FaaS. We built Boxer, an interposition layer spanning VMs and AWS Lambda that intercepts application execution and emulates the network-of-hosts environment that applications expect when deployed in a conventional VM/container environment. The ephemeral elasticity of Boxer enables significant performance and cost savings for off-the-shelf applications, e.g., recovery times over 5x faster than with EC2 instances and the ability to absorb load spikes comparably to overprovisioned EC2 VM instances.
- Ephemeral Per-query Engines for Serverless Analytics
  Item type: Conference Paper
  CEUR Workshop Proceedings ~ Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023)
  Wawrzoniak, Michael; Fraga Barcelos Paulus Bruno, Rodrigo; Klimovic, Ana; et al. (2023)
  We challenge the common assumption that queries are submitted to a pre-configured, already running engine and put forward the idea of dynamically instantiating a chosen data processing engine upon query submission by leveraging Function-as-a-Service (FaaS) platforms. We demonstrate the idea by running unmodified data processing engines (we use Apache Drill as an initial example) on real-world serverless FaaS platforms and show that such engines can be instantiated on demand when a query arrives. We aim to eventually support a wide range of queries and workloads. Wide access to such functionality would be a game changer in data processing. First, it would enable pay-per-query models supporting sporadic, interactive data analysis on arbitrary engines. Second, it would significantly increase the flexibility of data processing by making it possible to dynamically choose the actual engine, its configuration, and the resource allocation on a per-query basis. Logically, this amounts to dynamically attaching a query engine to the query rather than sending the query to a pre-configured and already deployed engine. In this paper we elaborate on this vision, outline the design of the MetaQ prototype that we are building to explore the idea, demonstrate through initial experiments that it is realistic, and discuss its many exciting practical implications.
- Off-the-shelf Data Analytics on Serverless
  Item type: Conference Paper
  Proceedings of the 14th Conference on Innovative Data Systems Research, CIDR 2024
  Wawrzoniak, Michael; Moro, Gianluca; Fraga Barcelos Paulus Bruno, Rodrigo; et al. (2024)
  Serverless has captured the interest of researchers and practitioners alike and is often considered the next step in the evolution of the cloud. Existing research, however, indicates that it is ill-suited to data analytics due to the limitations of commercial platforms. This has led researchers to design data analytics systems that work around the limitations of serverless platforms, to suggest alternative serverless platforms, or both. In this paper we demonstrate that there is a third option: to provide the functionality needed to run off-the-shelf distributed data processing systems on top of existing serverless platforms (e.g., AWS Lambda) in a transparent manner. We discuss how this can be done and present initial experimental results of the TPC-H benchmark for unmodified Apache Spark and Apache Drill running on AWS Lambda. The results enable research in serverless data analytics that goes beyond patching the shortcomings of existing commercial solutions and can be the basis for turning serverless into a general-purpose computing platform.
- Imaginary Machines: A Serverless Model for Cloud Applications
  Item type: Conference Paper
  Wawrzoniak, Michael; Fraga Barcelos Paulus Bruno, Rodrigo; Klimovic, Ana; et al. (2024)
  Serverless Function-as-a-Service (FaaS) platforms provide applications with resources that are highly elastic, quick to instantiate, accounted for at fine granularity, and free of the need for explicit runtime resource orchestration. This combination of core properties underpins the success and popularity of the serverless FaaS paradigm. However, these benefits are not available to most cloud applications because they are designed for networked virtual machine/container environments. Since such cloud applications cannot take advantage of the highly elastic resources of serverless and require runtime orchestration systems to operate, they suffer from lower resource utilization, additional management complexity, and higher costs relative to their FaaS serverless counterparts. We propose Imaginary Machines, a new serverless model for cloud applications. This model (1) exposes the highly elastic resources of serverless platforms through the traditional network-of-hosts model that cloud applications expect, and (2) eliminates the need for explicit runtime orchestration by transparently managing application resources based on signals generated during cloud application execution. With the Imaginary Machines model, unmodified cloud applications become serverless applications. While still based on the network-of-hosts model, they benefit from highly elastic resources and do not require runtime orchestration, just like their specialized serverless FaaS counterparts, promising increased resource utilization and reduced management costs.
- The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis
  Item type: Conference Paper
  Proceedings of the 16th International Workshop on Data Management on New Hardware, DaMoN '20
  Müller, Ingo; Marroquín, Renato; Koutsoukos, Dimitrios; et al. (2020)
  Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science, with its increasingly numerous and complex types of analytics, has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement a single analytics type on one platform, leading to repeated implementation effort and a plethora of semi-compatible tools for data scientists. In this paper, we propose the "Collection Virtual Machine" (or CVM), an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can simultaneously capture the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which are optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud, and expect similar results for many more frontends and hardware platforms in the near future.
- Boxer: Data Analytics on Network-enabled Serverless Platforms
  Item type: Conference Paper
  Wawrzoniak, Michael; Müller, Ingo; Fraga Barcelos Paulus Bruno, Rodrigo; et al. (2021)
  Serverless is an attractive platform for a variety of applications in the cloud due to its promise of elasticity, low cost, and fast deployment. Instead of using traditional virtual machine services and a fixed infrastructure, which incur considerable costs to operate and run, Function-as-a-Service allows triggering short computations on demand, with cost proportional to the time the functions are running. As appealing as the idea is, recent work has shown that for data processing applications (whether OLTP, OLAP, or ML), existing serverless platforms are inadequate and additional services are needed in practice, often to address the lack of communication capabilities between functions. In this paper, we demonstrate how to enable function-to-function communication using conventional TCP/IP and show how the ability to communicate can be used to implement data processing on serverless platforms more efficiently than was previously possible. Our benchmarks show a speedup as high as 11× in TPC-H queries over systems that use cloud storage to communicate across functions, sustained function-to-function throughput of 621 Mbit/s, and a round-trip latency of less than 1 ms.