Software Resource Disaggregation for HPC with Serverless Computing
METADATA ONLY
Author / Producer
Date
2024
Publication Type
Conference Paper
ETH Bibliography
yes
Data
Rights / License
Abstract
Aggregated HPC resources have rigid allocation systems and programming models that struggle to adapt to diverse and changing workloads. Consequently, HPC systems fail to use large pools of unused memory efficiently and leave computing resources idle. Prior work attempted to increase the throughput and efficiency of supercomputing systems through workload co-location and resource disaggregation. However, these methods fall short of providing a solution that can be applied to existing systems without major hardware modifications and performance losses. In this paper, we improve the utilization of supercomputers by employing the new cloud paradigm of serverless computing. We show how serverless functions provide fine-grained access to the resources of batch-managed cluster nodes. We present an HPC-oriented Function-as-a-Service (FaaS) platform that satisfies the requirements of high-performance applications. We demonstrate a software resource disaggregation approach in which placing functions on unallocated and underutilized nodes allows idle cores and accelerators to be utilized while retaining near-native performance.
Permanent link
Publication status
published
External links
Editor
Book title
2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Journal / series
Volume
Pages / Article No.
139 - 156
Publisher
IEEE
Event
38th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2024)
Organisational unit
03950 - Hoefler, Torsten / Hoefler, Torsten
Notes
Conference Presentation held on May 28, 2024.
Funding
955606 - DEEP - Software for Exascale Architectures (EC)
955776 - Network Solution for Exascale Architectures (EC)