Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement
Metadata only
Author(s)
Date
2024
Type
- Conference Paper
ETH Bibliography
yes
Altmetrics
Abstract
Input data preprocessing is a common bottleneck in machine learning (ML) jobs that can significantly increase training time and cost, as expensive GPUs or TPUs sit idle waiting for input data. Previous work has shown that offloading data preprocessing to remote CPU servers successfully alleviates data stalls and improves training time. However, remote CPU workers in disaggregated data processing systems comprise a significant fraction of total training costs. Meanwhile, current disaggregated solutions often underutilize the CPU and DRAM resources available on ML accelerator nodes. We propose two approaches to alleviate ML input data stalls while minimizing costs. First, we dynamically schedule data preprocessing workers on ML accelerator host resources to minimize the number of remote CPU workers needed to achieve peak data ingestion bandwidth. Second, we analyze the characteristics of input pipelines and automatically reorder transformations to increase data preprocessing worker throughput. We observe that relaxing commutativity increases throughput while maintaining high model accuracy for a variety of ML data pipelines. We build Pecan, an ML data preprocessing service that automates data preprocessing worker placement and transformation reordering decisions. Pecan reduces preprocessing costs by 87% on average and total training costs by up to 60% compared to training with state-of-the-art disaggregated data preprocessing, and total training costs by 55% on average compared to collocated data preprocessing.
Publication status
published
Book title
ATC'24: Proceedings of the 2024 USENIX Annual Technical Conference
Pages / Article number
Publisher
USENIX Association
Conference
Funding
204620 - MLin: Machine Learning Input Data Processing as a Service (SNF)