Towards A Platform and Benchmark Suite for Model Training on Dynamic Datasets
Open access
Date
2023-05
Type
Conference Paper
ETH Bibliography
yes
Abstract
Machine learning (ML) is often applied in use cases where training data evolves and/or grows over time. Training must incorporate data changes to maintain high model quality; however, this is often challenging and expensive due to large datasets and models. In contrast, ML researchers often train and evaluate ML models on static datasets or with artificial assumptions about data dynamics. This gap between research and practice is largely due to (i) the absence of an open-source platform that manages dynamic datasets at scale and supports pluggable policies for when and what data to train on, and (ii) the lack of representative open-source benchmarks for ML training on dynamic datasets. To address this gap, we propose to design a platform that enables ML researchers and practitioners to explore training and data selection policies, while alleviating the burdens of managing large dynamic datasets and orchestrating recurring training jobs. We also propose to build an accompanying benchmark suite that integrates public dynamic datasets and ML models from a variety of representative use cases.
Permanent link
https://doi.org/10.3929/ethz-b-000624892
Publication status
published
Book title
EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems
Publisher
Association for Computing Machinery
Organisational unit
09683 - Klimovic, Ana / Klimovic, Ana
Funding
204620 - MLin: Machine Learning Input Data Processing as a Service (SNF)