Data Summarization via Bilevel Optimization
OPEN ACCESS
Author / Producer
Date
2024
Publication Type
Journal Article
ETH Bibliography
yes
Data
Rights / License
Abstract
The increasing availability of massive data sets poses various challenges for machine learning. Prominent among these is learning models under hardware or human resource constraints. In such resource-constrained settings, a simple yet powerful approach is operating on small subsets of the data. Coresets are weighted subsets of the data that provide approximation guarantees for the optimization objective. However, existing coreset constructions are highly model-specific and are limited to simple models such as linear regression, logistic regression, and k-means. In this work, we propose a generic coreset construction framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem. In contrast to existing approaches, our framework does not require model-specific adaptations and applies to any twice differentiable model, including neural networks. We show the effectiveness of our framework for a wide range of models in various settings, including training non-convex models online and batch active learning.
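For orientation, a cardinality-constrained bilevel problem of the kind the abstract describes could be written roughly as below. This is a sketch only: the symbols are assumptions introduced here and do not appear on this page. n is the number of data points, m the coreset size budget, w the vector of nonnegative data weights, \ell_i the loss on the i-th point, and \theta the model parameters.

```latex
\[
\begin{aligned}
\min_{w \in \mathbb{R}^n_{\ge 0},\ \|w\|_0 \le m} \quad
  & \sum_{i=1}^{n} \ell_i\bigl(\theta^*(w)\bigr) \\
\text{s.t.} \quad
  & \theta^*(w) \in \arg\min_{\theta} \sum_{i=1}^{n} w_i \, \ell_i(\theta)
\end{aligned}
\]
```

In this reading, the inner problem fits the model on the weighted subset, and the outer problem chooses at most m nonzero weights so that the fitted model performs well on the full data set. The twice-differentiability assumption mentioned in the abstract is consistent with differentiating \theta^*(w) with respect to w, which typically involves second derivatives of the inner objective.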
Permanent link
Publication status
published
External links
Editor
Book title
Journal / series
Volume
25
Pages / Article No.
73
Publisher
Microtome Publishing
Subject
importance sampling; Monte Carlo; Bayesian computation; diagnostics