Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures
METADATA ONLY
Loading...
Author / Producer
Date
2016
Publication Type
Conference Paper
ETH Bibliography
yes
Citations
Altmetric
METADATA ONLY
Data
Rights / License
Abstract
Coresets are efficient representations of data sets such that models trained on the coreset are provably competitive with models trained on the original data set. As such, they have been successfully used to scale up clustering models such as K-Means and Gaussian mixture models to massive data sets. However, until now, the algorithms and the corresponding theory were usually specific to each clustering problem.
We propose a single, practical algorithm to construct strong coresets for a large class of hard and soft clustering problems based on Bregman divergences. This class includes hard clustering with popular distortion measures such as the Squared Euclidean distance, the Mahalanobis distance, KL-divergence and Itakura-Saito distance. The corresponding soft clustering problems are directly related to popular mixture models due to a dual relationship between Bregman divergences and Exponential family distributions. Our theoretical results further imply a randomized polynomial-time approximation scheme for hard clustering. We demonstrate the practicality of the proposed algorithm in an empirical evaluation.
Permanent link
Publication status
published
Book title
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
Journal / series
Volume
51
Pages / Article No.
1 - 9
Publisher
PMLR
Event
19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016)
Edition / version
Methods
Software
Geographic location
Date collected
Date created
Subject
Organisational unit
03908 - Krause, Andreas / Krause, Andreas
Notes
Funding
307036 - Large-scale Adaptive Sensing, Learning and Decision Making: Theory and Applications (EC)