Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures


METADATA ONLY
Loading...

Date

2016

Publication Type

Conference Paper

ETH Bibliography

yes

Citations

Altmetric
METADATA ONLY

Data

Rights / License

Abstract

Coresets are efficient representations of data sets such that models trained on the coreset are provably competitive with models trained on the original data set. As such, they have been successfully used to scale up clustering models such as K-Means and Gaussian mixture models to massive data sets. However, until now, the algorithms and the corresponding theory were usually specific to each clustering problem. We propose a single, practical algorithm to construct strong coresets for a large class of hard and soft clustering problems based on Bregman divergences. This class includes hard clustering with popular distortion measures such as the Squared Euclidean distance, the Mahalanobis distance, KL-divergence and Itakura-Saito distance. The corresponding soft clustering problems are directly related to popular mixture models due to a dual relationship between Bregman divergences and Exponential family distributions. Our theoretical results further imply a randomized polynomial-time approximation scheme for hard clustering. We demonstrate the practicality of the proposed algorithm in an empirical evaluation.

Publication status

published

Book title

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics

Volume

51

Pages / Article No.

1 - 9

Publisher

PMLR

Event

19th International Conference on Artificial Intelligence and Statistics (AISTATS 2016)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Organisational unit

03908 - Krause, Andreas / Krause, Andreas check_circle

Notes

Funding

307036 - Large-scale Adaptive Sensing, Learning and Decision Making: Theory and Applications (EC)

Related publications and datasets