Automatic topography of high-dimensional data sets by non-parametric density peak clustering


METADATA ONLY

Date

2021-06

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Data analysis in high-dimensional spaces aims at obtaining a synthetic description of a data set, revealing its main structure and its salient features. We here introduce an approach providing this description in the form of a topography of the data, namely a human-readable chart of the probability density from which the data are harvested. The approach is based on an unsupervised extension of Density Peak clustering and on a non-parametric density estimator that measures the probability density in the manifold containing the data. This allows the number and the height of the peaks of the probability density, and the depth of the “valleys” separating them, to be found automatically. Importantly, the density estimator provides a measure of the error, which allows genuine density peaks to be distinguished from density fluctuations due to finite sampling. The approach thus provides robust and visual information about the height of the density peaks, their statistical reliability and their hierarchical organization, offering a conceptually powerful extension of the standard clustering partitions. We show that this framework is particularly useful in the analysis of complex data sets. © 2021 Elsevier
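
As a rough illustration of the general idea behind Density Peak clustering mentioned in the abstract, the sketch below estimates a density at each point and treats points that are both dense and far from any denser point as peaks. It is not the method of the article: it uses a plain k-nearest-neighbour density instead of the article's non-parametric estimator, it has no error estimate or statistical test for the peaks, and the number of peaks is passed in by hand rather than found automatically. The function names and the parameters k and n_peaks are assumptions made for this example.

```python
import numpy as np


def knn_log_density(points, k):
    """Log of a k-nearest-neighbour density estimate, up to an additive constant.

    Assumption for this sketch: a plain kNN estimator, not the article's estimator.
    """
    n, dim = points.shape
    # Pairwise Euclidean distances (O(n^2) memory, acceptable for a small demo).
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Distance to the k-th neighbour; column 0 of each sorted row is the point itself.
    r_k = np.sort(dists, axis=1)[:, k]
    # rho ~ k / (n * r_k^dim); the constant unit-ball volume is dropped.
    log_rho = np.log(k) - np.log(n) - dim * np.log(r_k)
    return log_rho, dists


def density_peaks(points, k=10, n_peaks=2):
    """Classic Density Peak heuristic with a hand-chosen number of peaks."""
    log_rho, dists = knn_log_density(points, k)
    n = len(points)
    order = np.argsort(-log_rho)  # indices from densest to sparsest
    delta = np.empty(n)           # distance to the nearest denser point
    parent = np.full(n, -1)       # index of that nearest denser point
    delta[order[0]] = dists[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dists[i, denser])]
        delta[i] = dists[i, j]
        parent[i] = j
    # Peaks score high on both density and separation from denser points.
    gamma = np.exp(log_rho - log_rho.max()) * delta
    gamma[order[0]] = np.inf  # the globally densest point is always a peak
    peaks = np.argsort(-gamma)[:n_peaks]
    labels = np.full(n, -1)
    labels[peaks] = np.arange(n_peaks)
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, peaks
```

Calling density_peaks(points, k=10, n_peaks=2) on an array of shape (n_samples, n_features) returns one cluster label per point and the indices of the chosen peaks. The approach described in the abstract replaces the hand-chosen n_peaks with a decision based on the estimated error of the density.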

Publication status

published

Volume

560

Pages / Article No.

476 - 492

Publisher

Elsevier

Subject

Clustering algorithm; High-dimensional data; Hierarchy visualization; Density peak clustering; Non-parametric density estimation

Organisational unit

02207 - Functional Genomics Center Zurich / Functional Genomics Center Zurich
