Open access
Date
2024-08-16
Type
- Journal Article
ETH Bibliography
yes
Abstract
Background
Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As scATAC-seq and multi-omics datasets continue to accumulate, the challenges of analyzing such sparse, noisy, and high-dimensional data become pressing. One challenge in particular is optimizing the processing of chromatin-level measurements and efficiently extracting the information needed to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices.
Results
We benchmark 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. Using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition level, we evaluate the performance of each method at different data processing stages. This approach reveals the strengths and weaknesses of each method and the influence of parameter selection.
Conclusions
Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. For large datasets, SnapATAC2 and ArchR are the most scalable.
Permanent link
https://doi.org/10.3929/ethz-b-000690818
Publication status
published
Journal / series
Genome Biology
Publisher
BioMed Central
Subject
Benchmark; scATAC-seq; Clustering; Feature engineering; Dimensional reduction