Open access
Date
2023-01
Type
Journal Article
ETH Bibliography
yes
Abstract
In many modern bioinformatics applications, such as statistical genetics or single-cell analysis, one frequently encounters datasets that are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMilarity Batched Search Integrated GPU), a highly scalable Python package that provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Thanks to its PyTorch backend, it is highly modular and suited to many data types, with a particular focus on biobank data analysis.
Permanent link
https://doi.org/10.3929/ethz-b-000624223
Publication status
published
Journal / series
Bioinformatics
Publisher
Oxford University Press
Funding
186101 - Precision-Medicine for Neurological Disorders: Harnessing the Power of Big Data and Machine Learning for Biomarker Discovery and Drug Repositioning Strategies (SNF)