A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
dc.contributor.author
Singh, Gagandeep
dc.contributor.author
Alser, Mohammed
dc.contributor.author
Khodamoradi, Alireza
dc.contributor.author
Denolf, Kristof
dc.contributor.author
Firtina, Can
dc.contributor.author
Cavlak, Meryem Banu
dc.contributor.author
Corporaal, Henk
dc.contributor.author
Mutlu, Onur
dc.date.accessioned
2023-03-02T12:47:57Z
dc.date.available
2023-01-29T23:24:02Z
dc.date.available
2023-03-02T12:47:57Z
dc.date.issued
2022-12-08
dc.identifier.other
10.48550/ARXIV.2211.03079
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/595589
dc.description.abstract
Nanopore sequencing generates noisy electrical signals that need to be converted into a standard string of DNA nucleotide bases using a computational step called basecalling. The accuracy and speed of basecalling have critical implications for all later steps in genome analysis. Many researchers adopt complex deep learning-based models to perform basecalling without considering the compute demands of such models, which leads to slow, inefficient, and memory-hungry basecallers. Therefore, there is a need to reduce the computation and memory cost of basecalling while maintaining accuracy. Our goal is to develop a comprehensive framework for creating deep learning-based basecallers that provide high efficiency and performance. We introduce RUBICON, a framework to develop hardware-optimized basecallers. RUBICON consists of two novel machine-learning techniques that are specifically designed for basecalling. First, we introduce the first quantization-aware basecalling neural architecture search (QABAS) framework to specialize the basecalling neural network architecture for a given hardware acceleration platform while jointly exploring and finding the best bit-width precision for each neural network layer. Second, we develop SkipClip, the first technique to remove the skip connections present in modern basecallers to greatly reduce resource and storage requirements without any loss in basecalling accuracy. We demonstrate the benefits of RUBICON by developing RUBICALL, the first hardware-optimized basecaller that performs fast and accurate basecalling. Compared to the fastest state-of-the-art basecaller, RUBICALL provides a 3.19x speedup with 2.97% higher accuracy. We show that RUBICON helps researchers develop hardware-optimized basecallers that are superior to expert-designed models.
en_US
dc.language.iso
en
en_US
dc.publisher
Cornell University
en_US
dc.subject
Hardware Architecture (cs.AR)
en_US
dc.subject
Distributed, Parallel, and Cluster Computing (cs.DC)
en_US
dc.subject
Genomics (q-bio.GN)
en_US
dc.subject
FOS: Computer and information sciences
en_US
dc.subject
FOS: Biological sciences
en_US
dc.title
A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
en_US
dc.type
Working Paper
ethz.journal.title
arXiv
ethz.pages.start
2211.03079v3
en_US
ethz.size
29 p.
en_US
ethz.version.edition
v3
en_US
ethz.identifier.arxiv
2211.03079
ethz.publication.place
Ithaca, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::09483 - Mutlu, Onur / Mutlu, Onur
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::09483 - Mutlu, Onur / Mutlu, Onur
en_US
ethz.relation.isSupplementedBy
https://bridges.monash.edu/articles/dataset/Raw_fast5s/7676174
ethz.relation.isSupplementedBy
https://bridges.monash.edu/articles/dataset/Reference_genomes/7676135
ethz.relation.isSupplementedBy
https://github.com/rrwick/Basecalling-comparison
ethz.relation.isPreviousVersionOf
10.3929/ethz-b-000661676
ethz.date.deposited
2023-01-29T23:24:02Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2023-03-02T12:48:02Z
ethz.rosetta.lastUpdated
2023-03-02T12:48:02Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=A%20Framework%20for%20Designing%20Efficient%20Deep%20Learning-Based%20Genomic%20Basecallers&rft.jtitle=arXiv&rft.date=2022-12-08&rft.spage=2211.03079v3&rft.au=Singh,%20Gagandeep&Alser,%20Mohammed&Khodamoradi,%20Alireza&Denolf,%20Kristof&Firtina,%20Can&rft.genre=preprint&rft_id=info:doi/10.48550/ARXIV.2211.03079&