Metadata only
Date
2024-02-01
Type
Conference Paper
ETH Bibliography
yes
Abstract
Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis, and many more application areas. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning and quantization, have been proposed in the literature. Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning as a knapsack problem over resource-aware tensor structures. Evaluated on a range of tasks, including sub-microsecond particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves reductions ranging between 55% and 92% in DSP utilization and up to 81% in BRAM utilization.
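To illustrate the knapsack view of pruning described above, the following is a minimal, hypothetical sketch: each prunable structure (e.g., a filter) is assigned an importance score and a hardware resource cost (here, DSPs), and a 0/1 knapsack selection keeps the most important structures within a resource budget. The function name, the importance measure, and the DSP costs are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def knapsack_prune(importances, dsp_costs, dsp_budget):
    """Illustrative sketch: select structures (e.g., filters) maximizing
    total importance while total DSP cost stays within dsp_budget.

    importances : list[float]  per-structure saliency (e.g., L1 norm)
    dsp_costs   : list[int]    per-structure hardware cost in DSPs (assumed)
    dsp_budget  : int          available DSPs
    Returns a boolean keep-mask over structures.
    """
    n = len(importances)
    # dp[i][b]: best total importance using the first i structures with budget b
    dp = np.zeros((n + 1, dsp_budget + 1))
    for i in range(1, n + 1):
        c, v = dsp_costs[i - 1], importances[i - 1]
        for b in range(dsp_budget + 1):
            dp[i][b] = dp[i - 1][b]  # option 1: prune structure i-1
            if c <= b:               # option 2: keep it if it fits the budget
                dp[i][b] = max(dp[i][b], dp[i - 1][b - c] + v)
    # Backtrack to recover which structures were kept
    keep = [False] * n
    b = dsp_budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            keep[i - 1] = True
            b -= dsp_costs[i - 1]
    return keep

# Example: 4 filters with importance scores and DSP costs, budget of 5 DSPs
mask = knapsack_prune([0.9, 0.4, 0.7, 0.2], [3, 2, 2, 1], 5)
print(mask)  # [True, False, True, False]
```

The dynamic program here is the textbook 0/1 knapsack; a hardware-centric method would additionally shape the kept structures to match the target accelerator's parallelism, which this sketch does not attempt.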
Publication status
published
Book title
2023 International Conference on Field Programmable Technology (ICFPT)
Publisher
IEEE
Subject
FPGA; Deep Learning; Pruning
Organisational unit
00002 - ETH Zürich
Notes
Conference lecture held on December 13, 2023.