An On-the-Fly Feature Map Compression Engine for Background Memory Access Cost Reduction in DNN Inference
Open access
Date: 2020-01
Type: Working Paper
ETH Bibliographie: yes
Abstract
Specialized hardware architectures and dedicated accelerators allow the application of Deep Learning directly within sensing nodes. With compute resources highly optimized for energy efficiency, a large part of the power consumption of such devices is caused by transfers of intermediate feature maps to and from large memories. Moreover, a significant share of the silicon area is dedicated to these memories to avoid highly expensive off-chip memory accesses. Extended Bit-Plane Compression (EBPC), a recently proposed compression scheme targeting DNN feature maps, offers an opportunity to increase energy efficiency by reducing both the data transfer volume and the size of large background memories. Besides exhibiting state-of-the-art compression ratios, it also has a small, simple hardware implementation. In post-layout power simulations, we show an energy cost between 0.27 pJ/word and 0.45 pJ/word, three orders of magnitude lower than the cost of off-chip memory accesses. EBPC allows for a reduction in off-chip access energy by factors of 2.2x (MobileNetV2) to 4x (VGG16) and can reduce on-chip access energy by up to 45 %. We further propose a way to integrate the EBPC hardware blocks, which perform on-the-fly compression and decompression on 8-bit feature map streams, into an embedded ultra-low-power processing system, and show how the challenges arising from a variable-length compressed representation can be navigated in this context.
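The bit-plane component of such a scheme can be illustrated with a toy sketch: for a block of eight 8-bit words, transpose the data into eight bit-planes, drop the all-zero planes, and keep a one-byte mask recording which planes were stored. This is a deliberate simplification for illustration only, not the published EBPC algorithm (which additionally applies zero run-length encoding and delta coding); the function names are hypothetical.

```python
def compress_block(block):
    """Pack a block of eight 8-bit words into (plane_mask, planes).

    Bit-plane p of the block collects bit p of every word; all-zero
    planes are dropped, and plane_mask records which planes remain.
    Sparse, small-valued data thus shrinks from 8 bytes to
    1 + (number of non-zero planes) bytes.
    """
    assert len(block) == 8 and all(0 <= v < 256 for v in block)
    mask, planes = 0, []
    for p in range(8):
        # Gather bit p of each of the 8 words into one plane byte.
        plane = 0
        for i, v in enumerate(block):
            plane |= ((v >> p) & 1) << i
        if plane:
            mask |= 1 << p
            planes.append(plane)
    return mask, planes


def decompress_block(mask, planes):
    """Invert compress_block: rebuild the 8 words from mask + planes."""
    block = [0] * 8
    it = iter(planes)
    for p in range(8):
        if (mask >> p) & 1:
            plane = next(it)
            for i in range(8):
                block[i] |= ((plane >> i) & 1) << p
    return block
```

For a sparse block such as `[0, 3, 0, 1, 0, 0, 2, 0]`, only bit-planes 0 and 1 are non-zero, so the compressed form occupies 3 bytes (one mask byte plus two plane bytes) instead of 8, and decompression recovers the block exactly. A hardware realization streams these planes rather than materializing Python lists, which is why the per-word energy cost can stay in the sub-picojoule range reported above.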
Persistent Link: https://doi.org/10.3929/ethz-b-000388819
Publication Status: published
Publisher: ETH Zurich
Subject: Edge AI; Feature Map Compression; Deep Learning; Hardware Acceleration
Organisational Unit: 03996 - Benini, Luca / Benini, Luca