Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Abstract
One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate Attention-based models in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s (0.65 V, 22nm FD-SOI technology).
Permanent link
https://doi.org/10.3929/ethz-b-000714939
Publication status
published
Journal / series
IEEE Design & Test
Publisher
IEEE
Subject
NEURAL NETWORKS (COMPUTER SYSTEMS); TinyML; Deployment; Transformers; Accelerators
Organisational unit
03996 - Benini, Luca / Benini, Luca
Funding
101095947 - Together for RISc-V Technology and ApplicatioNs (SBFI)
101070634 - A multiprocessor system on chip with in-memory neural processing unit (SBFI)
ETH Bibliography
yes