DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training


Loading...

Date

2022

Publication Type

Journal Article

ETH Bibliography

yes

Citations

Altmetric

Data

Abstract

On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present DARKSIDE, a System-on-Chip with a heterogeneous cluster of eight RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost the performance and efficiency on key compute-intensive DNN kernels, the cluster is enriched with three digital accelerators: 1) a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); 2) a minimal overhead datamover to marshal 1–32-b data on-the-fly; and 3) a 16-b floating-point tensor product engine (TPE) for tiled matrix-multiplication acceleration. DARKSIDE is implemented in 65-nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency—enough to enable on-chip floating-point training at competitive speed coupled with ultralow power quantized inference.

Publication status

published

Editor

Book title

Volume

2

Pages / Article No.

231 - 243

Publisher

IEEE

Event

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Heterogeneous cluster; tensor product engine (TPE); ultralow-power AI

Organisational unit

03996 - Benini, Luca / Benini, Luca check_circle

Notes

Funding

101034126 - Pilot using Independent Local & Open Technologies (EC)

Related publications and datasets