Hardware Architectures for Energy-Efficient Neural Network Acceleration



Author / Producer

Date

2023

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

Machine Learning (ML) models, particularly models based on Neural Networks (NNs), have revolutionized entire application fields such as image processing and synthesis, natural language processing, biomedical applications, and strategy games such as "Go". Fueled by the increasing availability of vast amounts of training data and compute power, the ubiquitous NN models are growing exponentially in size and complexity. This, in turn, has increased the total compute power needed to run these networks by as much as three orders of magnitude within only two years. Hence, designing efficient hardware architectures to accelerate such models has become more crucial than ever. In this thesis, we propose efficient accelerator architectures for three distinct types of NNs: Recurrent Neural Networks (RNNs), Fully Connected Neural Networks (FCNNs), and Convolutional Neural Networks (CNNs). In the first part, we focus on a set of RNN and FCNN benchmarks for Radio Resource Management (RRM) – a critical set of problems in 5G mobile communications due to its ubiquitous deployment on every radio device and its low-latency constraints. The rapidly evolving RRM algorithms make a multi-core Application-Specific Instruction-Set Processor (ASIP) architecture, which offers a trade-off between flexibility, efficiency, and cost, an optimal choice for an on-the-edge acceleration system. We have extended a single-core ASIP system into a multi-core ASIP acceleration system and parallelized the benchmarks. Our efficiency and performance evaluation reveals the memory bandwidth challenges typical of RNN models, which stem from their characteristic output-to-input feedback. In the second part, we present Muntaniala, a scalable standalone inference accelerator for Long Short-Term Memory (LSTM) RNNs, which tackles these distinctive memory bandwidth and scalability challenges of RNNs.
The scalable architecture of Muntaniala allows running large RNN models by combining multiple tiles in a systolic array. To quantify the overall system power, including I/O power, we built Vau da Muntanialas, to the best of our knowledge the first demonstration of a systolic multi-chip-on-PCB array of RNN accelerators. In the next part of the thesis, we focus on resource-constrained Internet of Things (IoT) end nodes. We design a heterogeneous Artificial Intelligence-enabled Internet-of-Things (AI-IoT) System-on-a-Chip (SoC) called Marsellus, which combines 16 enhanced RISC-V cores with the Reconfigurable Binary Engine (RBE), a dedicated accelerator for reconfigurable reduced-precision computation of Deep Neural Network (DNN) convolution layers according to the accuracy needs of the application. In the last part, we move on from quantized inference to the hardware (HW) acceleration of floating-point (FP) based training. We focus on a high-performance multi-core cluster that exploits MiniFloat-NN, a RISC-V Instruction Set Architecture (ISA) extension for low-precision NN training. We explore the quality of results (QoR) of placed-and-routed cluster implementations, used as a soft tile, with various floorplans and aspect ratios. Based on this exploration, we build Occamy, a 432-core 2.5D chiplet system for ultra-efficient (Mini-)FP computation.
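Both the RBE's reduced-precision convolutions and the MiniFloat-NN training extension rest on the same basic idea: representing values with far fewer mantissa bits than standard FP32. As a rough, hypothetical illustration only (the function name and grid-rounding scheme are mine, not the thesis's implementation, and exponent range and subnormals are ignored), rounding a value to a minifloat-style mantissa grid can be sketched as:

```python
import math

def quantize_minifloat(x: float, man_bits: int = 2) -> float:
    """Round x to the nearest float whose mantissa fits in `man_bits`
    explicit bits. Illustrative sketch: the exponent range is left
    unconstrained, so overflow/subnormal behavior is not modeled."""
    if x == 0.0:
        return 0.0
    # Decompose x = m * 2**e with |m| in [0.5, 1).
    m, e = math.frexp(x)
    # With man_bits explicit mantissa bits (plus the implicit leading 1),
    # each binade holds 2**(man_bits + 1) evenly spaced grid points.
    steps = 2 ** (man_bits + 1)
    return math.ldexp(round(m * steps) / steps, e)

# Example: with 2 mantissa bits (FP8 E5M2-like resolution),
# 1.3 snaps to the nearest representable neighbor, 1.25.
print(quantize_minifloat(1.3))   # 1.25
print(quantize_minifloat(3.0))   # 3.0 (already on the grid)
```

This mirrors why such formats are attractive in hardware: a multiply on a 3-bit significand is dramatically cheaper in area and energy than a 24-bit FP32 multiply, at the cost of the rounding error visible above.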

Publication status

published

Contributors

Examiner: Benini, Luca
Examiner: Coussy, Philippe

Publisher

ETH Zurich

Organisational unit

03996 - Benini, Luca
