Evaluation and Deployment of Resource-Constrained Machine Learning on Embedded Devices
OPEN ACCESS
Author / Producer
Date
2020-09-30
Publication Type
Master Thesis
ETH Bibliography
yes
Abstract
Deep neural networks (DNNs) are a vital tool in pattern recognition and machine learning (ML), solving a wide variety of problems in domains such as image classification, object detection, and speech processing. With the surge in the availability of cheap computation and memory resources, DNNs have grown both in architectural and computational complexity. Porting DNNs to resource-constrained devices, such as commercial home appliances, allows for cost-efficient deployment, widespread availability, and the preservation of sensitive personal data.
In this work, we discuss and address the challenges of enabling ML on microcontroller units (MCUs), focusing on the popular ARM Cortex-M architecture. We deploy two well-known image-classification DNNs on three different MCUs and subsequently benchmark their temporal runtime characteristics and energy consumption. This work proposes a toolchain, including a benchmarking suite based on TensorFlow Lite Micro (TFLu). The detailed effects and trade-offs that quantization, compiler options, and MCU architecture can have on key performance metrics such as inference latency and energy consumption have not been investigated previously.
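As an illustration of the kind of deployment the benchmarking suite targets, a minimal TensorFlow Lite Micro inference routine for a Cortex-M target might look as sketched below. This is not the thesis' actual code: the model array g_model_data, the tensor arena size, and the use of AllOpsResolver are placeholder assumptions, and the API shown corresponds to the 2020-era TFLu release.

    // Minimal TensorFlow Lite Micro inference sketch (API of the 2020-era TFLu release).
    // g_model_data and kTensorArenaSize are illustrative placeholders, not values from the thesis.
    #include <cstdint>
    #include "tensorflow/lite/micro/all_ops_resolver.h"
    #include "tensorflow/lite/micro/micro_error_reporter.h"
    #include "tensorflow/lite/micro/micro_interpreter.h"
    #include "tensorflow/lite/schema/schema_generated.h"

    extern const unsigned char g_model_data[];    // flatbuffer exported offline by the TFLite converter

    constexpr int kTensorArenaSize = 100 * 1024;  // scratch memory for activations and intermediate buffers
    static uint8_t tensor_arena[kTensorArenaSize];

    int RunInference() {
      static tflite::MicroErrorReporter error_reporter;
      const tflite::Model* model = tflite::GetModel(g_model_data);

      // In a real deployment only the operators used by the network would be registered,
      // which reduces the code-size footprint compared to AllOpsResolver.
      static tflite::AllOpsResolver resolver;
      static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                                  kTensorArenaSize, &error_reporter);
      if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

      TfLiteTensor* input = interpreter.input(0);
      // ... fill input->data.int8 with a quantized input image ...

      if (interpreter.Invoke() != kTfLiteOk) return -1;

      TfLiteTensor* output = interpreter.output(0);
      return output->data.int8[0];                // e.g. the score of the first class
    }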
We find that such empirical investigations are indispensable, as the impact of specialized instructions and dedicated hardware units can be subtle. Benchmarking by actual deployment is a cost-effective method of verification, because theoretical estimates of latency and energy consumption are difficult to formulate due to the interdependence of the DNN architecture, the software stack, and the target hardware.
Using fixed-point quantization for weights and activations, we achieve a 73% reduction of the network's memory footprint. Furthermore, we find that combining quantization with hardware-optimized acceleration libraries yields a speedup of up to 34x in inference latency, which consequently also leads to a decrease in energy consumption of the same order. We learn that the deployment of DNNs on commercial off-the-shelf (COTS) MCUs is promising, but can be greatly accelerated by a combination of optimization techniques. This work concludes with an in-depth discussion on how to improve DNN deployment on resource-constrained devices beyond this study.
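A sketch of how such latency and quantization figures can be obtained on-device is given below, assuming a Cortex-M3/M4/M7 core with a DWT cycle counter and an int8-quantized input tensor; the device header and the helper names are illustrative assumptions, not taken from the thesis.

    // Sketch: timing one inference with the DWT cycle counter and quantizing an input value.
    // Assumes a Cortex-M3/M4/M7 core (Cortex-M0/M0+ lack the DWT cycle counter); the device
    // header and the helper names are assumptions for illustration.
    #include <cmath>
    #include <cstdint>
    #include "stm32f4xx.h"                               // placeholder: CMSIS device header of the target MCU
    #include "tensorflow/lite/micro/micro_interpreter.h"

    static uint32_t MeasureInferenceCycles(tflite::MicroInterpreter& interpreter) {
      CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;    // enable the trace/debug block
      DWT->CYCCNT = 0;
      DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;              // start the cycle counter

      interpreter.Invoke();                              // one forward pass of the quantized network

      return DWT->CYCCNT;                                // elapsed core cycles (latency = cycles / f_clk)
    }

    // Fixed-point (int8) quantization of a single input value, using the scale and zero point
    // stored in the tensor's quantization parameters: q = round(x / scale) + zero_point.
    static int8_t Quantize(float x, const TfLiteTensor* input) {
      long q = std::lroundf(x / input->params.scale) + input->params.zero_point;
      if (q < -128) q = -128;                            // clamp to the int8 range
      if (q > 127)  q = 127;
      return static_cast<int8_t>(q);
    }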
Publication status
published
Contributors
Examiner: Mähönen, Petri
Examiner: Thiele, Lothar
Examiner: Biri, Andreas
Examiner: Qu, Zhongnan
Examiner: Petrova, Marina
Publisher
ETH Zurich
Organisational unit
03429 - Thiele, Lothar (emeritus)