Evaluation and Deployment of Resource-Constrained Machine Learning on Embedded Devices


Date

2020-09-30

Publication Type

Master Thesis

ETH Bibliography

yes

Abstract

Deep neural networks (DNNs) are a vital tool in pattern recognition and Machine Learning (ML), solving a wide variety of problems in domains such as image classification, object detection, and speech processing. With the surge in the availability of cheap computation and memory resources, DNNs have grown in both architectural and computational complexity. Porting DNNs to resource-constrained devices, such as commercial home appliances, allows for cost-efficient deployment, widespread availability, and the preservation of sensitive personal data. In this work, we discuss and address the challenges of enabling ML on microcontroller units (MCUs), focusing on the popular ARM Cortex-M architecture. We deploy two well-known image-classification DNNs on three different MCUs and benchmark their timing behaviour and energy consumption. This work proposes a toolchain, including a benchmarking suite based on TensorFlow Lite Micro (TFLu). The detailed effects and trade-offs that quantization, compiler options, and MCU architecture have on key performance metrics such as inference latency and energy consumption have not been investigated previously. We find that such empirical investigations are indispensable, as the impact of specialized instructions and dedicated hardware units can be subtle. Empirical investigation by deployment is a cost-effective method for verification and benchmarking, as theoretical predictions of latency and energy consumption are difficult to formulate due to the interdependence of the DNN architecture, the software, and the target hardware. Using fixed-point quantization for weights and activations, we achieve a 73 % reduction of the network memory footprint. Furthermore, we find that combining quantization with hardware-optimized acceleration libraries yields a speedup of up to 34× in inference latency, which in turn decreases energy consumption by a similar factor. We conclude that the deployment of DNNs on commercial off-the-shelf (COTS) MCUs is promising, and that it can be greatly accelerated by a combination of optimization techniques. This work closes with an in-depth discussion on how to improve DNN deployment on resource-constrained devices beyond this study.
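As an illustration of the quantization step described above, the following sketch shows how a fully integer-quantized (int8) model is typically produced with the TensorFlow Lite converter before deployment with TFLu. It is not taken from the thesis: the MobileNetV2 architecture, the 96×96 input resolution, and the random calibration data are placeholder assumptions standing in for the networks and dataset used in the work.

import numpy as np
import tensorflow as tf

# Placeholder Keras model and calibration images; in the thesis these would be
# the two image-classification networks and samples from their dataset.
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(96, 96, 3))
calibration_images = np.random.rand(100, 96, 96, 3).astype(np.float32)

def representative_dataset():
    # Yield a few input samples so the converter can calibrate activation ranges.
    for image in calibration_images:
        yield [image[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full integer (int8) quantization of weights and activations, as needed
# for integer-only inference kernels on Cortex-M MCUs.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
# The .tflite flatbuffer is then typically embedded as a C array
# (e.g. via `xxd -i model_int8.tflite`) and compiled into the TFLu firmware.

Storing weights and activations as 8-bit integers instead of 32-bit floats cuts the raw parameter storage by roughly a factor of four (about 75 %), which is consistent with the 73 % memory-footprint reduction reported above.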

Publication status

published

Contributors

Examiner: Mähönen, Petri
Examiner: Thiele, Lothar
Examiner: Biri, Andreas
Examiner: Qu, Zhongnan
Examiner: Petrova, Marina

Publisher

ETH Zurich

Organisational unit

03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
