Evaluation and Deployment of Resource-Constrained Machine Learning on Embedded Devices

Open access
Author
Date
2020-09-30
Type
Master Thesis
ETH Bibliography
yes
Abstract
Deep neural networks (DNNs) are a vital tool in pattern recognition and Machine Learning (ML) – solving a wide variety of problems in domains such as image classification, object detection, and speech processing. With the surge in the availability of cheap computation and memory resources, DNNs have grown both in architectural and computational complexity. Porting DNNs to resource-constrained devices – such as commercial home appliances – allows for cost-efficient deployment, widespread availability, and the preservation of sensitive personal data.
In this work, we discuss and address the challenges of enabling ML on microcontroller units (MCUs), focusing on the popular ARM Cortex-M architecture. We deploy two well-known DNNs used for image classification on three different MCUs and subsequently benchmark their runtime characteristics and energy consumption. This work proposes a toolchain, including a benchmarking suite based on TensorFlow Lite Micro (TFLu). The detailed effects and trade-offs that quantization, compiler options, and the MCU architecture have on key performance metrics such as inference latency and energy consumption have not been investigated previously.
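As a rough illustration of what such a TFLu-based deployment involves (this is not code from the thesis), the following sketch shows a minimal image-classification inference with TensorFlow Lite Micro using the library's circa-2020 C++ API; g_model and kTensorArenaSize are hypothetical placeholders for the converted model flatbuffer and an application-chosen scratch-memory size:

#include <cstdint>

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model[];        // hypothetical: quantized .tflite flatbuffer
constexpr int kTensorArenaSize = 64 * 1024;  // hypothetical: sized per application
static uint8_t tensor_arena[kTensorArenaSize];

int RunInference() {
  static tflite::MicroErrorReporter error_reporter;
  const tflite::Model* model = tflite::GetModel(g_model);

  // Registers all builtin kernels; production builds typically register
  // only the operators the model actually uses to save flash.
  static tflite::AllOpsResolver resolver;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize, &error_reporter);
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  TfLiteTensor* input = interpreter.input(0);
  // ... fill input->data.int8 with a quantized input image ...

  if (interpreter.Invoke() != kTfLiteOk) return -1;

  TfLiteTensor* output = interpreter.output(0);
  return output->data.int8[0];  // e.g. score of the first class
}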
We find that such empirical investigations are indispensable, as the impact of specialized instructions and dedicated hardware units can be subtle. Empirical investigation through actual deployment is a cost-effective method for verification and benchmarking, since theoretical estimates of latency and energy consumption are difficult to formulate due to the interdependence of the DNN architecture, the software, and the target hardware.
Using fixed-point quantization for weights and activations, we achieve a 73 % reduction of the network's memory footprint. Furthermore, we find that combining quantization with hardware-optimized acceleration libraries yields a speedup of up to 34× in inference latency, which in turn reduces energy consumption by a similar factor. We learn that the deployment of DNNs on commercial off-the-shelf (COTS) MCUs is promising, but can be greatly accelerated by a combination of optimization techniques. This work concludes with an in-depth discussion of how to improve the deployment of DNNs on resource-constrained devices beyond this study.
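The abstract identifies the scheme only as fixed-point quantization; assuming the affine 8-bit scheme that TensorFlow Lite applies during conversion (a plausible but unconfirmed reading), the per-value arithmetic looks as follows, with scale and zero_point being per-tensor parameters produced by the converter:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Affine int8 quantization, per tensor: real_value = scale * (q - zero_point).
int8_t Quantize(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::lround(x / scale)) + zero_point;
  return static_cast<int8_t>(std::clamp<int32_t>(q, -128, 127));  // saturate to int8 range
}

float Dequantize(int8_t q, float scale, int32_t zero_point) {
  return scale * static_cast<float>(q - zero_point);
}

Replacing 32-bit floats with 8-bit integers shrinks the raw parameter storage by at most 75 %; the reported 73 % overall reduction is consistent with that ceiling once non-quantized tensors and model metadata are taken into account.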
Persistent link
https://doi.org/10.3929/ethz-b-000479861
Publication status
published
Contributors
Examiner: Mähönen, Petri
Examiner: Thiele, Lothar
Examiner: Biri, Andreas
Examiner: Qu, Zhongnan
Examiner: Petrova, Marina
Publisher
ETH Zurich
Organisational unit
03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)