Evaluation and Deployment of Resource-Constrained Machine Learning on Embedded Devices
Open access
Author
Date
2020-09-30
Type
Master Thesis
ETH Bibliography
yes
Abstract
Deep neural networks (DNNs) are a vital tool in pattern recognition and machine learning (ML), solving a wide variety of problems in domains such as image classification, object detection, and speech processing. With the surge in the availability of cheap computation and memory resources, DNNs have grown in both architectural and computational complexity. Porting DNNs to resource-constrained devices, such as commercial home appliances, enables cost-efficient deployment and widespread availability, and keeps sensitive personal data on the device.
In this work, we discuss and address the challenges of enabling ML on microcontroller units (MCUs), focusing on the popular ARM Cortex-M architecture. We deploy two well-known image-classification DNNs on three different MCUs and benchmark their inference latency and energy consumption. This work proposes a toolchain, including a benchmarking suite based on TensorFlow Lite Micro (TFLu). The detailed effects and trade-offs that quantization, compiler options, and MCU architecture have on key performance metrics such as inference latency and energy consumption had not been investigated previously.
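To illustrate the benchmarking workflow, the following Python sketch times repeated invocations of a converted TFLite model on a host machine. The thesis itself measures latency and energy directly on the MCUs via TFLu; this host-side version only shows the general measurement pattern, and the model file name "mobilenet_int8.tflite" is a hypothetical placeholder.

```python
# Host-side latency sketch for a converted TFLite model. The thesis measures
# latency and energy on the MCU itself via TensorFlow Lite Micro; this only
# illustrates the measurement pattern. "mobilenet_int8.tflite" is hypothetical.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="mobilenet_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Random input matching the model's expected shape and dtype.
if inp["dtype"] == np.int8:
    dummy = np.random.randint(-128, 128, size=inp["shape"], dtype=np.int8)
else:
    dummy = np.random.rand(*inp["shape"]).astype(inp["dtype"])

latencies = []
for _ in range(100):
    interpreter.set_tensor(inp["index"], dummy)
    start = time.perf_counter()
    interpreter.invoke()
    latencies.append(time.perf_counter() - start)
    _ = interpreter.get_tensor(out["index"])

print(f"median latency: {1e3 * np.median(latencies):.2f} ms")
```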
We find that such empirical investigations are indispensable, as the impact of specialized instructions and dedicated hardware units can be subtle. Empirical investigation through actual deployment is a cost-effective method for verification and benchmarking, because theoretical estimates of latency and energy consumption are difficult to formulate due to the interdependence of the DNN architecture, the software stack, and the target hardware.
Using fixed-point quantization for weights and activations, we achieve a 73 % reduction of the network memory footprint. Furthermore, we find that combining quantization with hardware-optimized acceleration libraries yields up to a 34× speedup in inference latency, which in turn reduces energy consumption by a similar factor. We learn that the deployment of DNNs on commercial off-the-shelf (COTS) MCUs is promising, but can be greatly accelerated by a combination of optimization techniques. This work concludes with an in-depth discussion of how DNN deployment on resource-constrained devices can be improved beyond this study.
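For context, the kind of fixed-point quantization referred to above is typically produced with TensorFlow's post-training integer quantization before the model is deployed through TFLu. The sketch below is an illustration under that assumption; the Keras model and random calibration data are placeholders, not the networks or datasets used in the thesis.

```python
# Sketch of post-training full-integer (int8) quantization with the TFLite
# converter, a standard path for producing models that TensorFlow Lite Micro
# can run with fixed-point kernels. The Keras model and calibration samples
# below are hypothetical placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(input_shape=(96, 96, 3), weights=None)

def representative_data():
    # A few calibration samples to estimate activation ranges.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

# Float32 baseline flatbuffer.
float_converter = tf.lite.TFLiteConverter.from_keras_model(model)
float_model = float_converter.convert()

# Int8 weights and activations.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
int8_model = converter.convert()

reduction = 1.0 - len(int8_model) / len(float_model)
print(f"flatbuffer size reduced by {100 * reduction:.1f} %")
```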
Permanent link
https://doi.org/10.3929/ethz-b-000479861
Publication status
published
Contributors
Examiner: Mähönen, Petri
Examiner: Thiele, Lothar
Examiner: Biri, Andreas
Examiner: Qu, Zhongnan
Examiner: Petrova, Marina
Publisher
ETH Zurich
Organisational unit
03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)