Evaluation and Deployment of Resource-Constrained Machine Learning on Embedded Devices
dc.contributor.author
Heim, Lennart
dc.contributor.supervisor
Mähönen, Petri
dc.contributor.supervisor
Thiele, Lothar
dc.contributor.supervisor
Biri, Andreas
dc.contributor.supervisor
Qu, Zhongnan
dc.contributor.supervisor
Petrova, Marina
dc.date.accessioned
2021-04-22T06:31:44Z
dc.date.available
2021-04-21T15:57:18Z
dc.date.available
2021-04-22T06:31:44Z
dc.date.issued
2020-09-30
dc.identifier.uri
http://hdl.handle.net/20.500.11850/479861
dc.identifier.doi
10.3929/ethz-b-000479861
dc.description.abstract
Deep neural networks (DNNs) are a vital tool in pattern recognition and machine learning (ML), solving a wide variety of problems in domains such as image classification, object detection, and speech processing. With the surge in availability of cheap computation and memory resources, DNNs have grown in both architectural and computational complexity. Porting DNNs to resource-constrained devices, such as commercial home appliances, enables cost-efficient deployment, widespread availability, and the preservation of sensitive personal data.
In this work, we discuss and address the challenges of enabling ML on microcontroller units (MCUs), focusing on the popular ARM Cortex-M architecture. We deploy two well-known image-classification DNNs on three different MCUs and benchmark their timing behavior and energy consumption. This work proposes a toolchain that includes a benchmarking suite based on TensorFlow Lite Micro (TFLu). The detailed effects and trade-offs that quantization, compiler options, and MCU architecture have on key performance metrics such as inference latency and energy consumption have not been investigated previously.
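As a rough illustration of what such a TFLu-based deployment involves (a minimal sketch, not the thesis's actual benchmarking code), the following C++ fragment runs a single inference on a Cortex-M class device. The model array name g_model_data, the tensor-arena size, the operator list, and the RunInference wrapper are placeholder assumptions, and the exact headers and constructor arguments vary between TFLu versions.

// Minimal TensorFlow Lite Micro inference sketch for a Cortex-M target (illustrative only).
#include <cstdint>

#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Assumption: the quantized .tflite model has been exported as a C array named g_model_data.
extern const unsigned char g_model_data[];

// Assumption: working memory for all tensors; the required size is model-dependent.
constexpr int kTensorArenaSize = 100 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

// Runs one inference on an int8-quantized input image and returns the predicted class index.
int RunInference(const int8_t* image, int image_len) {
  static tflite::MicroErrorReporter error_reporter;
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the operators the network actually uses to keep the flash footprint small.
  static tflite::MicroMutableOpResolver<5> resolver;
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddAveragePool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize, &error_reporter);
  interpreter.AllocateTensors();

  // Copy the already-quantized input into the input tensor.
  TfLiteTensor* input = interpreter.input(0);
  for (int i = 0; i < image_len; ++i) {
    input->data.int8[i] = image[i];
  }

  // A latency/energy benchmark would time this call, e.g. with a hardware cycle counter.
  interpreter.Invoke();

  // Pick the class with the highest (quantized) score.
  TfLiteTensor* output = interpreter.output(0);
  int num_classes = output->dims->data[output->dims->size - 1];
  int best_idx = 0;
  for (int i = 1; i < num_classes; ++i) {
    if (output->data.int8[i] > output->data.int8[best_idx]) {
      best_idx = i;
    }
  }
  return best_idx;
}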
We find that such empirical investigations are indispensable, as the impact of specialized instructions and dedicated hardware units can be subtle. Empirical investigation through actual deployment is a cost-effective method for verification and benchmarking, because theoretical estimates of latency and energy consumption are difficult to formulate given the interdependence of the DNN architecture, the software, and the target hardware.
Using fixed-point quantization for weights and activations, we achieve a 73 % reduction of the network's memory footprint. Furthermore, we find that combining quantization with hardware-optimized acceleration libraries yields a speedup in inference latency of up to 34×, which in turn reduces energy consumption by a similar factor. We learn that the deployment of DNNs on commercial off-the-shelf (COTS) MCUs is promising and can be greatly accelerated by a combination of optimization techniques. This work concludes with an in-depth discussion of how DNN deployment on resource-constrained devices can be improved beyond this study.
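As a rough sanity check on the 73 % figure (an illustrative calculation, not taken from the thesis): quantizing a parameter from 32-bit floating point to 8-bit fixed point shrinks it from 4 bytes to 1 byte, i.e. at most a 1 - 8/32 = 75 % reduction per quantized tensor; the reported 73 % is consistent with this upper bound once non-quantized tensors, quantization parameters, and model metadata are accounted for.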
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
Evaluation and Deployment of Resource-Constrained Machine Learning on Embedded Devices
en_US
dc.type
Master Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
ethz.size
130 p.
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03429 - Thiele, Lothar (emeritus) / Thiele, Lothar (emeritus)
en_US
ethz.date.deposited
2021-04-21T15:57:29Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-04-22T06:32:02Z
ethz.rosetta.lastUpdated
2023-02-06T21:43:33Z
ethz.rosetta.versionExported
true