Lukas Arno Jakob Cavigelli
Last Name: Cavigelli
First Name: Lukas Arno Jakob
Publications 1 - 10 of 34
- FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things
  Item type: Journal Article
  IEEE Internet of Things Journal
  Wang, Xiaying; Magno, Michele; Cavigelli, Lukas Arno Jakob; et al. (2020)

- Hyperdrive: A systolically scalable binary-weight CNN Inference Engine for mW IoT End-Nodes
  Item type: Conference Paper
  2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
  Andri, Renzo; Cavigelli, Lukas Arno Jakob; Rossi, Davide; et al. (2018)

- Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks
  Item type: Journal Article
  IEEE Transactions on Circuits and Systems I: Regular Papers
  Cerutti, Gianmarco; Cavigelli, Lukas Arno Jakob; Andri, Renzo; et al. (2022)
  Keyword spotting (KWS) is a crucial function enabling the interaction with the many ubiquitous smart devices in our surroundings, either by activating them through a wake word or by serving directly as a human-computer interface. For many applications, KWS is the entry point for our interactions with the device and, thus, an always-on workload. Many smart devices are mobile and their battery lifetime is heavily impacted by continuously running services. KWS and similar always-on services are thus the focus when optimizing the overall power consumption. This work addresses KWS energy-efficiency on low-cost microcontroller units (MCUs). We combine analog binary feature extraction with binary neural networks. By replacing the digital preprocessing with the proposed analog front-end, we show that the energy required for data acquisition and preprocessing can be reduced by 29x, cutting its share from a dominating 85% to a mere 16% of the overall energy consumption for our reference KWS application. Experimental evaluations on the Speech Commands Dataset show that the proposed system outperforms state-of-the-art accuracy and energy efficiency, respectively, by 1% and 4.3x on a 10-class dataset while providing a compelling accuracy-energy trade-off including a 2% accuracy drop for a 71x energy reduction.

- Fanncortexm: An open source toolkit for deployment of multi-layer neural networks on arm cortex-m family microcontrollers: Performance analysis with stress detection
  Item type: Conference Paper
  2019 IEEE 5th World Forum on Internet of Things (WF-IoT)
  Magno, Michele; Cavigelli, Lukas Arno Jakob; Mayer, Philipp; et al. (2019)

- Laelaps: An Energy-Efficient Seizure Detection Algorithm from Long-term Human iEEG Recordings without False Alarms
  Item type: Conference Paper
  Proceedings of the 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)
  Burrello, Alessio; Cavigelli, Lukas Arno Jakob; Schindler, Kaspar; et al. (2019)
  We propose Laelaps, an energy-efficient and fast learning algorithm with no false alarms for epileptic seizure detection from long-term intracranial electroencephalography (iEEG) signals. Laelaps uses end-to-end binary operations by exploiting symbolic dynamics and brain-inspired hyperdimensional computing. Laelaps’s results surpass those yielded by state-of-the-art (SoA) methods [1], [2], [3], including deep learning, on a new very large dataset containing 116 seizures of 18 drug-resistant epilepsy patients in 2656 hours of recordings, with each patient implanted with 24 to 128 iEEG electrodes. Laelaps trains 18 patient-specific models by using only 24 seizures: 12 models are trained with one seizure per patient, the others with two seizures. The trained models detect 79 out of 92 unseen seizures without any false alarms across all the patients, a big step forward in practical seizure detection. Importantly, a simple implementation of Laelaps on the Nvidia Tegra X2 embedded device achieves 1.7X–3.9X faster execution and 1.4X–2.9X lower energy consumption compared to the best result from the SoA methods. Our source code and anonymized iEEG dataset are freely available at http://ieeg-swez.ethz.ch.

- HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data
  Item type: Conference Paper
  2020 IEEE Sensors Applications Symposium (SAS)
  Wang, Xiaying; Cavigelli, Lukas Arno Jakob; Eggimann, Manuel; et al. (2020)
  Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers with resolutions reaching 0.5 m/px. Segmenting SAR data still requires skilled personnel, limiting the potential for large-scale use. We show that it is possible to automatically and reliably perform urban scene segmentation from next-gen resolution SAR data (0.15 m/px) using deep neural networks (DNNs), achieving a pixel accuracy of 95.19% and a mean intersection-over-union (mIoU) of 74.67% with data collected over a region of merely 2.2 km². The presented DNN is not only effective, but is very small with only 63k parameters and computationally simple enough to achieve a throughput of around 500 Mpx/s using a single GPU. We further identify that additional SAR receive antennas and data from multiple flights massively improve the segmentation accuracy. We describe a procedure for generating a high-quality segmentation ground truth from multiple inaccurate building and road annotations, which has been crucial to achieving these segmentation results.

- Ultra-Low Power Context Recognition Fusing Sensor Data from an Energy-Neutral Smart Watch
  Item type: Conference Paper
  Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering ~ Internet of Things. IoT Infrastructures. Second International Summit, IoT 360° 2015, Rome, Italy, October 27-29, 2015, Revised Selected Papers, Part II
  Magno, Michele; Cavigelli, Lukas Arno Jakob; Andri, Renzo; et al. (2016)

- Mixed-Precision Quantization and Parallel Implementation of Multispectral Riemannian Classification for Brain-Machine Interfaces
  Item type: Conference Paper
  2021 IEEE International Symposium on Circuits and Systems (ISCAS)
  Wang, Xiaying; Schneider, Tibor; Hersche, Michael; et al. (2021)
  With Motor-Imagery (MI) Brain-Machine Interfaces (BMIs) we may control machines by merely thinking of performing a motor action. Practical use cases require a wearable solution where the classification of the brain signals is done locally near the sensor using machine learning models embedded on energy-efficient microcontroller units (MCUs), for assured privacy, user comfort, and long-term usage. In this work, we provide practical insights on the accuracy-cost tradeoff for embedded BMI solutions. Our proposed Multispectral Riemannian Classifier reaches 75.1% accuracy on a 4-class MI task. We further scale down the model by quantizing it to mixed-precision representations with a minimal accuracy loss of 1%, which is still 3.2% more accurate than the state-of-the-art embedded convolutional neural network. We implement the model on a low-power MCU with parallel processing units taking only 33.39 ms and consuming 1.304 mJ per classification.

- Vau Da Muntanialas: Energy-Efficient Multi-Die Scalable Acceleration of RNN Inference
  Item type: Journal Article
  IEEE Transactions on Circuits and Systems I: Regular Papers
  Paulin, Gianna; Conti, Francesco; Cavigelli, Lukas Arno Jakob; et al. (2022)
  Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn temporal dependencies by keeping an internal state, making them ideal for time-series problems such as speech recognition. However, the output-to-input feedback creates distinctive memory bandwidth and scalability challenges in designing accelerators for RNNs. We present Muntaniala, an RNN accelerator architecture for LSTM inference with a silicon-measured energy-efficiency of 3.25 TOP/s/W and performance of 30.53 GOP/s in UMC 65nm technology. The scalable design of Muntaniala allows running large RNN models by combining multiple tiles in a systolic array. We keep all parameters stationary on every die in the array, drastically reducing the I/O communication to only loading new features and sharing partial results with other dies. For quantifying the overall system power, including I/O power, we built Vau da Muntanialas, to the best of our knowledge, the first demonstration of a systolic multi-chip-on-PCB array of RNN accelerators. Our multi-die prototype performs LSTM inference with 192 hidden states in 330 μs with a total system power of 9.0 mW at 10 MHz consuming 2.95 μJ. Targeting the 8/16-bit quantization implemented in Muntaniala, we show a phoneme error rate (PER) drop of approximately 3% with respect to floating-point (FP) on a 3L-384NH-123NI LSTM network on the TIMIT dataset.

- CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams
  Item type: Working Paper
  Cavigelli, Lukas Arno Jakob; Benini, Luca (2018)