Moritz Scherer



Last Name: Scherer
First Name: Moritz

Search Results

Publications 1 - 10 of 25
  • Busia, Paola; Cossettini, Andrea; Ingolfsson, Thorir Mar; et al. (2022)
    2022 IEEE Biomedical Circuits and Systems Conference (BioCAS)
    The development of a device for long-term and continuous monitoring of epilepsy is a very challenging objective, due to the high accuracy standards and the nearly zero false alarms required by clinical practice. To comply with such requirements, most approaches in the literature rely on a high number of acquisition channels and exploit classifiers operating on pre-processed features, hand-crafted for the currently rather limited available data. They therefore lack comfort, portability, and adaptability to future use cases and datasets. A step forward is needed towards unobtrusive, wearable systems with a reduced number of channels, implementable on ultra-low-power computing platforms. Leveraging the promising ability of transformers to capture long-term dependencies in raw time-series data, we present EEGformer, a compact transformer model for more adaptable seizure detection that can be executed in real-time on tiny MicroController Units (MCUs) and operates on just the raw electroencephalography (EEG) signal acquired by the four temporal channels. The proposed model detects 73% of the examined seizure events (100% when considering 6 out of 8 patients), with an average onset detection latency of 15.2 s. The False Positives per hour (FP/h) rate is 0.8 FP/h, although 100% specificity is obtained in most tests, with 5/40 outliers mostly caused by EEG artifacts. We deployed our model on the Ambiq Apollo4 MCU platform, where an inference requires 405 ms and 1.79 mJ at a 96 MHz operating frequency, demonstrating the feasibility of epilepsy detection on raw EEG traces for low-power wearable systems. Taking the CHB-MIT Scalp EEG dataset as a reference, we compare with a state-of-the-art classifier operating on handcrafted features designed for the target dataset, reaching well-aligned accuracy results while reducing the onset detection latency by over 20%. Moreover, we compare with two adequately optimized Convolutional Neural Network-based approaches, outperforming both alternatives on all accuracy metrics.
  • Scherer, Moritz; Sidler, Fabian; Rogenmoser, Michael; et al. (2022)
    2022 18th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
    The trend in Internet of Things research points toward performing increasingly compute-intensive data analysis tasks on embedded sensor nodes rather than in data centers. Exploiting technological advances in both energy efficiency and Tiny Machine Learning algorithms and methods, an increasing number of recognition and classification tasks can be performed by small, low-power, wireless sensor nodes. This paper presents WideVision, a wireless, wide-area sensing platform capable of performing on-board person detection with power requirements in the mW range. The WideVision platform integrates seamlessly into the Internet of Things by coupling a dedicated multi-radio platform, including a LoRa interface for medium- and long-range communication, with a novel parallel RISC-V microcontroller. We evaluate the proposed platform with the GAP8 microcontroller, which includes an 8-core RISC-V cluster, and a greyscale camera, training and deploying an advanced, quantized neural network that achieves an accuracy of 84.5% on a 5-person detection task with a latency of only 182 ms. Experimental results demonstrate that the WideVision sensor node, while performing on-board inference at a rate of one image per minute, can last 300 days on a 2400 mAh Li-ion battery, and 65 days when evaluating one image every 10 seconds, while providing effective surveillance of its perimeter.
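As a back-of-the-envelope check on the battery-lifetime figures above (a sketch; the ideal-battery model below ignores self-discharge and converter losses, and is not taken from the paper):

```python
# Ideal battery-lifetime model for a sensor node (ignores self-discharge
# and DC/DC converter losses). The 2400 mAh capacity is from the abstract;
# the average-current figure is derived here, not measured.

def battery_life_days(capacity_mah: float, avg_current_ma: float) -> float:
    """Lifetime in days for a constant average current draw."""
    return capacity_mah / avg_current_ma / 24.0

# A 2400 mAh cell lasting 300 days implies an average draw of ~0.33 mA:
implied_ma = 2400 / (300 * 24)
print(f"implied average current: {implied_ma:.3f} mA")
print(f"lifetime at that draw:   {battery_life_days(2400, implied_ma):.0f} days")
```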
  • Rutishauser, Georg; Mihali, Joan; Scherer, Moritz; et al. (2024)
    2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
    Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. Until now, however, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption increases only marginally, by 5.2%, resulting in an energy-efficiency improvement of 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.
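The kind of ternary-weight handling that such kernels accelerate can be illustrated in plain Python (a generic 2-bit packing sketch for weights in {-1, 0, +1}; the actual xTern encoding and instructions are not shown here):

```python
import numpy as np

# Pack ternary weights {-1, 0, +1} into 2 bits each: 16 weights per
# 32-bit word. This is a generic illustrative encoding, not xTern's.
def pack_ternary(w: np.ndarray) -> np.ndarray:
    codes = (w + 1).astype(np.uint32)            # -1,0,+1 -> 0,1,2
    packed = np.zeros((len(w) + 15) // 16, dtype=np.uint32)
    for i, c in enumerate(codes):
        packed[i // 16] |= c << (2 * (i % 16))
    return packed

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    out = np.empty(n, dtype=np.int32)
    for i in range(n):
        out[i] = int((packed[i // 16] >> (2 * (i % 16))) & 0x3) - 1
    return out

w = np.array([-1, 0, 1, 1, -1, 0, 0, 1], dtype=np.int32)
assert np.array_equal(unpack_ternary(pack_ternary(w), len(w)), w)

# A ternary dot product needs only additions and subtractions:
x = np.arange(8, dtype=np.int32)
print(int(w @ x))  # -> 8
```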
  • Jung, Victor J.B.; Burrello, Alessio; Scherer, Moritz; et al. (2025)
    IEEE Transactions on Computers
    Transformer networks are rapidly becoming state of the art (SotA) in many fields, such as Natural Language Processing (NLP) and Computer Vision (CV). As with Convolutional Neural Networks (CNNs), there is a strong push for deploying Transformer models at the extreme edge, ultimately fitting the tiny power budget and memory footprint of Micro-Controller Units (MCUs). However, early approaches in this direction are mostly ad-hoc and platform- and model-specific. This work aims to enable and optimize the flexible, multi-platform deployment of encoder Tiny Transformers on commercial MCUs. We propose a complete framework to perform end-to-end deployment of Transformer models onto single- and multi-core MCUs. Our framework provides an optimized library of kernels to maximize data reuse and avoid unnecessary data marshaling operations in the crucial attention block. A novel Multi-Head Self-Attention (MHSA) inference schedule, named Fused-Weight Self-Attention (FWSA), is introduced, fusing the linear projection weights offline to further reduce the number of operations and parameters. Furthermore, to mitigate the memory peak reached by the computation of the attention map, we present a Depth-First Tiling (DFT) scheme for MHSA tailored to cache-less MCU devices, which splits the computation of the attention map into successive steps so that the whole matrix is never materialized in memory. We evaluate our framework on three different MCU classes exploiting the ARM and RISC-V Instruction Set Architectures (ISAs), namely the STM32H7 (ARM Cortex-M7), the STM32L4 (ARM Cortex-M4), and GAP9 (RV32IMC-XpulpV2). We reach an average of 4.79× and 2.0× lower latency compared to the SotA libraries CMSIS-NN (ARM) and PULP-NN (RISC-V), respectively. Moreover, we show that our MHSA depth-first tiling scheme reduces the memory peak by up to 6.19×, while the fused-weight attention reduces the runtime by 1.53× and the number of parameters by 25%. Leveraging the optimizations proposed in this work, we run end-to-end inference of three SotA Tiny Transformers for three applications characterized by different input dimensions and network hyperparameters. We report significant improvements across the networks: for instance, when executing a transformer block for radar-based hand-gesture recognition on GAP9, we achieve a latency of 0.14 ms and an energy consumption of 4.92 μJ, 2.32× lower than the SotA PULP-NN library on the same platform.
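The weight-fusion idea behind FWSA rests on a simple algebraic identity: in self-attention, the query and key projections of the same input can be folded into a single matrix computed offline. A minimal single-head NumPy sketch of that identity follows (toy sizes, illustrative only, not the actual FWSA kernel):

```python
import numpy as np

rng = np.random.default_rng(0)
S, d, dh = 4, 8, 8           # sequence length, model dim, head dim (toy sizes)
x  = rng.standard_normal((S, d))
Wq = rng.standard_normal((d, dh))
Wk = rng.standard_normal((d, dh))

# Standard attention logits: project the input twice, then multiply.
logits = (x @ Wq) @ (x @ Wk).T

# Fused form: fold both projection matrices into one, computed offline
# before deployment, so only a single projection remains at runtime.
W_fused = Wq @ Wk.T
logits_fused = x @ W_fused @ x.T

assert np.allclose(logits, logits_fused)
```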
  • Busia, Paola; Cossettini, Andrea; Ingolfsson, Thorir Mar; et al. (2024)
    IEEE Transactions on Biomedical Circuits and Systems
    The long-term, continuous analysis of electroencephalography (EEG) signals on wearable devices to automatically detect seizures in epileptic patients is a high-potential application field for deep neural networks, and specifically for transformers, which are highly suited for end-to-end time series processing without handcrafted feature extraction. In this work, we propose a small-scale transformer detector, the EEGformer, compatible with unobtrusive acquisition setups that use only the temporal channels. EEGformer is the result of a hardware-oriented design exploration, aiming for efficient execution on tiny low-power micro-controller units (MCUs) and low latency and false alarm rate to increase patient and caregiver acceptance. Tests conducted on the CHB-MIT dataset show a 20% reduction of the onset detection latency with respect to the state-of-the-art model for temporal acquisition, with a competitive 73% seizure detection probability and 0.15 false-positive-per-hour (FP/h). Further investigations on a novel and challenging scalp EEG dataset result in the successful detection of 88% of the annotated seizure events, with 0.45 FP/h. We evaluate the deployment of the EEGformer on three commercial low-power computing platforms: the single-core Apollo4 MCU and the GAP8 and GAP9 parallel MCUs. The most efficient implementation (on GAP9) results in as low as 13.7 ms and 0.31 mJ per inference, demonstrating the feasibility of deploying the EEGformer on wearable seizure detection systems with reduced channel count and multi-day battery duration.
  • Scherer, Moritz; Di Mauro, Alfio; Rutishauser, Georg; et al. (2022)
    2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
    Tiny Machine Learning (TinyML) applications impose μJ/inference energy constraints, with a maximum power consumption of a few tens of mW. It is extremely challenging to meet these requirements at a reasonable accuracy level. In this work, we address this challenge with a flexible, fully digital Ternary Neural Network (TNN) accelerator in a RISC-V-based SoC. The design achieves 2.72 μJ/inference, 12.2 mW, and 3200 inferences/s at 0.5 V for a non-trivial 9-layer, 96-channels-per-layer network with a CIFAR-10 accuracy of 86%. The peak energy efficiency is 1036 TOp/s/W, outperforming the state of the art in silicon-proven TinyML accelerators by 1.67×.
  • Prasad, Arpan Suravi; Scherer, Moritz; Conti, Francesco; et al. (2024)
    IEEE Journal of Solid-State Circuits
    Extended reality (XR) applications are machine learning (ML)-intensive, featuring deep neural networks (DNNs) with millions of weights, tightly latency-bound (10-20 ms end-to-end), and power-constrained (low tens of mW average power). While ML performance and efficiency can be achieved by introducing neural engines within low-power systems-on-chip (SoCs), system-level power for non-trivial DNNs depends strongly on the energy of non-volatile memory (NVM) access for network weights. This work introduces Siracusa, a near-sensor heterogeneous SoC for next-generation XR devices, manufactured in 16 nm CMOS. Siracusa couples an octa-core cluster of RISC-V digital signal processing (DSP) cores with a novel tightly coupled "At-Memory" integration between a state-of-the-art digital neural engine and an on-chip NVM based on magnetoresistive random access memory (MRAM), achieving 1.7× higher throughput and 3× better energy efficiency than XR SoCs using NVM as background memory. The fabricated SoC prototype achieves an area efficiency of 65.2 GOp/s/mm² and a peak energy efficiency of 8.84 TOp/J for DNN inference while supporting complex, heterogeneous application workloads that combine ML with conventional signal processing and control.
  • Di Mauro, Alfio; Scherer, Moritz; Rossi, Davide; et al. (2022)
  • Scherer, Moritz; Mayer, Philipp; Di Mauro, Alfio; et al. (2021)
    2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)
    A recent and promising approach to minimizing the power consumption of always-on, battery-operated sensors is to perform 'smart' detection of events to trigger processing. This approach effectively reduces the data bandwidth and power consumption at the system level and increases the lifetime of sensor nodes. This paper presents an always-on, event-driven, ultra-low-power camera platform for motion detection applications. The platform exploits an event-driven VGA imager that features a motion detection mode, based on a tunable scene background subtraction algorithm, and a grayscale imaging mode. To reduce the power consumption in the motion detection mode, the platform implements a configurable refresh rate, which allows for adaptation to sensing requirements by trading off power consumption against detection sensitivity. Through detailed experimental evaluation, the paper demonstrates that the proposed approach reduces the system-level power consumption for always-on motion sensing applications by switching between an active 15 FPS imaging mode consuming 5.5 mW and a low-power motion detection mode consuming 1.8 mW. We further estimate the power consumption for a single-chip solution and show that the system-level power budget can be reduced to 2.4 mW in imaging mode and 400 μW in detection mode.
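The system-level benefit of such mode switching can be sketched as a duty-cycle-weighted average of the two reported mode powers (the 1% imaging fraction below is an illustrative assumption, not a figure from the paper):

```python
# Duty-cycle-weighted average power for a node that mostly sits in
# motion-detection mode and wakes into imaging on events.
# Mode powers are the measured figures quoted in the abstract;
# the imaging fraction is an illustrative assumption.
P_IMAGING_MW = 5.5   # active 15 FPS imaging mode
P_DETECT_MW  = 1.8   # low-power motion detection mode

def avg_power_mw(imaging_fraction: float) -> float:
    """Average power when spending the given fraction of time imaging."""
    return imaging_fraction * P_IMAGING_MW + (1 - imaging_fraction) * P_DETECT_MW

print(f"{avg_power_mw(0.01):.3f} mW")  # 1% of time imaging -> close to 1.8 mW
```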
  • Burrello, Alessio; Scherer, Moritz; Zanghieri, Marcello; et al. (2021)
    2021 IEEE International Conference on Omni-Layer Intelligent Systems (COINS)
    Transformer networks have become state-of-the-art for many tasks, such as NLP, and are closing the gap on other tasks like image recognition. Similarly, Transformers and Attention methods are starting to attract attention for smaller-scale tasks, which fit the typical memory envelope of MCUs. In this work, we propose a new set of execution kernels tuned for efficient execution on MCU-class RISC-V and ARM Cortex-M cores. We focus on minimizing memory movements while maximizing data reuse in the Attention layers. With our library, we obtain 3.4×, 1.8×, and 2.1× lower latency and energy on 8-bit Attention layers, compared to previous state-of-the-art (SoA) linear and matrix multiplication kernels in the CMSIS-NN and PULP-NN libraries on the STM32H7 (Cortex-M7), STM32L4 (Cortex-M4), and GAP8 (RISC-V IMC-Xpulp) platforms, respectively. As a use case for our TinyTransformer library, we also demonstrate that we can fit a 263 kB Transformer on the GAP8 platform, outperforming the previous SoA convolutional architecture on the TinyRadarNN dataset, with a latency of 9.24 ms, an energy consumption of 0.47 mJ, and an accuracy improvement of 3.5%.