Julian Moosmann


Last Name

Moosmann

First Name

Julian

Organisational unit

03996 - Benini, Luca

Search Results

Publications 1 - 7 of 7
  • Bian, Sizhen; Kang, Pixi; Moosmann, Julian; et al. (2024)
    ISWC '24: Proceedings of the 2024 ACM International Symposium on Wearable Computers
    Electroencephalogram (EEG)-based Brain-Computer Interfaces (BCIs) have garnered significant interest across various domains, including rehabilitation and robotics. Despite advancements in neural network-based EEG decoding, maintaining performance across diverse user populations remains challenging due to feature distribution drift. This paper presents an effective approach to address this challenge by implementing a lightweight and efficient on-device learning engine for wearable motor imagery recognition. The proposed approach, applied to the well-established EEGNet architecture, enables real-time and accurate adaptation to EEG signals from unregistered users. Leveraging the newly released low-power parallel RISC-V-based processor, GAP9 from GreenWaves, and the Physionet EEG Motor Imagery dataset, we demonstrate a remarkable accuracy gain of up to 7.31% with respect to the baseline with a memory footprint of 15.6 KByte. Furthermore, by optimizing the input stream, we achieve enhanced real-time performance without compromising inference accuracy. Our tailored approach exhibits an inference time of 14.9 ms and 0.76 mJ per single inference, and 20 µs and 0.83 µJ per single update during online training. These findings highlight the feasibility of our method for edge EEG devices as well as other battery-powered wearable AI systems suffering from subject-dependent feature distribution drift.
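    A minimal sketch of the on-device adaptation idea described above, assuming a frozen EEGNet-style feature extractor whose small output head is retrained on a handful of labeled trials from a new user. The engine in the paper runs in C on the GAP9; the names here (FeatureExtractor, adapt_to_user) and all sizes are illustrative, not taken from the paper.

        import torch
        import torch.nn as nn

        class FeatureExtractor(nn.Module):
            """Stand-in for EEGNet's frozen convolutional blocks."""
            def __init__(self, n_channels=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(n_channels, 16, kernel_size=25, padding=12),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool1d(6),   # 16 maps x 6 bins = 96 features
                    nn.Flatten(),
                )
            def forward(self, x):
                return self.net(x)

        def adapt_to_user(backbone, head, trials, labels, lr=1e-2, steps=20):
            """Update only the output head on a new user's labeled trials."""
            for p in backbone.parameters():
                p.requires_grad_(False)        # features stay frozen
            opt = torch.optim.SGD(head.parameters(), lr=lr)
            loss_fn = nn.CrossEntropyLoss()
            for _ in range(steps):
                loss = loss_fn(head(backbone(trials)), labels)
                opt.zero_grad()
                loss.backward()
                opt.step()
            return head

        # dummy data: 8 trials, 64 EEG channels, 480 samples, 4 MI classes
        backbone, head = FeatureExtractor(), nn.Linear(96, 4)
        adapt_to_user(backbone, head,
                      torch.randn(8, 64, 480), torch.randint(0, 4, (8,)))
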
  • Moosmann, Julian; Mandula, Jakub; Mayer, Philipp; et al. (2023)
    2023 IEEE SENSORS
    Event-based cameras, also called silicon retinas, potentially revolutionize computer vision by detecting and reporting significant changes in intensity as asynchronous events, offering extended dynamic range, low latency, and low power consumption, enabling a wide range of applications from autonomous driving to long-term surveillance. As the technology is still emerging, publicly available datasets for event-based systems that also feature frame-based cameras, which would allow exploiting the benefits of both technologies, remain notably scarce. This work quantitatively evaluates a multi-modal camera setup for fusing high-resolution DVS data with RGB image data by static camera alignment. The proposed setup, which is intended for semi-automatic DVS data labeling, combines two recently released Prophesee EVK4 DVS cameras and one global-shutter XIMEA MQ022CG-CM RGB camera. After alignment, state-of-the-art object detection or segmentation networks label the image data by mapping bounding boxes or labeled pixels directly to the aligned events. To facilitate this process, various time-based synchronization methods for DVS data are analyzed, and calibration accuracy, camera alignment, and lens impact are evaluated. Experimental results demonstrate the benefits of the proposed system: the best synchronization method yields an image calibration error of less than 0.90 px and a pixel cross-correlation deviation of 1.6 px, while a lens with 8 mm focal length enables detection of objects of size 30 cm at a distance of 350 m against a homogeneous background.
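    An illustrative sketch of the label-transfer step, assuming the static alignment can be expressed as a single homography from the RGB frame to the aligned event frame: boxes from the RGB detector are warped through the matrix and re-fitted as axis-aligned boxes. The 3x3 matrix is a made-up placeholder, not a calibrated value from the paper.

        import numpy as np

        # placeholder homography (assumed), from the one-off static calibration
        H_RGB_TO_DVS = np.array([[0.62, 0.00, 14.0],
                                 [0.00, 0.61, -9.0],
                                 [0.00, 0.00,  1.0]])

        def project_box(box, H):
            """Map an (x1, y1, x2, y2) box through homography H."""
            x1, y1, x2, y2 = box
            corners = np.array([[x1, y1, 1], [x2, y1, 1],
                                [x2, y2, 1], [x1, y2, 1]], dtype=float).T
            w = H @ corners
            w = w[:2] / w[2]                     # perspective divide
            return (w[0].min(), w[1].min(), w[0].max(), w[1].max())

        print(project_box((400, 220, 640, 480), H_RGB_TO_DVS))
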
  • Moosmann, Julian; Giordano, Marco; Vogt, Christian; et al. (2023)
    2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
    This paper introduces a highly flexible, quantized, memory-efficient, and ultra-lightweight object detection network called TinyissimoYOLO. It aims to enable object detection on microcontrollers in the power domain of milliwatts, with less than 0.5 MB of memory available for storing convolutional neural network (CNN) weights. The proposed quantized network architecture, with 422k parameters, enables real-time object detection on embedded microcontrollers, and it has been evaluated to exploit CNN accelerators. In particular, the proposed network has been deployed on the MAX78000 microcontroller, achieving a high frame rate of up to 180 fps and an ultra-low energy consumption of only 196 µJ per inference with an inference efficiency of more than 106 MAC/cycle. TinyissimoYOLO can be trained for any multi-object detection task. However, given the small network size, adding detection classes increases the size and memory consumption of the network, so object detection with up to 3 classes is demonstrated. Furthermore, the network is trained using quantization-aware training and deployed with 8-bit quantization on different microcontrollers, such as the STM32H7A3, STM32L4R9, Apollo4b, and the MAX78000's CNN accelerator. Performance evaluations are presented in this paper.
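    A minimal sketch of the 8-bit quantization-aware-training mechanism: weights are fake-quantized in the forward pass while gradients flow through unchanged (straight-through estimator). This shows the principle only; the paper relies on the QAT flows of the respective deployment toolchains, and the input size below is merely a plausible choice.

        import torch
        import torch.nn as nn

        def fake_quant(w, bits=8):
            """Symmetric per-tensor fake quantization, straight-through grad."""
            qmax = 2 ** (bits - 1) - 1
            scale = w.detach().abs().max().clamp(min=1e-8) / qmax
            q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
            return w + (q - w).detach()          # identity gradient w.r.t. w

        class QATConv(nn.Conv2d):
            def forward(self, x):
                return self._conv_forward(x, fake_quant(self.weight), self.bias)

        conv = QATConv(3, 16, kernel_size=3, padding=1)
        conv(torch.randn(1, 3, 88, 88)).sum().backward()
        print(conv.weight.grad.shape)            # grads reach the fp32 weights
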
  • Moosmann, Julian; Mandula, Jakub; Li, Jiayong; et al. (2025)
    Proceedings of SPIE: Target and Background Signatures XI: Traditional Methods and Artificial Intelligence
    Wide Field-of-View (FoV) and high-resolution imaging are crucial capabilities for visual surveillance. However, they impose heavy computational demands, posing significant challenges for embedded platforms targeting real-time performance. Event-based cameras help curtail computational effort by capturing only brightness changes at the pixel level, thereby reducing data volume while enhancing temporal resolution and dynamic range. Moreover, their inherent sensitivity to object edges improves the visibility of camouflage patterns. This work presents a real-time, wide-FoV event-based vision system based on a dual-camera setup with onboard object detection and tracking. The proposed system implements a low-latency end-to-end pipeline encompassing data capture, event-stream processing, image stitching, object detection, and tracking. Custom CUDA kernels are developed for efficient event processing and stitching, while a YOLOv8-based detector is evaluated in combination with multiple tracking algorithms. With the dual-camera configuration generating up to 20 million events per second per camera, end-to-end object detection and tracking is achieved in under 30 milliseconds (30–40 FPS) on an NVIDIA Jetson AGX Orin. The system demonstrates linear scalability, establishes a baseline for real-time multi-camera event-based vision-at-the-edge platforms, and provides the first embedded implementation exploiting event-based stitching and detection.
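    A much-simplified sketch of the event pipeline, assuming events are accumulated into per-camera count frames over a fixed time window and the two frames are stitched with a fixed column overlap. The paper does this in custom CUDA kernels; the resolution and overlap below are invented.

        import numpy as np

        W, H, OVERLAP = 1280, 720, 96            # per-camera size (assumed)

        def accumulate(events, t0, t1):
            """events: structured array with fields t, x, y -> count frame."""
            frame = np.zeros((H, W), dtype=np.int32)
            sel = events[(events["t"] >= t0) & (events["t"] < t1)]
            np.add.at(frame, (sel["y"], sel["x"]), 1)
            return frame

        def stitch(left, right, overlap=OVERLAP):
            """Average the overlapping columns, concatenate the rest."""
            blend = (left[:, -overlap:] + right[:, :overlap]) // 2
            return np.hstack([left[:, :-overlap], blend, right[:, overlap:]])

        ev = np.zeros(1000, dtype=[("t", "i8"), ("x", "i4"), ("y", "i4")])
        ev["x"] = np.random.randint(0, W, 1000)
        ev["y"] = np.random.randint(0, H, 1000)
        pano = stitch(accumulate(ev, 0, 10_000), accumulate(ev, 0, 10_000))
        print(pano.shape)                        # (720, 2464)
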
  • Bartoli, Pietro; Jayaprakash, Varsha; Moosmann, Julian; et al. (2025)
    2025 10th International Workshop on Advances in Sensors and Interfaces (IWASI)
    Hand gesture recognition is a cornerstone of intuitive human–computer interaction (HCI), particularly for wearable and extended reality (XR) systems. Today's gesture recognition solutions are predominantly based on frame-based cameras. However, these systems underperform in challenging lighting conditions and require substantial computational power not always available on battery-powered wearable devices, which can significantly hinder real-time processing performance. A promising alternative is the use of emerging Dynamic Vision Sensors (DVS), which perform exceptionally well in high-dynamic-range lighting environments. These sensors adjust energy consumption based on scene activity, yielding sparse yet semantically rich data, enabling efficient battery operation and supporting real-time processing. However, current event-based gesture datasets generally include only temporal segmentation, without the spatial annotations essential for hand tracking. To address this gap, we introduce LynX, a novel egocentric gesture dataset collected using custom-designed wearable hardware built around the Prophesee GENX320 DVS and GAP9, a low-power multicore RISC-V processor from GreenWaves. The dataset includes recordings from 18 subjects performing 13 gesture classes across four diverse scenarios, specifically designed to exploit the advantages of DVS by incorporating dynamic lighting conditions and motion-rich environments. Each event frame is annotated with per-frame hand bounding boxes in YOLO format and precise temporal segmentation for each gesture instance. By combining spatial and temporal annotations from a first-person perspective, LynX advances event-based HCI benchmarks, enabling spatio-temporal analysis for low-latency, high-dynamic-range gesture recognition in XR. The dataset is publicly available at: https://huggingface.co/datasets/pietroba/Lynx
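    A short sketch of consuming the per-frame annotations, assuming the standard YOLO text format the abstract names: one "class x_center y_center width height" line per box, with coordinates normalized to [0, 1]. The file layout is an assumption, not taken from the LynX release.

        def load_yolo_labels(path, img_w, img_h):
            """Return (class_id, x1, y1, x2, y2) pixel boxes for one frame."""
            boxes = []
            with open(path) as f:
                for line in f:
                    cls, xc, yc, w, h = (float(v) for v in line.split())
                    x1 = (xc - w / 2) * img_w
                    y1 = (yc - h / 2) * img_h
                    boxes.append((int(cls), x1, y1,
                                  x1 + w * img_w, y1 + h * img_h))
            return boxes

        # e.g. boxes = load_yolo_labels("frame_000123.txt", 320, 320)
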
  • Boyle, Liam; Moosmann, Julian; Baumann, Nicolas; et al. (2024)
    IEEE Sensors Journal
    Advances in lightweight neural networks have revolutionized computer vision in a broad range of Internet of Things (IoT) applications, encompassing remote monitoring and process automation. However, the detection of small objects, which is crucial for many of these applications, remains an underexplored area in current computer vision research, particularly for low-power embedded devices that host resource-constrained processors. To address this gap, this paper proposes an adaptive tiling method for lightweight and energy-efficient object detection networks, including YOLO-based models and the popular Faster Objects More Objects (FOMO) network. The proposed tiling enables object detection on low-power Microcontroller Units (MCUs) with no compromise on accuracy compared to large-scale detection models. The benefit of the proposed method is demonstrated by applying it to FOMO and TinyissimoYOLO networks on a novel RISC-V-based MCU with built-in Machine Learning (ML) accelerators. Extensive experimental results show that the proposed tiling method boosts the F1 score by up to 225% for both FOMO and TinyissimoYOLO networks while reducing the average object count error by up to 76% with FOMO and up to 89% with TinyissimoYOLO. Furthermore, the findings of this work indicate that using a soft F1 loss over the popular binary cross-entropy loss can serve as an implicit non-maximum suppression for the FOMO network. To evaluate the real-world performance, the networks are deployed on the RISC-V-based GAP9 microcontroller from GreenWaves Technologies, showcasing the proposed method's ability to strike a balance between detection performance (58%–95% F1 score), low latency (0.6 ms/inference – 16.2 ms/inference), and energy efficiency (31 µJ/inference – 1.27 mJ/inference) while performing multiple predictions using high-resolution images on an MCU.
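    A minimal sketch of the tiling idea, assuming a detector that accepts fixed-size crops: the high-resolution frame is cut into overlapping tiles, the detector runs per tile, and each box is shifted back into full-image coordinates. Tile size, stride, and the detector interface are illustrative, not the paper's adaptive scheme.

        import numpy as np

        def tile_detect(image, detector, tile=256, stride=224):
            """detector(crop) -> [(x1, y1, x2, y2, score)] in crop coords."""
            h, w = image.shape[:2]
            out = []
            for ty in range(0, max(h - tile, 0) + 1, stride):
                for tx in range(0, max(w - tile, 0) + 1, stride):
                    crop = image[ty:ty + tile, tx:tx + tile]
                    out += [(x1 + tx, y1 + ty, x2 + tx, y2 + ty, s)
                            for x1, y1, x2, y2, s in detector(crop)]
            return out

        # dummy detector that reports one box per tile
        dummy = lambda crop: [(4, 4, 36, 36, 0.9)]
        print(len(tile_detect(np.zeros((512, 512)), dummy)))   # 4 tiles
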
  • Moosmann, Julian; Müller, Hanna; Zimmerman, Nicky; et al. (2024)
    IEEE Access
    This paper deploys and explores variants of TinyissimoYOLO, a highly flexible and fully quantized ultra-lightweight object detection network designed for edge systems with a power envelope of a few milliwatts. With experimental measurements, we present a comprehensive characterization of the network's detection performance, exploring the impact of various parameters, including input resolution, number of object classes, and hidden layer adjustments. We deploy variants of TinyissimoYOLO on state-of-the-art ultra-low-power extreme edge platforms, presenting a detailed comparison of latency, energy efficiency, and their ability to efficiently parallelize the workload. In particular, the paper presents a comparison between a RISC-V-based parallel processor (GAP9 from GreenWaves Technologies) with and without use of its on-chip hardware accelerator, an ARM Cortex-M7 core (STM32H7 from STMicroelectronics), two ARM Cortex-M4 cores (STM32L4 from STMicroelectronics and Apollo4b from Ambiq), and a multi-core platform aimed at edge AI applications with a CNN hardware accelerator (MAX78000 from Analog Devices). Experimental results show that GAP9's hardware accelerator achieves the lowest inference latency and energy at 2.12 ms and 150 µJ respectively, which is around 2x faster and 20% more energy efficient than the next best platform, the MAX78000. The hardware accelerator of GAP9 can even run an increased-resolution version of TinyissimoYOLO with 112×112 pixels and 10 detection classes within 3.2 ms, consuming 245 µJ. We also deployed and profiled a multi-core implementation on GAP9 at different core voltages and frequencies, achieving 11.3 ms with the lowest-latency configuration and 490 µJ with the most energy-efficient one. With this paper, we demonstrate the flexibility of TinyissimoYOLO and prove its detection accuracy on a widely used detection dataset. Furthermore, we demonstrate its suitability for real-time ultra-low-power edge inference.
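    A back-of-the-envelope check on the figures quoted above: since mW x ms = µJ, the implied average power of each configuration is energy divided by latency. Only the GAP9 accelerator numbers appear in this abstract; the helper itself is illustrative.

        figures = {                               # (latency ms, energy µJ)
            "GAP9 accelerator (base)":              (2.12, 150.0),
            "GAP9 accelerator (112x112, 10 cls)":   (3.2, 245.0),
        }
        for name, (latency_ms, energy_uj) in figures.items():
            print(f"{name}: ~{energy_uj / latency_ms:.0f} mW average power")
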