Pietro Bonazzi
Last Name
Bonazzi
First Name
Pietro
Organisational unit
01225 - D-ITET Zentr. f. projektbasiertes Lernen / D-ITET Center for Project-Based Learning
Publications 1 - 9 of 9
- RGB-Event Fusion with Self-Attention for Collision Prediction
Item type: Conference Paper
Bonazzi, Pietro; Christian Vogt; Michael Jost; et al. (2025)
Ensuring robust and real-time obstacle avoidance is critical for the safe operation of autonomous robots in dynamic, real-world environments. This paper proposes a neural network framework for predicting the time and position of a collision between an unmanned aerial vehicle and a dynamic object, using RGB and event-based vision sensors. The proposed architecture consists of two separate encoder branches, one per modality, followed by self-attention fusion to improve prediction accuracy. To facilitate benchmarking, we leverage the ABCD [8] dataset, which enables detailed comparisons of single-modality and fusion-based approaches. At the same prediction throughput of 50 Hz, the experimental results show that the fusion-based model improves prediction accuracy over single-modality approaches by 1% on average and by 10% for distances beyond 0.5 m, but at the cost of +71% memory and +105% FLOPs. Notably, the event-based model outperforms the RGB model by 4% in position error and 26% in time error at a similar computational cost, making it a competitive alternative. Additionally, we evaluate quantized versions of the event-based models, applying 1- to 8-bit quantization to assess the trade-offs between predictive performance and computational efficiency. These findings highlight the trade-offs of multi-modal perception using RGB and event-based cameras in robotic applications.
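The abstract does not detail the fusion block; purely as an illustration, here is a minimal NumPy sketch of single-head scaled dot-product self-attention applied to concatenated tokens from two hypothetical encoder branches. Token counts, feature dimensions, and the identity projections are all assumptions, not the paper's architecture.

```python
import numpy as np

def self_attention(tokens, d_k=None):
    """Scaled dot-product self-attention (single head, identity Q/K/V
    projections for brevity). tokens: (n_tokens, d_model)."""
    d_k = d_k or tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d_k)           # (n, n) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ tokens                             # attended features

rng = np.random.default_rng(0)
rgb_tokens = rng.normal(size=(16, 64))    # hypothetical RGB encoder output
event_tokens = rng.normal(size=(16, 64))  # hypothetical event encoder output

# Fusion: concatenate both token streams so attention can mix modalities.
fused = self_attention(np.concatenate([rgb_tokens, event_tokens], axis=0))
print(fused.shape)  # (32, 64)
```

Concatenating before attention lets every RGB token attend to every event token and vice versa, which is one common way to realize cross-modal fusion.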
- Towards Low-Latency Event-based Obstacle Avoidance on a FPGA-Drone
Item type: Conference Paper
Bonazzi, Pietro; Christian Vogt; Michael Jost; et al. (2025)
This work quantitatively evaluates the performance of event-based vision systems (EVS) against conventional RGB-based models for action prediction in collision avoidance on an FPGA accelerator. Our experiments demonstrate that the EVS model achieves a significantly higher effective frame rate (1 kHz) and lower temporal (-20 ms) and spatial (-20 mm) prediction errors compared to the RGB-based model, particularly when tested on out-of-distribution data. The EVS model also exhibits superior robustness in selecting optimal evasion maneuvers. In particular, in distinguishing between moving and stationary states, it achieves a 59-percentage-point advantage in precision (78% vs. 19%) and a substantially higher F1 score (0.73 vs. 0.06), highlighting the susceptibility of the RGB model to overfitting. Further analysis across different combinations of spatial classes confirms the consistent performance of the EVS model on both test datasets. Finally, we evaluated the system end-to-end and achieved a latency of approximately 2.14 ms, with event aggregation (1 ms) and inference on the processing unit (0.94 ms) accounting for the largest components. These results underscore the advantages of event-based vision for real-time collision avoidance and demonstrate its potential for deployment in resource-constrained environments.
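For readers comparing the precision and F1 figures reported above, a short sketch of how both metrics derive from confusion counts; the counts below are hypothetical, not taken from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for a moving-vs-stationary classifier. A model with
# many false positives (as reported for the RGB baseline) drags down both
# precision and F1, since F1 is the harmonic mean of precision and recall.
p, r, f1 = precision_recall_f1(tp=70, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.778 0.875 0.824
```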
- Evaluating Electric Charge Variation Sensors for Camera-free Eye Tracking on Smart Glasses
Item type: Conference Paper
Alan Magdaleno; Bonazzi, Pietro; Tommaso Polonelli; et al. (2025)
Contactless electrooculography (EOC) using electric charge variation (QVar) sensing has recently emerged as a promising eye-tracking technique for wearable devices. QVar enables low-power and unobtrusive interaction without requiring skin-contact electrodes. Previous work demonstrated that such systems can accurately classify eye movements using onboard TinyML under controlled laboratory conditions. However, the performance and robustness of contactless EOC in real-world scenarios, where environmental noise and user variability are significant, remain largely unexplored. In this paper, we present a field evaluation of a previously proposed QVar-based eye-tracking system, assessing its limitations across 29 users and 100 recordings in everyday scenarios such as working in front of a laptop. Our results show that classification accuracy varies between 57% and 89% depending on the user, with an average of 74.5%, and degrades significantly in the presence of nearby electronic noise sources. These results show that contactless EOC remains viable under realistic conditions, though subject variability and environmental factors can significantly affect classification accuracy. The findings inform the future development of wearable gaze interfaces for human-computer interaction and augmented reality, supporting the transition of this technology from prototype to practice.
- PicoSAM2: Low-Latency Segmentation In-Sensor for Edge Vision Applications
Item type: Conference Paper
Bonazzi, Pietro; Nicola Farronato; Stefan Zihlmann; et al. (2025)
Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications like smart glasses and IoT devices. We introduce PicoSAM2, a lightweight (1.3M parameters, 336M MACs) promptable segmentation model optimized for edge and in-sensor execution, including the Sony IMX500. It builds on a depthwise separable U-Net and uses knowledge distillation and fixed-point prompt encoding to learn from the Segment Anything Model 2 (SAM2). On COCO and LVIS, it achieves 51.9% and 44.9% mIoU, respectively. The quantized model (1.22 MB) runs in 14.3 ms on the IMX500, achieving 86 MACs/cycle, making it the only model meeting both the memory and compute constraints for in-sensor deployment. Distillation boosts LVIS performance by +3.5% mIoU and +5.1% mAP. These results demonstrate that efficient, promptable segmentation is feasible directly on-camera, enabling privacy-preserving vision without cloud or host processing.
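To see why a depthwise separable U-Net can stay near 1.3M parameters, one can compare weight counts for a standard versus a depthwise separable 3x3 convolution. The 64 -> 128 channel stage below is an assumed example, not PicoSAM2's actual configuration.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard 2D convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1x1 pointwise conv that mixes channels."""
    return c_in * k * k + c_in * c_out

# Hypothetical U-Net stage: 64 -> 128 channels, 3x3 kernels.
std = conv_params(64, 128, 3)                 # 73728 weights
sep = depthwise_separable_params(64, 128, 3)  # 8768 weights
print(std, sep, round(std / sep, 1))          # 73728 8768 8.4
```

The roughly 8x reduction per layer is what makes this factorization a common choice for in-sensor and microcontroller-class models.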
- On-device Learning of EEGNet-based Network For Wearable Motor Imagery Brain-Computer Interface
Item type: Conference Paper
ISWC '24: Proceedings of the 2024 ACM International Symposium on Wearable Computers
Bian, Sizhen; Kang, Pixi; Moosmann, Julian; et al. (2024)
Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) have garnered significant interest across various domains, including rehabilitation and robotics. Despite advancements in neural network-based EEG decoding, maintaining performance across diverse user populations remains challenging due to feature distribution drift. This paper presents an effective approach to address this challenge by implementing a lightweight and efficient on-device learning engine for wearable motor imagery recognition. The proposed approach, applied to the well-established EEGNet architecture, enables real-time and accurate adaptation to EEG signals from unregistered users. Leveraging the newly released low-power parallel RISC-V-based processor GAP9 from GreenWaves and the PhysioNet EEG Motor Imagery dataset, we demonstrate a remarkable accuracy gain of up to 7.31% over the baseline with a memory footprint of 15.6 KByte. Furthermore, by optimizing the input stream, we achieve enhanced real-time performance without compromising inference accuracy. Our tailored approach exhibits an inference time of 14.9 ms and 0.76 mJ per single inference, and 20 us and 0.83 uJ per single update during online training. These findings highlight the feasibility of our method for edge EEG devices as well as other battery-powered wearable AI systems suffering from subject-dependent feature distribution drift.
- AI-based Multi-Wavelength PPG Device for Blood Pressure Monitoring
Item type: Conference Paper
2024 IEEE International Symposium on Medical Measurements and Applications (MeMeA)
Botrugno, Chiara; Dheman, Kanika; Bonazzi, Pietro; et al. (2024)
Non-invasive vital signs monitoring, particularly blood pressure, plays a pivotal role in assessing overall health and detecting early signs of medical conditions. Photoplethysmography (PPG) is a non-invasive technology increasingly used in vital signs monitoring; its sensitivity to blood volume changes in blood vessels allows continuous, accurate recording of each heartbeat and detection of abnormalities. This study explores the efficacy of machine learning models for arterial pressure prediction using only the PPG signal, relying on a new dataset and multi-wavelength sensor technology that provides signals at four different wavelengths (infrared, red, green and blue). The recruited population comprises 88 people, each performing a measurement of about 30 seconds (sampled at 100 Hz). After careful signal preprocessing, three approaches are evaluated: a multi-layer perceptron (MLP) leveraging 84 features per subject, a dimensionality reduction strategy using Principal Component Analysis (PCA), and a Convolutional Neural Network (CNN) architecture. The CNN-based method performed best, with Mean Absolute Error (MAE) and Standard Deviation (SD) of 5.52 +/- 7.62 mmHg for SBP and 4.67 +/- 6.63 mmHg for DBP, meeting the requirements imposed by the Association for the Advancement of Medical Instrumentation (AAMI) and the British Hypertension Society (BHS). The combined CNN and L2-norm approach demonstrated potential as a reliable tool for non-invasive arterial pressure prediction, offering valuable insights for cardiovascular health monitoring and management.
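An MAE/SD check of the kind reported above can be sketched in a few lines. The thresholds (mean error <= 5 mmHg, SD <= 8 mmHg) follow the abstract's MAE/SD formulation of the AAMI criterion, and the sample values below are invented.

```python
import numpy as np

def aami_check(pred, ref):
    """Mean absolute error and its standard deviation for blood-pressure
    estimates, with an AAMI-style acceptance check (MAE <= 5 mmHg,
    SD <= 8 mmHg, following the abstract's formulation)."""
    err = np.abs(np.asarray(pred, float) - np.asarray(ref, float))
    mae, sd = err.mean(), err.std()
    return mae, sd, bool(mae <= 5.0 and sd <= 8.0)

# Hypothetical systolic predictions vs. cuff references (mmHg).
pred = [118, 124, 131, 109, 142]
ref  = [120, 120, 128, 112, 138]
mae, sd, ok = aami_check(pred, ref)
print(mae, round(sd, 3), ok)
```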
- Retina: Low-Power Eye Tracking with Event Camera and Spiking Hardware
Item type: Conference Paper
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Bonazzi, Pietro; Bian, Sizhen; Lippolis, Giovanni; et al. (2024)
This paper introduces a neuromorphic dataset and methodology for eye tracking, harnessing event data streamed continuously by a Dynamic Vision Sensor (DVS). The framework integrates a directly trained Spiking Neural Network (SNN) regression model and leverages a state-of-the-art low-power edge neuromorphic processor, Speck. First, it introduces a representative event-based eye-tracking dataset, "Ini-30", collected with two glasses-mounted DVS cameras from thirty volunteers. Then, an SNN model based on Integrate-and-Fire (IAF) neurons, named "Retina", is described, featuring only 64k parameters (6.63x fewer than 3ET) and achieving a pupil tracking error of only 3.24 pixels on a 64x64 DVS input. The continuous regression output is obtained by temporal convolution, using a non-spiking 1D filter slid across the output spiking layer over time. Retina is evaluated on the neuromorphic processor, showing an end-to-end power of 2.89-4.8 mW and a latency of 5.57-8.01 ms, depending on the time used to slice the event-based video recording. The model is more precise than the latest event-based eye-tracking method, "3ET", on Ini-30, and shows comparable performance with significant model compression (35 times fewer MAC operations) on the synthetic dataset used in "3ET". We hope this work will open avenues for further investigation of closed-loop neuromorphic solutions and true event-based training pursuing edge performance.
- Q-Segment: Segmenting Images In-Sensor for Vessel-Based Medical Diagnosis
Item type: Conference Paper
2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS)
Bonazzi, Pietro; Li, Yawei; Bian, Sizhen; et al. (2024)
This paper addresses the growing interest in deploying deep learning models directly in-sensor. We present "Q-Segment", a quantized real-time segmentation algorithm, and conduct a comprehensive evaluation on a low-power edge vision platform with an in-sensor processor, the Sony IMX500. One of the main goals of the model is to achieve end-to-end image segmentation for vessel-based medical diagnosis. Deployed on the IMX500 platform, Q-Segment achieves an ultra-low in-sensor inference time of only 0.23 ms and a power consumption of only 72 mW. We compare the proposed network with state-of-the-art models, both float and quantized, demonstrating that the proposed solution outperforms existing networks in computing efficiency on various platforms, e.g., by a factor of 75x compared to ERFNet. The network employs an encoder-decoder structure with skip connections and achieves a binary accuracy of 97.25% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 96.97% on the CHASE dataset. We also present a comparison of the IMX500 processing core with the Sony Spresense, a low-power multi-core ARM Cortex-M microcontroller, and a single-core ARM Cortex-M4, showing that it can achieve in-sensor processing with low end-to-end latency (17 ms) and power consumption (254 mW). This research contributes valuable insights into edge-based image segmentation, laying the foundation for efficient algorithms tailored to low-power environments.
- TinyTracker: Ultra-Fast and Ultra-Low-Power Edge Vision In-Sensor for Gaze Estimation
Item type: Conference Paper
2023 IEEE SENSORS
Bonazzi, Pietro; Rüegg, Thomas; Bian, Sizhen; et al. (2023)
Intelligent edge vision tasks face the critical challenge of ensuring power and latency efficiency due to the typically heavy computational load they impose on edge platforms. This work leverages one of the first "Artificial Intelligence (AI) in sensor" vision platforms, the IMX500 by Sony, to achieve ultra-fast and ultra-low-power end-to-end edge vision applications. We evaluate the IMX500 and compare it to other edge platforms, such as the Google Coral Dev Micro and Sony Spresense, by exploring gaze estimation as a case study. We propose TinyTracker, a highly efficient, fully quantized model for 2D gaze estimation designed to maximize the performance of the edge vision systems considered in this study. TinyTracker achieves a 41x size reduction (~600 KB) compared to iTracker [1] without significant loss in gaze estimation accuracy (at most 0.16 cm when fully quantized). Deployed on the Sony IMX500 vision sensor, TinyTracker achieves an end-to-end latency of around 19 ms. The camera takes around 17.9 ms to read, process and transmit the pixels to the accelerator. The inference time of the network is 0.86 ms, with an additional 0.24 ms for retrieving the results from the sensor. The overall energy consumption of the end-to-end system is 4.9 mJ, including 0.06 mJ for inference. The end-to-end study shows that the IMX500 is 1.7x faster than the Coral Micro (19 ms vs. 34.4 ms) and 7x more power efficient (4.9 mJ vs. 34.2 mJ).
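Several of the models above are "fully quantized". The general idea can be illustrated with uniform symmetric post-training quantization; this is a generic sketch, not the actual Sony IMX500 conversion toolchain or any of the papers' training recipes.

```python
import numpy as np

def quantize_dequantize(w, bits=8):
    """Uniform symmetric post-training quantization: snap weights onto a
    signed integer grid with one scale per tensor, then map back to float
    so the introduced error can be measured."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)  # simulated integer codes
    return q * scale

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=1000)  # stand-in weight tensor
err8 = np.abs(w - quantize_dequantize(w, bits=8)).max()
err2 = np.abs(w - quantize_dequantize(w, bits=2)).max()
# Fewer bits -> coarser grid -> larger worst-case rounding error, which is
# the trade-off the 1- to 8-bit studies above quantify in task accuracy.
```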