Michael Rogenmoser



Last Name

Rogenmoser

First Name

Michael

Organisational unit

03996 - Benini, Luca / Benini, Luca

Search Results

Publications 1 - 10 of 13
  • Scherer, Moritz; Sidler, Fabian; Rogenmoser, Michael; et al. (2022)
    2022 18th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
    The trend in Internet of Things research points toward performing increasingly compute-intensive data analysis tasks on embedded sensor nodes rather than in server centers. Exploiting technological advances in both energy efficiency and Tiny Machine Learning algorithms and methods, an increasing number of recognition and classification tasks can be performed by small, low-power, wireless sensor nodes. This paper presents WideVision, a wireless, wide-area sensing platform capable of performing on-board person detection with power requirements in the mW range. The WideVision platform integrates seamlessly into the Internet of Things by coupling a dedicated multiradio platform, including a LoRa interface enabling medium- and long-range communication, with a novel parallel RISC-V microcontroller. We evaluate the proposed platform with the GAP8 microcontroller, which includes an 8-core RISC-V cluster, and a greyscale camera, performing person detection by training and deploying an advanced quantized neural network, and achieve a statistical accuracy of 84.5% on a 5-person detection task with a latency of only 182 ms. Experimental results demonstrate that the WideVision sensor node, while performing on-board inference at a rate of one image per minute, can last 300 days on a 2400 mAh Li-ion battery, and 65 days when evaluating one image every 10 seconds, while providing effective surveillance of its perimeter.
  • Ulbricht, Markus; Tortorella, Yvan; Rogenmoser, Michael; et al. (2023)
    2023 IEEE European Test Symposium (ETS)
    Due to their flexibility and openness, the RISC-V ISA and processor architectures have emerged as notable contenders in various application domains. Their advantages over commercial solutions have attracted the interest of academia and industry and even led to their planned adoption in aeronautics and space. However, in these demanding environments, system reliability is of paramount importance. To address this issue, this paper presents an overview of several hardware-centric approaches for developing reliable systems based on the Parallel Ultra-Low-Power (PULP) open-source RISC-V hardware platform. These approaches range from gate-level optimizations to system-level improvements and highlight the versatility of the PULP architecture and its potential as a viable architecture for developing various aerospace platforms.
  • Rogenmoser, Michael; Benini, Luca (2023)
    2023 30th IEEE International Conference on Electronics, Circuits and Systems (ICECS)
    One of the key challenges when operating microcontrollers in harsh environments such as space is radiation-induced single event upsets (SEUs), which can lead to errors in computation. Common countermeasures rely on proprietary radiation-hardened technologies, low-density technologies, or extensive replication, leading to high costs and low performance and efficiency. To address this, we present Trikarenos, a fault-tolerant 32-bit RISC-V microcontroller SoC in an advanced TSMC 28nm technology. Trikarenos reduces the replication cost with a configurable triple-core lockstep mode, allowing three Ibex cores to execute applications reliably while operating on ECC-protected memory. If reliability is not needed for a given application, the cores can operate independently in parallel for higher performance and efficiency. Trikarenos consumes 15.7 mW at 250 MHz while executing a fault-tolerant matrix-matrix multiplication, a 21.5x efficiency gain over the state of the art; when reliability is not needed, performance increases by 2.96x and energy efficiency by 2.36x.
  • Rogenmoser, Michael; Wiese, Philip; Endres Forlin, Bruno; et al. (2024)
    We present a fault-tolerant-by-design RISC-V SoC and experimentally assess it under atmospheric neutrons and 200 MeV protons. The dedicated ECC and Triple-Core Lockstep countermeasures correct most errors, guaranteeing a device cross section lower than 5.36 × 10⁻¹² cm².
  • Rogenmoser, Michael; Tortorella, Yvan; Rossi, Davide; et al. (2025)
    ACM Transactions on Cyber-Physical Systems
    Space Cyber-Physical Systems such as spacecraft and satellites strongly rely on the reliability of onboard computers to guarantee the success of their missions. Relying solely on radiation-hardened technologies is extremely expensive, and developing inflexible architectural and microarchitectural modifications to introduce modular redundancy within a system leads to significant area increase and performance degradation. To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities. Further, we propose two recovery approaches, software-based and hardware-based, trading off performance and area overhead. Running at 430 MHz, our fault-tolerant cluster achieves up to 1,160 MOPS on a matrix multiplication benchmark when configured in non-redundant mode and 617 and 414 MOPS in dual and triple mode, respectively. A software-based recovery in triple mode requires 363 clock cycles and occupies 0.612 mm², representing a 1.3% area overhead over a non-redundant 12-core RISC-V cluster. As a high-performance alternative, a new hardware-based method provides rapid fault recovery in just 24 clock cycles and occupies 0.660 mm², namely, ∼9.4% area overhead over the baseline non-redundant RISC-V cluster. The cluster is also enhanced with split-lock capabilities to enter one of the available redundant modes with minimum performance loss, allowing execution of a mission-critical portion of code when in independent mode, or a performance section when in a reliability mode, with <400 clock cycles overhead for entry and exit. The proposed system is the first to integrate these functionalities on an open-source RISC-V-based compute device, enabling finely tunable reliability versus performance trade-offs.
  • Rogenmoser, Michael; Ottaviano, Alessandro; Benz, Thomas; et al. (2024)
    In the last decade, we have witnessed exponential growth in the complexity of control systems for safety-critical applications (automotive, robots, industrial automation) and their transition to heterogeneous mixed-criticality systems (MCSs). The growth of the RISC-V ecosystem is creating a major opportunity to develop open-source, vendor-neutral reference platforms for safety-critical computing. We present SentryCore, a reliable, real-time, self-contained, open-source mega-IP for advanced control functions that can be seamlessly integrated into Systems-on-Chip, e.g., for automotive applications, through industry-standard Advanced eXtensible Interface 4 (AXI4). SentryCore features three embedded RISC-V processor cores in lockstep with error-correcting code (ECC) protected data memory for reliable execution of any safety-critical application. Context switching is accelerated to under 110 clock cycles via a RISC-V core-local interrupt controller (CLIC) and dedicated hardware extensions, while a timer-based direct memory access (DMA) engine streamlines sensor data readout during periodic control loops. SentryCore was implemented in Intel’s 16nm process node and tested with FreeRTOS, ThreadX, and RTIC software support.
  • Jain, Vikram; Cavalcante, Matheus; Bruschi, Nazareno; et al. (2023)
    2023 60th ACM/IEEE Design Automation Conference (DAC)
    Emerging deep neural network (DNN) applications require high-performance multi-core hardware acceleration with large data bursts. Classical networks-on-chip (NoCs) use serial packet-based protocols suffering from significant protocol translation overheads towards the endpoints. This paper proposes PATRONoC, an open-source fully AXI-compliant NoC fabric to better address the specific needs of multi-core DNN computing platforms. Evaluation of PATRONoC in a 2D-mesh topology shows 34% higher area efficiency compared to a state-of-the-art classical NoC at 1 GHz. PATRONoC's throughput outperforms a baseline NoC by 2-8x on uniform random traffic and provides a high aggregated throughput of up to 350 GiB/s on synthetic and DNN workload traffic.
  • Benz, Thomas; Rogenmoser, Michael; Scheffler, Paul; et al. (2024)
    IEEE Transactions on Computers
    Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous systems, DMAEs must operate efficiently in increasingly diverse environments. This work proposes a modular and highly configurable open-source DMAE architecture called intelligent DMA (iDMA), split into three parts that can be composed and customized independently. The front-end implements the control plane binding to the surrounding system. The mid-end accelerates complex data transfer patterns such as multi-dimensional transfers, scattering, or gathering. The back-end interfaces with the on-chip communication fabric (data plane). We assess the efficiency of iDMA in various instantiations: In high-performance systems, we achieve speedups of up to 15.8x with only 1 % additional area compared to a base system without a DMAE. We achieve an area reduction of 10 % while improving ML inference performance by 23 % in ultra-low-energy edge AI systems over an existing DMAE solution. We provide area, timing, latency, and performance characterization to guide its instantiation in various systems.
  • Rogenmoser, Michael; Wiese, Philip; Endres Forlin, Bruno; et al. (2025)
    IEEE Transactions on Nuclear Science
    RISC-V-based fault-tolerant system-on-chip (SoC) designs are critical for the new generation of automotive and space SoC architectures. However, reliability assessment requires characterization under controlled radiation doses to accurately quantify the fault tolerance of the fabricated designs. This work analyzes the Trikarenos design, a SoC implemented in TSMC 28nm, for single event upset (SEU) vulnerability under atmospheric neutron and 200 MeV proton radiation, comparing these results to simulation-based fault injection. All faults in error correction codes (ECC) protected memory are corrected by a scrubber, showing an estimated cross-section per bit of up to 1.09 × 10⁻¹⁴ cm² bit⁻¹. Furthermore, the triple-core lockstep (TCLS) mechanism implemented in Trikarenos is validated and is shown to correct errors affecting a cross-section up to 3.23 × 10⁻¹¹ cm², with the remaining uncorrectable vulnerability below 5.36 × 10⁻¹² cm². When augmenting the experimental analysis of fabricated chips with gate-level fault injection in simulation, 99.10 % of injections into the SoC produced correct results, while 100 % of injections in the TCLS-protected cores were handled correctly. With 12.28 % of all injected faults leading to a TCLS recovery, this indicates an approximate effective flip-flop cross-section of up to 1.28 × 10⁻¹⁴ cm²/FF.
  • Fischer, Tim; Rogenmoser, Michael; Benz, Thomas; et al. (2025)
    IEEE Transactions on Very Large Scale Integration (VLSI) Systems
    The new generation of domain-specific AI accelerators is characterized by rapidly increasing demands for bulk data transfers, as opposed to small, latency-critical cache line transfers typical of traditional cache-coherent systems. In this article, we address this critical need by introducing the FlooNoC network-on-chip (NoC), featuring very wide, fully advanced extensible interface (AXI4) compliant links designed to meet the massive bandwidth needs at high energy efficiency. At the transport level, nonblocking transactions are supported for latency tolerance. In addition, a novel end-to-end ordering approach for AXI4, enabled by a multistream capable direct memory access (DMA) engine, simplifies network interfaces (NIs) and eliminates interstream dependencies. Furthermore, dedicated physical links are instantiated for short, latency-critical messages. A complete end-to-end reference implementation in 12-nm FinFET technology demonstrates the physical feasibility and power performance area (PPA) benefits of our approach. Using wide links on high levels of metal, we achieve a bandwidth of 645 Gb/s/link and a total aggregate bandwidth of 103 Tb/s for an 8 × 4 mesh of processors' cluster tiles, with a total of 288 RISC-V cores. The NoC imposes a minimal area overhead of only 3.5% per compute tile and achieves a leading-edge energy efficiency of 0.15at 0.8. Compared with state-of-the-art (SoA) NoCs, our system offers three times the energy efficiency and more than double the link bandwidth. Furthermore, compared with a traditional AXI4-based multilayer interconnect, our NoC achieves a 30% reduction in area, corresponding to a 47% increase in within the same floorplan.
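Several of the abstracts above (Trikarenos, SentryCore, the Hybrid Modular Redundancy cluster) rely on triple-core lockstep, where three cores execute the same code and their outputs are compared. A minimal sketch of the underlying idea, bitwise 2-out-of-3 majority voting, is shown below in Python. It assumes 32-bit words, and `majority3` and `vote` are hypothetical helper names; this is a generic illustration, not the voter or recovery logic used in any of these designs.

```python
MASK32 = 0xFFFF_FFFF  # assumption: 32-bit register words (RV32)


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: each output bit takes the value held by at least two inputs."""
    return ((a & b) | (a & c) | (b & c)) & MASK32


def vote(a: int, b: int, c: int) -> tuple[int, list[int]]:
    """Vote on three redundant core outputs (hypothetical helper).

    Returns the majority word and the indices of the cores whose output differs
    from it; a lockstep controller could use this to trigger a resynchronization.
    """
    voted = majority3(a, b, c)
    faulty = [i for i, x in enumerate((a, b, c)) if (x & MASK32) != voted]
    return voted, faulty


# A single bit flip in core 2 is masked by the vote and flagged for recovery.
print(vote(0x1234, 0x1234, 0x1235))  # (4660, [2])
```

A single upset is masked because the two unaffected copies outvote it; if two cores disagree in different bits, the vote still yields a word, but more than one core is flagged, which is why real designs resynchronize the cores after a mismatch rather than relying on voting alone.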
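The WideVision abstract reports lifetimes of 300 days and 65 days on a 2400 mAh battery. As a rough sanity check, the average current these figures imply can be computed directly. The sketch below assumes the full nominal capacity is usable (no self-discharge or derating); `average_current_ma` is a hypothetical helper name.

```python
HOURS_PER_DAY = 24


def average_current_ma(capacity_mah: float, lifetime_days: float) -> float:
    """Average current draw implied by draining `capacity_mah` in `lifetime_days`.

    Assumes the full nominal capacity is usable (no self-discharge, no derating).
    """
    return capacity_mah / (lifetime_days * HOURS_PER_DAY)


for label, days in [("1 image/min", 300), ("1 image/10 s", 65)]:
    print(f"{label}: {average_current_ma(2400, days):.3f} mA")
```

This gives roughly 0.33 mA and 1.54 mA. Converting to power requires the battery voltage, which the abstract does not state; at an assumed nominal 3.7 V, these correspond to about 1.2 mW and 5.7 mW, consistent with the stated mW-range power requirements.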
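The iDMA abstract describes a mid-end that accelerates multi-dimensional transfers before the back-end issues them on the on-chip fabric. The Python sketch below illustrates the general idea of decomposing a strided 2D transfer into one contiguous 1D transfer per row. `Transfer1D` and `split_2d` are hypothetical names, not the iDMA interface, and a real back-end would further split each row at bus and burst boundaries.

```python
from typing import Iterator, NamedTuple


class Transfer1D(NamedTuple):
    """A contiguous copy of `length` bytes from `src` to `dst` (hypothetical descriptor)."""
    src: int
    dst: int
    length: int


def split_2d(src: int, dst: int, length: int, src_stride: int,
             dst_stride: int, reps: int) -> Iterator[Transfer1D]:
    """Decompose a strided 2D transfer into `reps` 1D transfers, one per row."""
    for i in range(reps):
        yield Transfer1D(src + i * src_stride, dst + i * dst_stride, length)


# Copy a 4-row tile of 16-byte rows out of a matrix with a 64-byte row pitch
# into a densely packed destination buffer.
for t in split_2d(src=0x1000, dst=0x8000, length=16, src_stride=64, dst_stride=16, reps=4):
    print(hex(t.src), hex(t.dst), t.length)
```

Keeping the decomposition separate from the data movement mirrors the front-end/mid-end/back-end split described in the abstract: higher-dimensional, scatter, or gather patterns could be layered on top by nesting the same kind of generator.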