Alessandro Ottaviano
Last Name: Ottaviano
First Name: Alessandro
Organisational unit: 03996 - Benini, Luca
Publications 1-10 of 19
- LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation
  Item type: Conference Paper
  2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)
  Riedel, Samuel; Gantenbein, Marc; Ottaviano, Alessandro; et al. (2024)
  Extensive polling in shared-memory manycore systems can lead to contention, decreased throughput, and poor energy efficiency. Both lock implementations and the general-purpose atomic operation load-reserved/store-conditional (LRSC) cause polling due to serialization and retries. To alleviate this overhead, we propose LRwait and SCwait, a synchronization pair that eliminates polling by allowing contending cores to sleep while waiting for previous cores to finish their atomic access. As a scalable implementation of LRwait, we present Colibri, a distributed approach to managing LRwait reservations. Through extensive benchmarking on an open-source RISC-V platform with 256 cores, we demonstrate that Colibri outperforms current synchronization approaches for various concurrent algorithms, under both high and low contention, in terms of throughput, fairness, and energy efficiency. With an area overhead of only 6%, Colibri outperforms LRSC-based implementations by a factor of 6.5x in throughput and 7.1x in energy efficiency.
- PELS: A Lightweight and Flexible Peripheral Event Linking System for Ultra-Low Power IoT Processors
  Item type: Conference Paper
  2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)
  Ottaviano, Alessandro; Balas, Robert; Sauter, Philippe; et al. (2024)
  A key challenge for ultra-low-power (ULP) devices is handling peripheral linking, where the main central processing unit (CPU) periodically mediates the interaction among multiple peripherals following wake-up events. Current solutions address this problem either by integrating event interconnects that route single-wire event lines among peripherals or by using general-purpose I/O processors, with a strong trade-off between the latency and efficiency of the former and the flexibility of the latter. In this paper, we present an open-source, peripheral-agnostic, lightweight, and flexible Peripheral Event Linking System (PELS) that combines dedicated event routing with a tiny I/O processor. With the proposed approach, the power consumption of a linking event is reduced by 2.5 times compared to a baseline relying on the main core for the event-linking process, at an area cost of just 7 kGE in its minimal configuration, when integrated into a ULP RISC-V IoT processor.
- vCLIC: Towards Fast Interrupt Handling in Virtualized RISC-V Mixed-criticality Systems
  Item type: Conference Paper
  2024 IEEE 42nd International Conference on Computer Design (ICCD)
  Zelioli, Enrico; Ottaviano, Alessandro; Balas, Robert; et al. (2024)
  The widespread diffusion of compute-intensive edge-AI workloads and the stringent demands of modern autonomous systems require advanced heterogeneous embedded architectures. Such architectures must support high-performance and reliable execution of parallel tasks with different levels of criticality. Hardware-assisted virtualization is crucial for isolating applications concurrently executing these tasks under real-time constraints, but interrupt virtualization poses challenges in ensuring transparency to virtual guests while maintaining real-time system features, such as interrupt vectoring, nesting, and tail-chaining. Despite its rapid advancement to address virtualization needs for mixed-criticality systems, the RISC-V ecosystem still lacks interrupt controllers with integrated virtualization and real-time features, currently relying on non-deterministic, bus-mediated message-signaled interrupts (MSIs) for virtualization. To overcome this limitation, we present the design, implementation, and in-system assessment of vCLIC, a virtualization extension to the RISC-V CLIC fast interrupt controller. Our approach achieves a 20x interrupt latency speedup over the software emulation required for handling non-virtualization-aware systems, reduces response latency by 15% compared to existing MSI-based approaches, and is free from interference from the system bus, at an area cost of just 8 kGE when synthesized in an advanced 16 nm FinFET technology.
- A Gigabit, DMA-enhanced Open-Source Ethernet Controller for Mixed-Criticality Systems
  Item type: Conference Paper
  CF '24 Companion: Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions
  Liang, Chaoqun; Ottaviano, Alessandro; Benz, Thomas; et al. (2024)
  The ongoing revolution in application domains targeting autonomous navigation, first and foremost automotive "zonalization", has increased the importance of certain off-chip communication interfaces, particularly Ethernet. The latter will play an essential role in next-generation vehicle architectures as the backbone simultaneously and instantaneously connecting the zonal/domain controllers. There is thus a pressing need for a performant Ethernet controller in the open-source HW community, to be used as a proxy for architectural exploration and prototyping of mixed-criticality systems (MCSs). Driven by this trend, in this work, we propose a fully open-source, DMA-enhanced, technology-agnostic Gigabit Ethernet architecture that overcomes the limitations of existing open-source architectures, such as lowRISC's Ethernet, which are often tied to FPGA implementation, performance-bound by sub-optimal design choices such as large memory buffers, and in general not mature enough to bridge the gap between academia and industry. Besides the area advantage, the proposed design increases packet transmission speed by up to almost 3x compared to lowRISC's and is validated through implementation and FPGA prototyping in two open-source, heterogeneous MCSs.
- ISA Support for Hardware Resource Partitioning in RISC-V
  Item type: Other Conference Item
  Wistoff, Nils; Balas, Robert; Ottaviano, Alessandro; et al. (2024)
  In modern computing environments, applications concurrently executing on the same system often compete for shared hardware resources, such as caches and buffers. The ensuing contention can lead to timing interference, posing significant threats such as deadline misses in real-time systems and the creation of timing channels in secure systems.
  This work proposes an ISA extension based on the RISC-V Capacity and Bandwidth Controller QoS Register Interface (CBQRI). Our proposal enables dynamic, comprehensive temporal and spatial partitioning of shared hardware resources, ensuring isolated execution times for concurrent applications.
- Towards a RISC-V Open Platform for Next-generation Automotive ECUs
  Item type: Conference Paper
  2023 12th Mediterranean Conference on Embedded Computing (MECO)
  Cuomo, Luca; Scordino, Claudio; Ottaviano, Alessandro; et al. (2023)
  The complexity of automotive systems is increasing quickly due to the integration of novel functionalities such as assisted or autonomous driving. However, increasing complexity poses considerable challenges to the automotive supply chain, since continuously adding new hardware and network cabling is not considered tenable. The availability of modern heterogeneous multi-processor chips represents a unique opportunity to reduce vehicle costs by integrating multiple functionalities into fewer Electronic Control Units (ECUs). In addition, recent improvements in open-hardware technology allow costs to be reduced further by avoiding lock-in solutions. This paper presents a mixed-criticality multi-OS architecture for automotive ECUs based on open hardware and open-source technologies. Safety-critical functionalities are executed by an AUTOSAR OS running on a RISC-V processor, while Linux executes more advanced functionalities on a multi-core Arm CPU. Besides presenting the implemented stack and the communication infrastructure, this paper provides a quantitative gap analysis between an HW/SW-optimized version of the RISC-V processor and a COTS Arm Cortex-R in terms of real-time features, confirming that RISC-V is a valuable candidate for running AUTOSAR Classic stacks on next-generation automotive MCUs.
- ControlPULPlet: A Flexible Real-time Multicore RISC-V Controller for 2.5-D Systems-in-Package
  Item type: Journal Article
  IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  Ottaviano, Alessandro; Balas, Robert; Fischer, Tim; et al. (2025)
  The growing complexity of real-time (RT) control algorithms with increasing performance demands, along with the shift to 2.5-D technology, drives the need for scalable controllers to manage chiplets' coupled operation in 2.5-D systems-in-package (SiPs). These controllers must offer RT computing capabilities, as well as SiP-compatible IO interfaces for communicating with the controlled dies. Due to RT constraints, a key challenge is minimizing the performance penalty of die-to-die (D2D) communication with respect to native on-chip control interfaces. We address this challenge with ControlPULPlet, an open-source, RT multicore RISC-V controller designed specifically for SiP integration. ControlPULPlet features a 32-bit CV32RT core for fast interrupt handling and a specialized direct memory access engine to automate periodic sensor readout. A tightly coupled programmable multicore cluster for acceleration of advanced control algorithms is integrated through a dedicated AXI4 port. A flexible AXI4-compatible D2D link enables efficient communication in 2.5-D SiPs. We implemented and fabricated ControlPULPlet as a silicon demonstrator called Kairos in TSMC's 65-nm CMOS. Kairos runs model predictive control algorithms at up to 290 MHz in a 30 mW power envelope. The D2D link attains a peak duplex transfer rate of 51 Gbit/s at 200 MHz at a minimal cost of just 7.6 kGE in PHY area per channel, adding just 2.9% to the total system area.
- Towards Reliable Systems: A Scalable Approach to AXI4 Transaction Monitoring
  Item type: Conference Paper
  Liang, Chaoqun; Benz, Thomas; Ottaviano, Alessandro; et al. (2025)
  In safety-critical SoC applications such as automotive and aerospace, reliable transaction monitoring is crucial for maintaining system integrity.
  This paper introduces a drop-in Transaction Monitoring Unit (TMU) for AXI4 subordinate endpoints that detects transaction failures including protocol violations or timeouts and triggers recovery by resetting the affected subordinates. Two TMU variants address different constraints: a Tiny-Counter solution for tightly area-constrained systems and a Full-Counter solution for critical subordinates in mixed-criticality SoCs. The Tiny-Counter employs a single counter per outstanding transaction, while the Full-Counter uses multiple counters to track distinct transaction stages, offering finer-grained monitoring and reducing detection latencies by up to hundreds of cycles at roughly 2.5x the area cost. The Full-Counter also provides detailed error logs for performance and bottleneck analysis. Evaluations at both IP and system levels confirm the TMU's effectiveness and low overhead. In GF12 technology, monitoring 16-32 outstanding transactions occupies 1330-2616 µm² for the Tiny-Counter and 3452-6787 µm² for the Full-Counter; moderate prescaler steps reduce these figures by 18-39% and 19-32%, respectively, with no loss of functionality. Results from a full-system integration demonstrate the TMU's robust and precise monitoring capabilities in safety-critical SoC environments.
- CVA6-VMRT: A Modular Approach Towards Time-Predictable Virtual Memory in a 64-bit Application Class RISC-V Processor
  Item type: Conference Paper
  CF '25: Proceedings of the 22nd ACM International Conference on Computing Frontiers
  Reinwardt, Christopher; Balas, Robert; Ottaviano, Alessandro; et al. (2025)
  The increasing complexity of autonomous systems has driven a shift to integrated heterogeneous SoCs with real-time and safety demands. Ensuring deterministic WCETs and low latency for critical tasks requires minimizing interference on shared resources like virtual memory. Existing techniques, such as software coloring and memory replication, introduce significant area and performance overhead, especially with virtualized memory, where address translation adds latency uncertainty. To address these limitations, we propose CVA6-VMRT, an extension of the open-source RISC-V CVA6 core, adding hardware support for predictability in virtual memory access with minimal area overhead. CVA6-VMRT features dynamically partitioned Translation Look-aside Buffers (TLBs) and hybrid L1 cache/scratchpad memory (SPM) functionality. It allows fine-grained per-thread control of resources, enabling the operating system to manage TLB replacements, including static overwrites, to ensure single-cycle address translation for critical memory regions. Additionally, CVA6-VMRT enables runtime partitioning of data and instruction caches into cache and SPM sections, providing low and predictable access times for critical data without impacting other accesses. In a virtualized setting, CVA6-VMRT enhances execution time determinism for critical guests by 94% during interference from non-critical guests, with minimal impact on their average absolute execution time compared to isolated execution of the critical guests only. This interference-aware behaviour is achieved with just a 4% area overhead and no timing penalty compared to the baseline CVA6 core.
- A Reliable, Time-Predictable Heterogeneous SoC for AI-Enhanced Mixed-Criticality Edge Applications
  Item type: Journal Article
  IEEE Transactions on Circuits and Systems II: Express Briefs
  Garofalo, Angelo; Ottaviano, Alessandro; Perotti, Matteo; et al. (2025)
  Next-generation mixed-criticality Systems-on-Chip (SoCs) must execute mixed-criticality AI-enhanced sensor processing and control workloads, ensuring reliable and time-predictable execution of critical tasks while fitting within a sub-2 W power envelope. To tackle these challenges, we present a 16 nm, reliable, time-predictable heterogeneous SoC with multiple programmable accelerators. Within a 1.2 W power envelope, the SoC integrates software-configurable hardware IPs to ensure predictable access to shared resources, such as the on-chip interconnect and memory system, leading to tight upper bounds on execution times of critical applications. To accelerate mission-critical AI, the SoC integrates a reliable multi-core accelerator achieving 304.9 GOPS peak performance at 1.6 TOPS/W energy efficiency. Non-critical, compute-intensive, floating-point workloads are accelerated by a vector cluster, achieving 1.1 TFLOPS/W and 106.8 GFLOPS/mm².
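The LRSCwait entry attributes polling overhead to serialization and retries under LR/SC. The Python sketch below is a deliberately simplified toy model, not the Colibri hardware: it assumes that every still-pending core re-issues its LR/SC pair each round and that exactly one store-conditional succeeds per round, whereas in a queued LRwait-style scheme each contending core sleeps and is woken exactly once. The function names `lrsc_attempts` and `lrwait_attempts` are hypothetical and only serve the illustration.

```python
from collections import deque


def lrsc_attempts(num_cores: int) -> int:
    """Count LR/SC attempts when all contending cores retry every round.

    Toy assumption: in each round every still-pending core issues an LR/SC
    pair, exactly one SC succeeds, and all others fail and retry (polling).
    """
    pending = num_cores
    attempts = 0
    while pending > 0:
        attempts += pending  # every pending core tries once this round
        pending -= 1         # only one store-conditional succeeds
    return attempts


def lrwait_attempts(num_cores: int) -> int:
    """Count accesses when contending cores sleep in a FIFO queue.

    Toy assumption: a core that finds the reservation taken is enqueued and
    sleeps; it is woken only when the previous core's SCwait completes, so
    each core performs exactly one successful access and never retries.
    """
    queue = deque(range(num_cores))  # cores waiting for the reservation
    attempts = 0
    while queue:
        queue.popleft()  # wake the head core; its access succeeds
        attempts += 1
    return attempts


for n in (4, 16, 256):
    print(f"{n:>3} cores: LR/SC {lrsc_attempts(n):>6} attempts, "
          f"LRwait {lrwait_attempts(n):>3} accesses")
```

Under these assumptions, n contending cores need n(n+1)/2 LR/SC attempts (32896 for 256 cores) against n queued accesses. Real behaviour depends on timing, interconnect latency, and the workload, so the model only illustrates why retries grow with contention; it does not reproduce the 6.5x throughput figure reported for Colibri.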
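The AXI4 transaction-monitoring entry describes a Tiny-Counter TMU variant that keeps a single counter per outstanding transaction, flags timeouts, and uses a prescaler to reduce area. The Python sketch below is a behavioural toy model under our own assumptions: one fixed cycle budget for every transaction, counters that advance once every `prescaler` cycles, and a hypothetical `TinyCounterMonitor` class. It is not the authors' RTL, and it ignores protocol-violation checks and the actual subordinate reset sequence.

```python
import math


class TinyCounterMonitor:
    """Toy model: one timeout counter per outstanding AXI4 transaction."""

    def __init__(self, budget_cycles: int, prescaler: int = 1) -> None:
        # With a prescaler, a counter ticks once every `prescaler` cycles,
        # so it only needs to count up to ceil(budget / prescaler).
        self.prescaler = prescaler
        self.limit = math.ceil(budget_cycles / prescaler)
        self.cycle = 0
        self.counters: dict[int, int] = {}  # transaction ID -> ticks

    def counter_bits(self) -> int:
        """Bits needed per counter to represent values 0..limit."""
        return max(1, self.limit.bit_length())

    def issue(self, txn_id: int) -> None:
        self.counters[txn_id] = 0  # start timing a new transaction

    def complete(self, txn_id: int) -> None:
        self.counters.pop(txn_id, None)  # response arrived in time

    def tick(self) -> list[int]:
        """Advance one clock cycle; return IDs that just timed out."""
        self.cycle += 1
        if self.cycle % self.prescaler != 0:
            return []
        timed_out = []
        for txn_id in list(self.counters):
            self.counters[txn_id] += 1
            if self.counters[txn_id] >= self.limit:
                timed_out.append(txn_id)
                # A real TMU would now reset the affected subordinate.
                del self.counters[txn_id]
        return timed_out


mon = TinyCounterMonitor(budget_cycles=100)
mon.issue(1)
mon.issue(2)
events = {}
for _ in range(150):
    if mon.cycle == 40:
        mon.complete(1)  # transaction 1 finishes within its budget
    for txn_id in mon.tick():
        events[txn_id] = mon.cycle
print(events)  # {2: 100}
print(TinyCounterMonitor(100).counter_bits(),
      TinyCounterMonitor(100, prescaler=8).counter_bits())  # 7 4
```

In this model, a prescaler of 8 shrinks each counter from 7 to 4 bits for a 100-cycle budget, at the price of coarser detection: the timeout now fires on a prescaler boundary, within one prescaler period of the nominal budget. This is one plausible reading of the area/granularity trade-off, not the paper's exact counter design.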