Pirmin Vogel



Last Name: Vogel
First Name: Pirmin



Publications 1-5 of 5
  • Kurth, Andreas; Capotondi, Alessandro; Vogel, Pirmin; et al. (2018)
    Heterogeneous systems on chip (HeSoCs) co-integrate a high-performance multicore host processor with programmable manycore accelerators (PMCAs) to combine “standard platform” software support (e.g., the Linux OS) with energy-efficient, domain-specific, highly parallel processing capabilities. In this work, we present HERO, a HeSoC platform that tackles this challenge in a novel way. HERO’s host processor is an industry-standard ARM Cortex-A multicore complex, while its PMCA is a scalable, silicon-proven, open-source manycore processing engine based on the extensible, open RISC-V ISA. We evaluate a prototype implementation of HERO in which the PMCA, implemented on an FPGA fabric, is coupled with a hard ARM Cortex-A host processor, and show that the run-time overhead compared to manually written PMCA code operating on private physical memory is lower than 10% for pivotal benchmarks and operating conditions. Thus, HERO demonstrates that ARM and RISC-V can productively coexist in a dual-ISA HW-SW platform.
  • Hager, Pascal Alexander; Vogel, Pirmin; Bartolini, Andrea; et al. (2014)
    2014 IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings
    High-frame-rate and high-resolution 3D medical ultrasound imaging places demanding requirements on the processing hardware involved. Several thousand analog signals need to be processed in many steps to obtain a final image. Fully digital beamforming makes it possible to achieve high image quality coupled with extreme flexibility. Unfortunately, digital beamforming imposes staggering requirements on main-memory bandwidth, caused by loading beamforming delays stored off-chip. In this paper, we present the first fully digital integrated beamformer that is able to compute 269.3M focal points (FP) per second from 10 000 receive channels and that does not require off-chip main memory. This is enabled by our novel delay approximation circuit, which exploits the temporal correlation between subsequent computations and thereby allows the beamforming delays to be computed online. To estimate the area and power requirements, the complete system was designed and the beamformer core was evaluated for a 130 nm CMOS technology. The estimated complexity per channel is 37.2 kGE, and the corresponding power dissipation is estimated at 48 mW.
  • Vogel, Pirmin; Kurth, Andreas; Weinbuch, Johannes; et al. (2017)
    ACM Transactions on Embedded Computing Systems
    Shared virtual memory is key in heterogeneous systems on chip (SoCs) that combine a general-purpose host processor with a many-core accelerator, both for programmability and performance. In contrast to the full-blown, hardware-only solutions predominant in modern high-end systems, lightweight hardware-software co-designs are better suited to more power- and area-constrained embedded systems and provide additional benefits in terms of flexibility and predictability. As a downside, the latter solutions require the host to handle, in software, both the synchronization on page misses and the miss handling itself. This may incur considerable run-time overheads. In this work, we present a novel hardware-software virtual memory management approach for many-core accelerators in heterogeneous embedded SoCs. It exploits an accelerator-side helper thread concept that enables the accelerator to manage its virtual memory hardware autonomously while operating cache-coherently on the page tables of the host's user-space processes. This greatly reduces overhead with respect to host-side solutions while retaining flexibility. We have validated the design with a set of parameterizable benchmarks and real-world applications covering various application domains. For purely memory-bound kernels, the accelerator performance improves by a factor of 3.8 compared with host-based management and lies within 50% of a lower-bound ideal memory management unit.
  • Rogenmoser, Michael; Wistoff, Nils; Vogel, Pirmin; et al. (2022)
    2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
    With the shrinking of technology nodes and the use of parallel processor clusters in hostile and critical environments, such as space, run-time faults caused by radiation are a serious cross-cutting concern that also impacts architectural design. This paper introduces an architectural approach to run-time configurable soft-error tolerance at the core level, augmenting a six-core open-source RISC-V cluster with a novel On-Demand Redundancy Grouping (ODRG) scheme. ODRG allows the cluster to operate either as two fault-tolerant cores or as six individual cores for high performance, with limited overhead to switch between these modes at run time. The ODRG unit adds less than 11% of a core's area for a three-core group, or a total of 1% of the cluster area, and shows a negligible timing increase. This compares favorably to a commercial state-of-the-art implementation, and ODRG is 2.5× faster in fault-recovery re-synchronization. Furthermore, when redundancy is not necessary, the ODRG approach allows the redundant cores to be used for independent computation, yielding up to a 2.96× performance increase for selected applications.
  • Hollberg, Alexander; Vogel, Pirmin; Habert, Guillaume (2018)
    Life-Cycle of Civil Engineering Systems ~ Life Cycle Analysis and Assessment in Civil Engineering: Towards an Integrated Vision