Stefan Mach



Search Results

Publications 1 - 10 of 21
  • Rossi, Davide; Conti, Francesco; Eggimann, Manuel; et al. (2022)
    IEEE Journal of Solid-State Circuits
The Internet-of-Things (IoT) requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node system on chip (SoC) capable of scaling from a 1.7-μW fully retentive cognitive sleep mode up to 32.2-GOPS (at 49.4 mW) peak performance on NSAAs, including mobile deep neural network (DNN) inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile magnetoresistive random access memory (MRAM). To meet the performance and flexibility requirements of NSAAs, the SoC features ten RISC-V cores: one core for SoC and IO management and a nine-core cluster supporting multi-precision single instruction multiple data (SIMD) integer and floating-point (FP) computation. Vega achieves the state-of-the-art (SoA)-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration). On FP computation, it achieves the SoA-leading efficiency of 79 and 129 GFLOPS/W on 32- and 16-bit FP, respectively. Two programmable machine learning (ML) accelerators boost energy efficiency in cognitive sleep and active states.
  • Mach, Stefan; Schuiki, Fabian; Zaruba, Florian; et al. (2019)
    Proceedings of the 27th IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC 2019)
  • Mach, Stefan (2021)
  • Montagna, Fabio; Mach, Stefan; Benatti, Simone; et al. (2022)
    IEEE Transactions on Parallel and Distributed Systems
Recent applications in low-power (1-20 mW) near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this article, we propose a low-power multi-core computing cluster that leverages the fine-grained tunable principles of transprecision computing to support near-sensor applications at a minimum power budget. Our solution - based on the open-source RISC-V architecture - combines parallelization and sub-word vectorization with a dedicated interconnect design capable of sharing floating-point units (FPUs) among the cores. On top of this architecture, we provide a full-fledged software stack, including a parallel low-level runtime, a compilation toolchain, and a high-level programming model, with the aim of supporting the development of end-to-end applications. We performed an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, varying the number of cores and FPUs to maximize performance. Orthogonally, we performed a vertical exploration to identify the most efficient solutions in terms of non-functional requirements (operating frequency, power, and area). We conducted an experimental assessment on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post place-&-route analysis of the power consumption. A comparison with the state-of-the-art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors. Finally, a real-life use case demonstrates the effectiveness of our approach in fulfilling accuracy constraints.
  • Mach, Stefan; Rossi, Davide; Tagliavini, Giuseppe; et al. (2018)
    2018 IEEE International Symposium on Circuits and Systems (ISCAS)
  • Eggimann, Manuel; Mach, Stefan; Magno, Michele; et al. (2019)
    2019 IEEE 8th International Workshop on Advances in Sensors and Interfaces (IWASI)
  • Diamantopoulos, Dionysios; Scheidegger, Florian; Mach, Stefan; et al. (2020)
    2020 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS)
The performance improvement rate of conventional von Neumann processors has slowed as Moore's Law grinds to an economic halt, giving rise to a new age of heterogeneity for energy-efficient computing. Extending processors with finely tunable precision instructions has emerged as a form of heterogeneity that trades off computation precision against power consumption. However, the prolonged design time due to customization of the supported framework for a system-on-a-chip may counteract the advantages of transprecision computing. We propose XwattPilot, a system aiming at accelerating the transprecision software development of low-power processors using cloud technology. We show that the total energy-to-solution can be significantly decreased by using transprecision computations, whereas the proposed system can accelerate the energy-efficiency evaluation runtime by 10.3x.
  • Zaruba, Florian; Schuiki, Fabian; Mach, Stefan; et al. (2019)
    2019 26th IEEE International Conference on Electronics, Circuits and Systems (ICECS)
  • An 826 MOPS, 210 uW/MHz Unum ALU in 65 nm
    Glaser, Florian; Mach, Stefan; Rahimi, Abbas; et al. (2018)
    2018 IEEE International Symposium on Circuits and Systems (ISCAS)
  • Rossi, Davide; Conti, Francesco; Eggimann, Manuel; et al. (2021)
    2021 IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers
    The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an always-on IoT end-node SoC capable of scaling from a 1.7 μW fully retentive cognitive sleep mode up to 32.2 GOPS (@49.4 mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile MRAM. To meet the performance and flexibility requirements of NSAAs, the SoC features 10 RISC-V cores: one core for SoC and IO management and a 9-core cluster supporting multi-precision SIMD integer and floating-point computation. Two programmable machine-learning (ML) accelerators boost energy efficiency in sleep and active state, respectively.
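Several of the abstracts above center on transprecision computing: shrinking floating-point operands (e.g. from binary32 to binary16) to halve memory traffic and energy at a bounded accuracy cost. A minimal NumPy sketch of that storage-versus-precision trade-off (illustrative only, not code from the cited works; the arrays and value range are assumptions for the example):

```python
import numpy as np

# A 1024-element operand buffer, values kept in [0.5, 1) so that every
# element lies in the normal range of binary16 (no subnormal surprises).
rng = np.random.default_rng(42)
x32 = (0.5 + 0.5 * rng.random(1024)).astype(np.float32)

# Down-convert to half precision: half the bits, half the memory traffic.
x16 = x32.astype(np.float16)

# Worst-case relative rounding error for values in [0.5, 1) is bounded by
# 2**-11 (binary16 has a 10-bit fraction, so the ulp at 0.5 is 2**-11).
max_rel_err = float(np.max(np.abs(x16.astype(np.float32) - x32) / x32))

print(f"float32 buffer: {x32.nbytes} B, float16 buffer: {x16.nbytes} B")
print(f"max relative conversion error: {max_rel_err:.2e}")
```

The same idea scales to sub-word SIMD: packing two binary16 lanes where one binary32 lane fits doubles the operations per cycle, which is the mechanism behind the half-precision-vector efficiency figures quoted above.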