Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing
Metadata only
Date
2021
Type
- Journal Article
Abstract
Data-parallel problems demand ever-growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work we present Manticore, a general-purpose, ultra-efficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet's computational core in the Globalfoundries 22FDX process and demonstrate more than 5x improvement in energy efficiency on FP-intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters [1] containing eight small integer cores, each controlling a large floating-point unit (FPU). The core supports two custom ISA extensions. The Stream Semantic Register (SSR) extension elides explicit load and store instructions by encoding them as register reads and writes [2]. The Floating-point Repetition (FREP) extension decouples the integer core from the FPU, allowing floating-point instructions to be issued independently. Together, these two extensions allow the single-issue core to minimize its instruction fetch bandwidth and saturate the instruction bandwidth of the FPU, achieving FPU utilization above 90%, with more than 40% of core area dedicated to the FPU. © 2020 IEEE.
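The idea behind the two extensions can be illustrated with a toy software model. The sketch below is not the Snitch ISA or its hardware behaviour; `StreamRegister`, `frep_dot`, and the fetch counter are hypothetical names and simplifications introduced only for illustration. It assumes that a stream register yields the next memory element on every read (standing in for SSR) and that a single multiply-add is fetched once and then repeated (standing in for FREP).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StreamRegister:
    """Hypothetical model of a stream semantic register.

    Each read implicitly loads the next element of a strided memory
    stream, so no explicit load instruction appears in the loop body.
    """
    memory: List[float]
    base: int
    stride: int
    length: int
    _pos: int = 0

    def read(self) -> float:
        if self._pos >= self.length:
            raise IndexError("stream exhausted")
        value = self.memory[self.base + self._pos * self.stride]
        self._pos += 1
        return value


def frep_dot(a: StreamRegister, b: StreamRegister, n: int) -> Tuple[float, int, int]:
    """Model an FREP-style repeated multiply-add over two streams.

    Returns (result, fetched_fp_instructions, executed_fp_instructions).
    The fetch count of 1 is a modelling assumption: the loop body is one
    FP instruction that is fetched once and then repeated n times.
    """
    acc = 0.0
    fetched = 1
    executed = 0
    for _ in range(n):
        acc += a.read() * b.read()  # operands arrive via register reads
        executed += 1
    return acc, fetched, executed


# Two 4-element vectors stored back to back in one flat memory.
memory = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
a = StreamRegister(memory, base=0, stride=1, length=4)
b = StreamRegister(memory, base=4, stride=1, length=4)
result, fetched, executed = frep_dot(a, b, 4)
print(result, fetched, executed)
```

In this model the loop body contains no explicit loads and only one fetched FP instruction, which mirrors, in a very simplified way, how the abstract describes reducing instruction fetch bandwidth while keeping the FPU busy.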
Publication status
published
External links
Journal / series
IEEE Micro
Volume
Pages / Article No.
Publisher
IEEE
Organisational unit
03996 - Benini, Luca / Benini, Luca
Funding
732631 - Open Transprecision Computing (EC)