Manticore: A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing


METADATA ONLY

Date

2021

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Data-parallel problems demand ever-growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work, we present Manticore, a general-purpose, ultra-efficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet's computational core in the GlobalFoundries 22FDX process and demonstrate a more than 5x improvement in energy efficiency on FP-intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters [1] containing eight small integer cores, each controlling a large floating-point unit (FPU). The core supports two custom ISA extensions. The Stream Semantic Register (SSR) extension elides explicit load and store instructions by encoding them as register reads and writes [2]. The Floating-point Repetition (FREP) extension decouples the integer core from the FPU, allowing floating-point instructions to be issued independently. Together, these two extensions allow the single-issue core to minimize its instruction fetch bandwidth and saturate the instruction bandwidth of the FPU, achieving FPU utilization above 90%, with more than 40% of core area dedicated to the FPU. © 2020 IEEE.
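
The effect of the two extensions on FPU utilization can be illustrated with a toy issue-slot model. This is only a sketch and not taken from the paper: the function name `fpu_utilization`, the per-iteration instruction counts, the iteration count, and the setup costs are all hypothetical values chosen for a dot-product-style inner loop on a single-issue core. It treats every instruction as occupying one issue slot, which is a simplification.

```python
def fpu_utilization(n_iters, fp_per_iter, mem_per_iter, ctrl_per_iter, setup=0):
    """Fraction of issued instructions that are useful FP operations.

    Toy model: one instruction per issue slot, no stalls.
    All counts are illustrative assumptions, not measured values.
    """
    fp = n_iters * fp_per_iter
    total = fp + n_iters * (mem_per_iter + ctrl_per_iter) + setup
    return fp / total


N = 256  # hypothetical number of loop iterations

# Baseline: per iteration 1 fused multiply-add, 2 loads,
# and 2 control instructions (index update and branch).
baseline = fpu_utilization(N, fp_per_iter=1, mem_per_iter=2, ctrl_per_iter=2)

# SSR: operands arrive through register reads, so explicit loads vanish.
# The setup cost for configuring the streams is an assumed figure.
with_ssr = fpu_utilization(N, fp_per_iter=1, mem_per_iter=0, ctrl_per_iter=2, setup=10)

# SSR + FREP: the FP instruction is repeated without per-iteration
# control instructions; only an assumed one-time setup remains.
with_ssr_frep = fpu_utilization(N, fp_per_iter=1, mem_per_iter=0, ctrl_per_iter=0, setup=12)

print(f"baseline:   {baseline:.2f}")
print(f"SSR:        {with_ssr:.2f}")
print(f"SSR + FREP: {with_ssr_frep:.2f}")
```

Under these assumed counts, removing the loads and the loop control leaves almost every issued instruction as FP work, which is consistent in spirit with the utilization above 90% reported in the abstract. The real gains depend on the kernel and on hardware details the model ignores.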

Publication status

published

Journal / series

IEEE Micro

Volume

41 (2)

Pages / Article No.

36 - 42

Publisher

IEEE

Organisational unit

03996 - Benini, Luca / Benini, Luca

Funding

732631 - Open Transprecision Computing (EC)
