A Dynamic Allocation Scheme for Adaptive Shared-Memory Mapping on Kilo-Core RV Clusters for Attention-Based Model Deployment
METADATA ONLY
Author / Producer
Date
2025
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Attention-based models demand flexible hardware to manage diverse kernels with varying arithmetic intensities and memory access patterns. Large clusters with shared L1 memory, a common architectural pattern, struggle to fully utilize their processing elements (PEs) when scaled up because throughput drops in the hierarchical PE-to-L1 intra-cluster interconnect. This paper presents the Dynamic Allocation Scheme (DAS), a runtime-programmable address remapping hardware unit coupled with a unified memory allocator, designed to minimize contention among PEs accessing the multi-banked L1. We evaluated DAS on an aggressively scaled-up 1024-PE RISC-V cluster with a Non-Uniform Memory Access (NUMA) PE-to-L1 interconnect to demonstrate its potential for improving data locality in large parallel machine learning workloads. For a Vision Transformer (ViT)-L/16 model, each encoder layer executes in 5.67 ms, achieving a 1.94× speedup over the fixed word-level interleaved baseline with 0.81 PE utilization. Implemented in 12 nm FinFET technology, DAS incurs <0.1% area overhead.
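The contrast the abstract draws between fixed word-level interleaving and a remapped layout can be pictured with a toy model. The sketch below is illustrative only and is not the DAS hardware or allocator: the bank count, the group size, the word size, and the `remapped_bank` rule are all assumptions chosen for the example, and only bank selection is modeled (not the row within a bank).

```python
# Toy model of L1 bank selection. All constants and the remapping rule
# are hypothetical; none of them are taken from the paper.
WORD_BYTES = 4
NUM_BANKS = 16          # assumed total number of L1 banks
BANKS_PER_GROUP = 4     # assumed number of banks local to one PE group
NUM_GROUPS = NUM_BANKS // BANKS_PER_GROUP


def interleaved_bank(addr: int) -> int:
    """Fixed word-level interleaving: consecutive words go to consecutive banks."""
    return (addr // WORD_BYTES) % NUM_BANKS


def remapped_bank(addr: int, region_words: int) -> int:
    """Hypothetical remap: each run of `region_words` consecutive words is
    interleaved only across the banks of a single group."""
    word = addr // WORD_BYTES
    group = (word // region_words) % NUM_GROUPS
    return group * BANKS_PER_GROUP + word % BANKS_PER_GROUP


def groups_touched(bank_fn, base: int, n_words: int) -> set[int]:
    """Return the set of bank groups hit by `n_words` consecutive words."""
    return {
        bank_fn(base + i * WORD_BYTES) // BANKS_PER_GROUP
        for i in range(n_words)
    }


if __name__ == "__main__":
    # A 16-word tile spreads over every group when word-interleaved ...
    print(groups_touched(interleaved_bank, 0, 16))
    # ... but stays inside one group under the assumed remapping.
    print(groups_touched(lambda a: remapped_bank(a, 16), 0, 16))
```

In this toy setting, a PE working on a contiguous tile reaches banks in every group under word-level interleaving, whereas the assumed remapping keeps the same tile within one group. This is the kind of locality effect the abstract attributes to remapping on a NUMA interconnect.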
Publication status
published
Book title
2025 IEEE 36th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Pages / Article No.
9 - 16
Publisher
IEEE
Event
36th International Conference on Application-specific Systems, Architectures and Processors (ASAP 2025)
Subject
RISC-V; Manycore; Transformers
Organisational unit
03996 - Benini, Luca / Benini, Luca