AXI-Pack: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads
METADATA ONLY
Author / Producer
Date
2023
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
Data-intensive applications involving irregular memory streams are inefficiently handled by modern processors and memory systems highly optimized for regular, contiguous data. Recent work tackles these inefficiencies in hardware through core-side stream extensions or memory-side prefetchers and accelerators, but fails to provide end-to-end solutions which also achieve high efficiency in on-chip interconnects. We propose AXI-Pack, an extension to ARM's AXI4 protocol introducing bandwidth-efficient strided and indirect bursts to enable end-to-end irregular streams. AXI-Pack adds irregular stream semantics to memory requests and avoids inefficient narrow-bus transfers by packing multiple narrow data elements onto a wide bus. It retains full compatibility with AXI4 and does not require modifications to non-burst-reshaping interconnect IPs. To demonstrate our approach end-to-end, we extend an open-source RISC-V vector processor to leverage AXI-Pack at its memory interface for strided and indexed accesses. On the memory side, we design a banked memory controller efficiently handling AXI-Pack requests. On a system with a 256-bit-wide interconnect running FP32 workloads, AXI-Pack achieves near-ideal peak on-chip bus utilizations of 87% and 39%, speedups of 5.4x and 2.4x, and energy efficiency improvements of 5.3x and 2.1x over a baseline using an AXI4 bus on strided and indirect benchmarks, respectively.
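The packing idea in the abstract can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not the AXI-Pack protocol or the authors' implementation: the function names (beats_unpacked, beats_packed, utilization, pack_strided, pack_indirect) are hypothetical, and the model assumes that an unpacked access moves exactly one narrow element per bus beat while a packed access fills each beat with as many elements as fit. Only the 256-bit bus width and the 32-bit (FP32) element width come from the abstract.

```python
import math

BUS_BITS = 256   # interconnect width stated in the abstract
ELEM_BITS = 32   # FP32 element width stated in the abstract


def beats_unpacked(num_elems):
    """Assumed baseline: one narrow element occupies one whole bus beat."""
    return num_elems


def beats_packed(num_elems, bus_bits=BUS_BITS, elem_bits=ELEM_BITS):
    """Assumed packed case: each beat carries bus_bits // elem_bits elements."""
    per_beat = bus_bits // elem_bits
    return math.ceil(num_elems / per_beat)


def utilization(num_elems, beats, bus_bits=BUS_BITS, elem_bits=ELEM_BITS):
    """Fraction of transferred bus bits that carry useful element data."""
    return (num_elems * elem_bits) / (beats * bus_bits)


def pack_strided(memory, base, stride, count, per_beat):
    """Gather memory[base + i * stride] and group the elements into beats."""
    elems = [memory[base + i * stride] for i in range(count)]
    return [elems[i:i + per_beat] for i in range(0, count, per_beat)]


def pack_indirect(memory, indices, per_beat):
    """Gather memory[idx] for each index and group the elements into beats."""
    elems = [memory[idx] for idx in indices]
    return [elems[i:i + per_beat] for i in range(0, len(elems), per_beat)]


if __name__ == "__main__":
    n = 64
    print(beats_unpacked(n), utilization(n, beats_unpacked(n)))
    print(beats_packed(n), utilization(n, beats_packed(n)))
    print(pack_strided(list(range(100)), base=0, stride=3, count=10, per_beat=8))
```

In this idealized model, 64 FP32 elements need 64 beats when unpacked and 8 beats when packed. The measured utilizations and speedups reported in the abstract come from the authors' full system and are not reproduced by this sketch.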
Permanent link
Publication status
published
Editor
Book title
2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Journal / series
Volume
Pages / Article No.
10137243
Publisher
IEEE
Event
26th Design, Automation and Test in Europe Conference and Exhibition (DATE 2023)
Subject
Computer architecture; On-chip interconnects; Memory systems; Irregular workloads
Organisational unit
03996 - Benini, Luca / Benini, Luca
Notes
Funding
101034126 - Pilot using Independent Local & Open Technologies (EC)
101036168 - European Processor Initiative (EPI) SGA2 (EC)