AXI-Pack: Near-Memory Bus Packing for Bandwidth-Efficient Irregular Workloads


METADATA ONLY

Date

2023

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Data-intensive applications involving irregular memory streams are inefficiently handled by modern processors and memory systems highly optimized for regular, contiguous data. Recent work tackles these inefficiencies in hardware through core-side stream extensions or memory-side prefetchers and accelerators, but fails to provide end-to-end solutions which also achieve high efficiency in on-chip interconnects. We propose AXI-Pack, an extension to ARM's AXI4 protocol introducing bandwidth-efficient strided and indirect bursts to enable end-to-end irregular streams. AXI-Pack adds irregular stream semantics to memory requests and avoids inefficient narrow-bus transfers by packing multiple narrow data elements onto a wide bus. It retains full compatibility with AXI4 and does not require modifications to non-burst-reshaping interconnect IPs. To demonstrate our approach end-to-end, we extend an open-source RISC-V vector processor to leverage AXI-Pack at its memory interface for strided and indexed accesses. On the memory side, we design a banked memory controller efficiently handling AXI-Pack requests. On a system with a 256-bit-wide interconnect running FP32 workloads, AXI-Pack achieves near-ideal peak on-chip bus utilizations of 87% and 39%, speedups of 5.4x and 2.4x, and energy efficiency improvements of 5.3x and 2.1x over a baseline using an AXI4 bus on strided and indirect benchmarks, respectively.
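The packing idea described in the abstract can be illustrated with a small, idealized model. The sketch below is not the AXI-Pack protocol or the authors' implementation. It only counts bus beats for 32-bit elements on a 256-bit bus, comparing one element per beat (a narrow transfer) against densely packed beats. All function names are hypothetical, and the model ignores protocol overhead, request latency, and memory bank conflicts.

```python
from math import ceil


def beats_narrow(num_elems: int) -> int:
    """Narrow transfers: every element occupies a bus beat of its own."""
    return num_elems


def beats_packed(num_elems: int, elem_bits: int, bus_bits: int) -> int:
    """Packed transfers: each beat carries as many elements as fit on the bus."""
    per_beat = bus_bits // elem_bits
    return ceil(num_elems / per_beat)


def utilization(num_elems: int, elem_bits: int, bus_bits: int, beats: int) -> float:
    """Fraction of transferred bus bits that carry useful data."""
    return (num_elems * elem_bits) / (beats * bus_bits)


def pack_strided(memory, base: int, stride: int, count: int, per_beat: int):
    """Gather `count` elements at base, base+stride, ... and group them into beats.

    Addresses are element indices into `memory`, an assumption made for brevity.
    """
    elems = [memory[base + i * stride] for i in range(count)]
    return [elems[i:i + per_beat] for i in range(0, len(elems), per_beat)]


def pack_indirect(memory, indices, per_beat: int):
    """Gather elements through an index array and group them into beats."""
    elems = [memory[i] for i in indices]
    return [elems[i:i + per_beat] for i in range(0, len(elems), per_beat)]


if __name__ == "__main__":
    n, elem_bits, bus_bits = 64, 32, 256
    for name, beats in (("narrow", beats_narrow(n)),
                        ("packed", beats_packed(n, elem_bits, bus_bits))):
        print(name, beats, utilization(n, elem_bits, bus_bits, beats))
```

In this idealized model, one FP32 element per beat caps utilization at 32/256 = 12.5%, while packing eight elements per beat reaches 100% whenever the element count is a multiple of eight. The measured figures quoted in the abstract come from a full system and are not reproduced by this sketch.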

Publication status

published

Book title

2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pages / Article No.

10137243

Publisher

IEEE

Event

26th Design, Automation and Test in Europe Conference and Exhibition (DATE 2023)

Subject

Computer architecture; On-chip interconnects; Memory systems; Irregular workloads

Organisational unit

03996 - Benini, Luca / Benini, Luca

Funding

101034126 - Pilot using Independent Local & Open Technologies (EC)
101036168 - European Processor Initiative (EPI) SGA2 (EC)
