To buffer, or not to buffer? A case study on FFT accelerators for ultra-low-power multicore clusters


Date

2021

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

Hardware-accelerated multicore clusters have recently emerged as a viable approach to deploy advanced digital signal processing (DSP) capabilities in ultra-low-power extreme edge nodes. As a critical basic block for DSP, Fast Fourier Transforms (FFTs) are among the best candidates for implementation on a dedicated accelerator core; however, their peculiar memory access patterns make direct integration of an FFT accelerator with a core cluster challenging. In this paper, we compare two approaches to cluster-coupled FFT accelerators: one with a large internal buffer to store and shuffle partial results, and a buffer-less one that shares all memory with the cluster cores. Both versions work on complex data with 8/16/32-bit real and imaginary parts. We show that, thanks to a newly proposed scheme that reorders data accesses and exploits the full bandwidth even for sub-word FFTs, the buffer-less accelerator can be made as fast as the buffered one at only 0.26× the area cost. We report post-layout performance and power results showing that the buffer-less accelerator provides up to 4/2/1 butterfly/cycle performance, with an average power consumption of 4.1/5.5/6.8 mW at the 350 MHz, 0.65 V operating point in 22 nm CMOS technology, for complex data with 8/16/32-bit real and imaginary parts, respectively. The buffer-less accelerator is 8× faster than an optimized multicore software implementation working on 16-bit data and compares favorably with FFT accelerators presented in the recent literature.
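
To illustrate why FFT memory access patterns are awkward for a shared, banked memory, the following minimal sketch lists the operand pairs touched by each stage of a generic in-place radix-2 decimation-in-time FFT and counts how many pairs land in the same bank. It is not the accelerator or the reordering scheme from the paper: the radix, the word-interleaved banking model, the bank count, and the function names `butterfly_pairs` and `bank_conflicts` are all assumptions made for illustration.

```python
# Illustrative sketch only: generic radix-2 DIT access pattern, not the
# paper's accelerator. Bank model and bank count are assumptions.

def butterfly_pairs(n, stage):
    """Return the index pairs combined in `stage` (0-based) of an
    in-place radix-2 decimation-in-time FFT of size n (a power of two)."""
    half = 1 << stage      # distance between the two butterfly operands
    span = half << 1       # size of each sub-transform in this stage
    pairs = []
    for base in range(0, n, span):
        for k in range(half):
            pairs.append((base + k, base + k + half))
    return pairs


def bank_conflicts(pairs, n_banks):
    """Count butterflies whose two operands map to the same bank,
    assuming word-interleaved banking (bank = index mod n_banks)."""
    return sum(1 for a, b in pairs if a % n_banks == b % n_banks)


if __name__ == "__main__":
    n, n_banks = 16, 4     # hypothetical sizes chosen for readability
    for stage in range(n.bit_length() - 1):
        pairs = butterfly_pairs(n, stage)
        print(f"stage {stage}: stride {1 << stage}, "
              f"same-bank pairs {bank_conflicts(pairs, n_banks)}/{len(pairs)}")
```

Under these assumptions, the operand stride doubles at every stage, so once it reaches a multiple of the bank count both operands of every butterfly fall in the same bank. This is the kind of conflict that either an internal shuffle buffer or a reordered access scheme has to work around.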

Publication status

published

Book title

2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Pages / Article No.

1 - 8

Publisher

IEEE

Event

32nd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2021)

Subject

Buffer-less accelerator; Cluster-coupled HW accelerator; Energy-efficient architecture; Fixed-point FFT; Reordering scheme

Organisational unit

03996 - Benini, Luca / Benini, Luca
