FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters


Date

2021-08

Publication Type

Conference Paper

ETH Bibliography

yes

Abstract

We present FleetRec, a high-performance and scalable recommendation inference system that operates within tight latency constraints. FleetRec takes advantage of heterogeneous hardware, including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory onto different types of hardware and connecting them through a high-speed network, FleetRec gains the best of both worlds and scales out naturally as nodes are added to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms an optimized CPU baseline by more than one order of magnitude in throughput while achieving significantly lower latency.
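
The disaggregation idea in the abstract can be illustrated with a minimal sketch. It is not FleetRec's implementation: the class names `MemoryNode` and `ComputeNode`, the `serve` function, and the single linear layer are hypothetical, and the abstract does not state which hardware plays which role. The sketch only shows a memory-bound embedding lookup and a compute-bound scoring step living in separate components that exchange a feature vector, with a plain function call standing in for the network link.

```python
from typing import List


class MemoryNode:
    """Hypothetical memory-side node: holds embedding tables and serves lookups."""

    def __init__(self, tables: List[List[List[float]]]):
        self.tables = tables  # one table per sparse feature

    def lookup(self, indices: List[int]) -> List[float]:
        # Fetch one row per table and concatenate the rows into one feature vector.
        features: List[float] = []
        for table, idx in zip(self.tables, indices):
            features.extend(table[idx])
        return features


class ComputeNode:
    """Hypothetical compute-side node: scores a feature vector with one linear layer."""

    def __init__(self, weights: List[float], bias: float):
        self.weights = weights
        self.bias = bias

    def infer(self, features: List[float]) -> float:
        # Dot product plus bias stands in for the model's dense computation.
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def serve(memory: MemoryNode, compute: ComputeNode, indices: List[int]) -> float:
    # In a real cluster this hand-off would cross the network between nodes.
    return compute.infer(memory.lookup(indices))


tables = [
    [[1.0, 2.0], [3.0, 4.0]],  # table 0: two rows of dimension 2
    [[0.5], [1.5]],            # table 1: two rows of dimension 1
]
memory = MemoryNode(tables)
compute = ComputeNode(weights=[1.0, 1.0, 2.0], bias=0.5)
score = serve(memory, compute, indices=[1, 0])
```

Because the two roles communicate only through the feature vector, either side could in principle be replicated independently, which is the sense in which such a design scales out by adding nodes.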

Publication status

published

Book title

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21)

Pages / Article No.

3097 - 3105

Publisher

Association for Computing Machinery

Event

27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)

Subject

Recommendation system; Hardware acceleration; FPGA; GPU

Organisational unit

03506 - Alonso, Gustavo
09588 - Zhang, Ce (former)
