FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters
OPEN ACCESS
Date
2021-08
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
We present FleetRec, a high-performance, scalable recommendation inference system that operates within tight latency constraints. FleetRec takes advantage of heterogeneous hardware, including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory onto different types of hardware and connecting them over a high-speed network, FleetRec gains the best of both worlds and scales out naturally as nodes are added to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms an optimized CPU baseline by more than an order of magnitude in throughput while achieving significantly lower latency.
Publication status
published
Book title
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21)
Pages / Article No.
3097 - 3105
Publisher
Association for Computing Machinery
Event
27th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021)
Subject
Recommendation system; Hardware acceleration; FPGA; GPU
Organisational unit
03506 - Alonso, Gustavo
09588 - Zhang, Ce (former)