
Open access
Date
2021-08
Type
Conference Paper
Citations
Cited 4 times in Web of Science
Cited 13 times in Scopus
ETH Bibliography
yes
Abstract
We present FleetRec, a high-performance and scalable recommendation inference system that operates within tight latency constraints. FleetRec takes advantage of heterogeneous hardware, including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory onto different types of hardware and connecting them through a high-speed network, FleetRec gains the best of both worlds and can naturally scale out by adding nodes to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms an optimized CPU baseline by more than one order of magnitude in throughput while achieving significantly lower latency.
Permanent link
https://doi.org/10.3929/ethz-b-000485153
Publication status
published
Book title
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD '21)
Publisher
Association for Computing Machinery
Subject
Recommendation system; Hardware acceleration; FPGA; GPU
Organisational unit
03506 - Alonso, Gustavo / Alonso, Gustavo
09588 - Zhang, Ce (ehemalig) / Zhang, Ce (former)