
Open access
Date
2021-07
Type
Conference Paper
Abstract
Data analysts often need to characterize a data stream as a first step to its further processing. Some of the initial insights to be gained include, e.g., the cardinality of the data set and its frequency distribution. Such information is typically extracted by using sketch algorithms, now widely employed to process very large data sets in manageable space and in a single pass over the data. Often, analysts need more than one parameter to characterize the stream. However, computing multiple sketches becomes expensive even when using high-end CPUs. Exploiting the increasing adoption of hardware accelerators, this paper proposes SKT, an FPGA-based accelerator that can compute several sketches along with basic statistics (average, max, min, etc.) in a single pass over the data. SKT has been designed to characterize a data set by calculating its cardinality, its second frequency moment, and its frequency distribution. The design processes data streams coming either from PCIe or TCP/IP, and it is built to fit emerging cloud service architectures, such as Microsoft’s Catapult or Amazon’s AQUA. The paper explores the trade-offs of designing sketch algorithms on a spatial architecture and how to combine several sketch algorithms into a single design. The empirical evaluation shows how SKT on an FPGA offers a significant performance gain over high-end, server-class CPUs.
Permanent link
https://doi.org/10.3929/ethz-b-000496334
Publication status
published
External links
Journal / series
Proceedings of the VLDB Endowment
Volume
Pages / Article No.
Publisher
Association for Computing Machinery
Event
Subject
FPGA; Data analytics; Data Summarization
Organisational unit
03506 - Alonso, Gustavo / Alonso, Gustavo