SKT: A One-Pass Multi-Sketch Data Analytics Accelerator
dc.contributor.author
Chiosa, Monica
dc.contributor.author
Preußer, Thomas B.
dc.contributor.author
Alonso, Gustavo
dc.contributor.editor
Papenbrock, Thorsten
dc.contributor.editor
Mühleisen, Hannes
dc.date.accessioned
2021-09-16T06:35:35Z
dc.date.available
2021-09-16T06:35:35Z
dc.date.issued
2021-07
dc.identifier.issn
2150-8097
dc.identifier.other
10.14778/3476249.3476287
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/505690
dc.identifier.doi
10.3929/ethz-b-000496334
dc.description.abstract
Data analysts often need to characterize a data stream as a first step to its further processing. Some of the initial insights to be gained include, e.g., the cardinality of the data set and its frequency distribution. Such information is typically extracted by using sketch algorithms, now widely employed to process very large data sets in manageable space and in a single pass over the data. Often, analysts need more than one parameter to characterize the stream. However, computing multiple sketches becomes expensive even when using high-end CPUs. Exploiting the increasing adoption of hardware accelerators, this paper proposes SKT, an FPGA-based accelerator that can compute several sketches along with basic statistics (average, max, min, etc.) in a single pass over the data. SKT has been designed to characterize a data set by calculating its cardinality, its second frequency moment, and its frequency distribution. The design processes data streams coming either from PCIe or TCP/IP, and it is built to fit emerging cloud service architectures, such as Microsoft’s Catapult or Amazon’s AQUA. The paper explores the trade-offs of designing sketch algorithms on a spatial architecture and how to combine several sketch algorithms into a single design. The empirical evaluation shows how SKT on an FPGA offers a significant performance gain over high-end, server-class CPUs.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Association for Computing Machinery
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
FPGA
en_US
dc.subject
Data analytics
en_US
dc.subject
Data Summarization
en_US
dc.title
SKT: A One-Pass Multi-Sketch Data Analytics Accelerator
en_US
dc.type
Conference Paper
dc.rights.license
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
ethz.journal.title
Proceedings of the VLDB Endowment
ethz.journal.volume
14
en_US
ethz.journal.issue
11
en_US
ethz.journal.abbreviated
Proc. VLDB Endow.
ethz.pages.start
2369
en_US
ethz.pages.end
2382
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.event
47th International Conference on Very Large Data Bases (VLDB 2021)
en_US
ethz.event.location
Copenhagen, Denmark
en_US
ethz.event.date
August 16-20, 2021
en_US
ethz.identifier.wos
ethz.publication.place
New York, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::03506 - Alonso, Gustavo / Alonso, Gustavo
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::03506 - Alonso, Gustavo / Alonso, Gustavo
en_US
ethz.date.deposited
2021-07-20T09:04:21Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2021-09-16T06:35:43Z
ethz.rosetta.lastUpdated
2023-02-06T22:34:29Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/496334
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/505473
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=SKT:%20A%20One-Pass%20Multi-Sketch%20Data%20Analytics%20Accelerator&rft.jtitle=Proceedings%20of%20the%20VLDB%20Endowment&rft.date=2021-07&rft.volume=14&rft.issue=11&rft.spage=2369&rft.epage=2382&rft.issn=2150-8097&rft.au=Chiosa,%20Monica&rft.au=Preu%C3%9Fer,%20Thomas%20B.&rft.au=Alonso,%20Gustavo&rft.genre=proceeding&rft_id=info:doi/10.14778/3476249.3476287&