Storage-centric load management for data streams with update semantics
OPEN ACCESS
Date
2009-03
Publication Type
Report
ETH Bibliography
yes
Abstract
Most data stream processing systems model their inputs as append-only sequences of data elements. In this model, the application expects to receive a query answer on the complete input stream. However, there are many situations in which each data element (or a window of data elements) in the stream is in fact an update to a previous one, so the most recent arrival is all that really matters to the application. UpStream defines a storage-centric approach to efficiently processing continuous queries under such an update-based stream data model. The goal is to provide the application with the most up-to-date answers at the lowest possible staleness. To achieve this, we developed a lossy tuple storage model (called an “update queue”) which, under high load, sacrifices old tuples in favor of newer ones using a number of different update key scheduling heuristics. Our techniques correctly process queries with different types of streaming operators (including sliding windows) while efficiently handling large numbers of update keys with different update frequencies. We present a detailed analysis and experimental evidence showing the effectiveness of our algorithms on both synthetic and real data sets.
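To make the idea of a lossy update queue concrete, the following is a minimal sketch and not the report's implementation. It assumes one pending tuple per update key, in-place overwriting of a superseded tuple, and a simple first-in-first-out order across keys as the scheduling heuristic; the class name `UpdateQueue`, its methods, and the `dropped` counter are hypothetical names chosen for illustration only.

```python
from collections import OrderedDict


class UpdateQueue:
    """Illustrative lossy queue: at most one pending tuple per update key.

    Assumption: keys are served in the order in which they first became
    pending (FIFO across keys). The report describes several scheduling
    heuristics; this sketch does not reproduce any of them specifically.
    """

    def __init__(self):
        # key -> (timestamp, value); insertion order is the service order
        self._pending = OrderedDict()
        self.dropped = 0  # number of superseded tuples that were discarded

    def enqueue(self, key, timestamp, value):
        if key in self._pending:
            # A newer update for the same key replaces the stale one.
            self.dropped += 1
        # Assigning to an existing key keeps its position in the OrderedDict.
        self._pending[key] = (timestamp, value)

    def dequeue(self):
        """Return (key, timestamp, value) for the next key, or None if empty."""
        if not self._pending:
            return None
        key, (timestamp, value) = self._pending.popitem(last=False)
        return key, timestamp, value

    def __len__(self):
        return len(self._pending)


# Usage: two updates for "sensor-a" arrive before the consumer catches up.
q = UpdateQueue()
q.enqueue("sensor-a", 1, 20.5)
q.enqueue("sensor-b", 2, 7.0)
q.enqueue("sensor-a", 3, 21.0)  # overwrites the tuple with timestamp 1
first = q.dequeue()
second = q.dequeue()
```

In this sketch the consumer never sees the tuple with timestamp 1, because it was superseded while still queued. Queue length is bounded by the number of distinct update keys rather than by the arrival rate, which is the property that lets such a structure shed load without growing.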
Publication status
published
Volume
620
Publisher
ETH Zurich, Department of Computer Science, Systems Group
Subject
PROCESS MANAGEMENT (OPERATING SYSTEMS); MULTIMEDIA (INFORMATION SYSTEMS)
Organisational unit
02150 - Dep. Informatik / Dep. of Computer Science