Storage-centric load management for data streams with update semantics


Date

2009-03

Publication Type

Report

ETH Bibliography

yes

Abstract

Most data stream processing systems model their inputs as append-only sequences of data elements. In this model, the application expects to receive a query answer on the complete input stream. However, there are many situations in which each data element (or a window of data elements) in the stream is in fact an update to a previous one, and therefore the most recent arrival is all that really matters to the application. UpStream defines a storage-centric approach to efficiently processing continuous queries under such an update-based stream data model. The goal is to provide the most up-to-date answers to the application with the lowest staleness possible. To achieve this, we developed a lossy tuple storage model (called an “update queue”) which, under high load, sacrifices old tuples in favor of newer ones using a number of different update key scheduling heuristics. Our techniques can correctly process queries with different types of streaming operators (including sliding windows), while efficiently handling large numbers of update keys with different update frequencies. We present a detailed analysis and experimental evidence showing the effectiveness of our algorithms on both synthetic and real data sets.
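
The "update queue" idea from the abstract can be illustrated with a minimal sketch. This is not the report's implementation: the class name `UpdateQueue`, its methods, and the `dropped` counter are hypothetical, and the policy shown (keep one pending tuple per update key, overwrite it in place, serve keys in first-arrival order) is just one simple assumption standing in for the scheduling heuristics the report studies.

```python
from collections import OrderedDict


class UpdateQueue:
    """Hypothetical lossy queue: at most one pending tuple per update key."""

    def __init__(self):
        # Maps update key -> latest pending value, ordered by the key's
        # first arrival since it was last dequeued.
        self._pending = OrderedDict()
        self.dropped = 0  # number of stale tuples overwritten

    def enqueue(self, key, value):
        # A newer tuple for a key that is already waiting replaces the
        # older one; the old tuple is sacrificed and never processed.
        if key in self._pending:
            self.dropped += 1
        # Assigning to an existing key keeps its position in the order.
        self._pending[key] = value

    def dequeue(self):
        # Serve the key that has been waiting longest (FIFO over keys).
        # Raises KeyError if the queue is empty.
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)


# Usage: two updates for "sensor-1" arrive before the consumer catches up.
queue = UpdateQueue()
queue.enqueue("sensor-1", 20.5)
queue.enqueue("sensor-2", 18.0)
queue.enqueue("sensor-1", 21.0)  # supersedes the 20.5 reading
first = queue.dequeue()
```

Under this assumed policy the queue length is bounded by the number of distinct update keys rather than by the arrival rate, which is what lets a consumer under load skip stale values instead of falling further behind.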

Publication status

published

Volume

620

Publisher

ETH Zurich, Department of Computer Science, Systems Group

Subject

PROCESS MANAGEMENT (OPERATING SYSTEMS); MULTIMEDIA (INFORMATION SYSTEMS)

Organisational unit

02150 - Dep. Informatik / Dep. of Computer Science
