Abstract
The growing popularity of the JSON format has fueled increased
interest in loading and processing JSON data within analytical data
processing systems. However, in many applications, JSON parsing
dominates performance and cost. In this paper, we present a
new JSON parser called Mison that is particularly tailored to this
class of applications, by pushing down both projection and filter
operators of analytical queries into the parser. To achieve these
features, we propose to deviate from the traditional approach of
building parsers using finite state machines (FSMs). Instead, we
follow a two-level approach that enables the parser to jump directly
to the correct position of a queried field without having to
perform expensive tokenizing steps to find the field. At the upper
level, Mison speculatively predicts the logical locations of queried
fields based on previously seen patterns in a dataset. At the lower
level, Mison builds structural indices on JSON data to map logical
locations to physical locations. Unlike all existing FSM-based
parsers, building structural indices converts control flow into data
flow, thereby largely eliminating inherently unpredictable branches
in the program and exploiting the parallelism available in modern
processors. We experimentally evaluate Mison using representative
real-world JSON datasets and the TPC-H benchmark, and show
that Mison produces significant performance benefits over the best
existing JSON parsers; in some cases, the performance improvement
is over one order of magnitude.
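
The two-level idea described in the abstract can be illustrated with a small Python sketch. It is not Mison's implementation: the function and class names (`build_colon_index`, `field_name_before`, `value_after`, `SpeculativeReader`) are hypothetical, the index is built with an ordinary branching loop rather than the branch-free construction the abstract refers to, only top-level fields are indexed, and keys are assumed to contain no escaped quotes. It only shows how a list of structural colon offsets lets a reader map a guessed logical position (the n-th field) to a physical offset, and how a guess learned from earlier records can be reused.

```python
import json

_DECODER = json.JSONDecoder()

def build_colon_index(record):
    """Lower level: offsets of top-level colons, ignoring string contents."""
    positions, depth, in_string, escaped = [], 0, False, False
    for i, ch in enumerate(record):
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        elif ch == ":" and depth == 1:
            positions.append(i)          # structural colon of a top-level field
    return positions

def field_name_before(record, colon):
    """Read the quoted key ending before `colon` (assumes no escaped quotes)."""
    end = record.rindex('"', 0, colon)
    start = record.rindex('"', 0, end)
    return record[start + 1:end]

def value_after(record, colon):
    """Decode only the value that starts after `colon`."""
    start = colon + 1
    while record[start] in " \t\r\n":    # raw_decode does not skip whitespace
        start += 1
    value, _end = _DECODER.raw_decode(record, start)
    return value

class SpeculativeReader:
    """Upper level: remember the ordinal position where a field was last found."""
    def __init__(self, field):
        self.field = field
        self.guess = None                # hypothetical one-entry "pattern"
        self.hits = 0
        self.fallbacks = 0

    def read(self, record):
        colons = build_colon_index(record)
        g = self.guess
        if g is not None and g < len(colons):
            if field_name_before(record, colons[g]) == self.field:
                self.hits += 1           # speculation verified: jump directly
                return value_after(record, colons[g])
        self.fallbacks += 1              # first record or misprediction
        for ordinal, colon in enumerate(colons):
            if field_name_before(record, colon) == self.field:
                self.guess = ordinal
                return value_after(record, colon)
        return None

records = [
    '{"id": 1, "user": {"name": "a:b"}, "lang": "en"}',
    '{"id": 2, "user": {"name": "c"}, "lang": "de"}',
    '{"lang": "fr", "id": 3}',
]
reader = SpeculativeReader("lang")
values = [reader.read(r) for r in records]
```

In this sketch the second record is served from the learned position without examining the other keys, while the third record, whose field order differs, falls back to a scan over the indexed colons and updates the guess.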
Permanent link
https://doi.org/10.3929/ethz-b-000234616
Publication status
published
Book title
Proceedings of the VLDB Endowment
Publisher
Association for Computing Machinery
Organisational unit
03689 - Kossmann, Donald (ehemalig)
ETH Bibliography
yes