
dc.contributor.author
Zhang, Feng
dc.contributor.author
Zhai, Jidong
dc.contributor.author
Shen, Xipeng
dc.contributor.author
Wang, Dalin
dc.contributor.author
Chen, Zheng
dc.contributor.author
Mutlu, Onur
dc.contributor.author
Chen, Wenguang
dc.contributor.author
Du, Xiaoyong
dc.date.accessioned
2021-03-22T12:17:49Z
dc.date.available
2020-10-04T02:40:57Z
dc.date.available
2020-10-05T09:01:44Z
dc.date.available
2021-03-22T12:17:49Z
dc.date.issued
2021-03
dc.identifier.issn
1066-8888
dc.identifier.issn
0949-877X
dc.identifier.other
10.1007/s00778-020-00636-3
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/444404
dc.description.abstract
This article provides a comprehensive description of text analytics directly on compression (TADOC), which enables document analytics to run directly on compressed textual data. The article explains the concept of TADOC and the challenges to realizing it effectively. It then presents a series of guidelines and technical solutions that address those challenges, including the adoption of a hierarchical compression method and a set of novel algorithms and data structure designs. Experiments on six data analytics tasks of varying complexity show that TADOC can reduce storage space by 90.8% and memory usage by 87.9%, while halving data processing times. © 2020 Springer-Verlag GmbH Germany.
en_US
dc.language.iso
en
en_US
dc.publisher
Springer
en_US
dc.subject
Text analytics
en_US
dc.subject
Document analytics
en_US
dc.subject
Compression
en_US
dc.subject
Sequitur
en_US
dc.title
TADOC: Text analytics directly on compression
en_US
dc.type
Journal Article
dc.date.published
2020-09-19
ethz.journal.title
The VLDB Journal
ethz.journal.volume
30
en_US
ethz.journal.issue
2
en_US
ethz.journal.abbreviated
VLDB j.
ethz.pages.start
163
en_US
ethz.pages.end
188
en_US
ethz.publication.place
Berlin
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::09483 - Mutlu, Onur / Mutlu, Onur
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::09483 - Mutlu, Onur / Mutlu, Onur
ethz.date.deposited
2020-10-04T02:41:03Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2021-03-22T12:18:00Z
ethz.rosetta.lastUpdated
2021-03-22T12:18:00Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=TADOC:%20Text%20analytics%20directly%20on%20compression&rft.jtitle=The%20VLDB%20Journal&rft.date=2021-03&rft.volume=30&rft.issue=2&rft.spage=163&rft.epage=188&rft.issn=1066-8888&0949-877X&rft.au=Zhang,%20Feng&Zhai,%20Jidong&Shen,%20Xipeng&Wang,%20Dalin&Chen,%20Zheng&rft.genre=article&rft_id=info:doi/10.1007/s00778-020-00636-3&

