dc.contributor.author
Rausch, Johannes
dc.contributor.author
Martinez, Octavio
dc.contributor.author
Bissig, Fabian
dc.contributor.author
Zhang, Ce
dc.contributor.author
Feuerriegel, Stefan
dc.date.accessioned
2021-09-07T13:50:13Z
dc.date.available
2021-09-07T13:50:13Z
dc.date.issued
2021-05
dc.identifier.issn
2159-5399
dc.identifier.issn
2374-3468
dc.identifier.uri
http://hdl.handle.net/20.500.11850/504620
dc.description.abstract
Translating renderings (e.g., PDFs, scans) into hierarchical document structures is extensively demanded in the daily routines of many real-world applications. However, a holistic, principled approach to inferring the complete hierarchical structure of documents is missing. As a remedy, we developed "DocParser": an end-to-end system for parsing the complete document structure, including all text elements, nested figures, tables, and table cell structures. Our second contribution is to provide a dataset for evaluating hierarchical document structure parsing. Our third contribution is to propose a scalable learning framework for settings where domain-specific data are scarce, which we address by a novel approach to weak supervision that significantly improves the document structure parsing performance. Our experiments confirm the effectiveness of our proposed weak supervision: compared to the baseline without weak supervision, it improves the mean average precision for detecting document entities by 39.1% and improves the F1 score of classifying hierarchical relations by 35.8%.
en_US
dc.language.iso
en
en_US
dc.publisher
AAAI
dc.subject
Applications
en_US
dc.subject
Information extraction
en_US
dc.title
DocParser: Hierarchical Document Structure Parsing from Renderings
en_US
dc.type
Conference Paper
dc.date.published
2021-05-18
ethz.journal.title
Proceedings of the AAAI Conference on Artificial Intelligence
ethz.journal.volume
35
en_US
ethz.journal.issue
5
en_US
ethz.pages.start
4328
en_US
ethz.pages.end
4338
en_US
ethz.event
35th AAAI Conference on Artificial Intelligence (AAAI 2021)
ethz.event.location
Online
ethz.event.date
February 2-9, 2021
ethz.grant
EASEML: Toward a More Accessible and Usable Machine Learning Platform for Non-expert Users
en_US
ethz.grant
Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning
en_US
ethz.identifier.wos
ethz.publication.place
Palo Alto, CA
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02120 - Dep. Management, Technologie und Ökon. / Dep. of Management, Technology, and Ec.::09623 - Feuerriegel, Stefan (ehemalig) / Feuerriegel, Stefan (former)
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::09588 - Zhang, Ce (ehemalig) / Zhang, Ce (former)
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::09588 - Zhang, Ce (ehemalig) / Zhang, Ce (former)
ethz.identifier.url
https://ojs.aaai.org/index.php/AAAI/article/view/16558
ethz.grant.agreementno
184628
ethz.grant.agreementno
957407
ethz.grant.fundername
SNF
ethz.grant.fundername
EC
ethz.grant.funderDoi
10.13039/501100001711
ethz.grant.funderDoi
10.13039/501100000780
ethz.grant.program
H2020
ethz.grant.program
Projekte MINT
ethz.relation.isCitedBy
10.3929/ethz-b-000530965
ethz.date.deposited
2020-12-02T10:50:24Z
ethz.source
WOS
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2021-09-07T13:50:22Z
ethz.rosetta.lastUpdated
2024-02-02T14:39:42Z
ethz.rosetta.versionExported
true
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/504070
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/454188
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=DocParser:%20Hierarchical%20Document%20Structure%20Parsing%20from%20Renderings&rft.jtitle=Proceedings%20of%20the%20AAAI%20Conference%20on%20Artificial%20Intelligence&rft.date=2021-05&rft.volume=35&rft.issue=5&rft.spage=4328&rft.epage=4338&rft.issn=2159-5399&2374-3468&rft.au=Rausch,%20Johannes&Martinez,%20Octavio&Bissig,%20Fabian&Zhang,%20Ce&Feuerriegel,%20Stefan&rft.genre=proceeding&