dc.contributor.author
Stammbach, Dominik
dc.contributor.supervisor
Ash, Elliott
dc.contributor.supervisor
Sachan, Mrinmaya
dc.contributor.supervisor
Vlachos, Andreas
dc.date.accessioned
2024-05-28T10:40:45Z
dc.date.available
2024-05-27T15:14:55Z
dc.date.available
2024-05-28T08:17:22Z
dc.date.available
2024-05-28T10:40:45Z
dc.date.issued
2024
dc.identifier.uri
http://hdl.handle.net/20.500.11850/674960
dc.identifier.doi
10.3929/ethz-b-000674960
dc.description.abstract
In the digital age, we face a steady stream of mis- and disinformation. Automated fact checking aims to detect factually wrong claims by contrasting them with trustworthy facts found in a dependable knowledge base. Such methods can assist fact checkers and content moderators, and can increase online safety by making online discourse more truthful. This is a cumulative thesis; its individual projects are concerned with explainable claim verification, evidence retrieval, the knowledge bases from which evidence is retrieved, and environmental claim detection. The recurrent theme is a focus on data, so the work can be loosely interpreted as data-centric automated fact checking. First, we contribute an automatically generated dataset for explainable claim verification, created with few-shot prompting, and show how such new technology can tackle problems previously thought too expensive to even approach. Second, recent advances in sparse transformer models enable us to model data in evidence retrieval using more context. We show that this approach improves performance across the evaluation metrics when retrieving evidence for claim verification from Wikipedia pages. Third, we expand the definition of data-centric in automated fact checking to cover all data dependencies: not only the individual datasets, which should be of high quality, but also the knowledge bases used. Last, we introduce the task of environmental claim detection, and annotate and release an expert-annotated dataset for this task that is data-centric in the strict sense. Thus, this thesis tackles automated fact checking in the fast-paced field of Natural Language Processing. Three years are a long time in this field, and new methods and best practices seem to emerge every other month. We tried to do justice to such challenging circumstances.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Automated Fact Checking
en_US
dc.subject
Natural Language Processing (NLP)
en_US
dc.subject
MACHINE LEARNING (ARTIFICIAL INTELLIGENCE)
en_US
dc.title
Towards Data-Centric Automated Fact Checking
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2024-05-28
ethz.size
148 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
30183
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02045 - Dep. Geistes-, Sozial- u. Staatswiss. / Dep. of Humanities, Social and Pol.Sc.::09627 - Ash, Elliott / Ash, Elliott
en_US
ethz.date.deposited
2024-05-27T15:14:56Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2024-05-28T10:40:47Z
ethz.rosetta.lastUpdated
2024-05-28T10:40:47Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Towards%20Data-Centric%20Automated%20Fact%20Checking&rft.date=2024&rft.au=Stammbach,%20Dominik&rft.genre=unknown&rft.btitle=Towards%20Data-Centric%20Automated%20Fact%20Checking