Towards Data-Centric Automated Fact Checking


Author / Producer

Date

2024

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

In the digital age, we face a steady stream of mis- and disinformation. Automated fact checking aims to detect factually wrong claims by comparing them against trustworthy facts found in a dependable knowledge base. Such methods can assist fact checkers and content moderators, and can increase online safety by making online discourse more truthful. This is a cumulative thesis, and the individual projects are concerned with explainable claim verification, evidence retrieval, the knowledge bases from which we retrieve evidence, and finally environmental claim detection. The recurrent theme is a focus on data, so the thesis can be loosely interpreted as data-centric automated fact checking. First, we present an automatically generated dataset for explainable claim verification, built using few-shot prompting, and show how such new technology can be used to tackle problems that were previously thought too expensive to even approach. Second, new advances in sparse transformer models enable us to model data in evidence retrieval using more context. We show that this approach leads to better performance across the metrics we consider when retrieving evidence for claim verification from Wikipedia pages. Third, we expand the definition of data-centric in automated fact checking to all data dependencies: not only should the individual datasets be of high quality, but so should the knowledge bases used. Finally, we introduce the task of environmental claim detection and annotate and release an expert-annotated dataset for this task, a data-centric contribution in the strict sense. Thus, this thesis tackles automated fact checking in the fast-paced field of Natural Language Processing. Three years is a long time in this field, and new methods and best practices seem to emerge every other month. We tried to do justice to such challenging circumstances.

Publication status

published

Contributors

Examiner : Ash, Elliott
Examiner : Sachan, Mrinmaya
Examiner : Vlachos, Andreas

Publisher

ETH Zurich

Subject

Automated Fact Checking; Natural Language Processing (NLP); Machine Learning (Artificial Intelligence)

Organisational unit

09627 - Ash, Elliott / Ash, Elliott
