
Open access
Author
Date
2021
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
In this thesis, we aim to narrow the gap between human language processing and computational language processing. Natural language processing (NLP) models are imperfect and lack intricate capabilities that humans access automatically when processing speech or reading text. Human language processing signals can be leveraged both to improve the performance of machine learning (ML) models and to pursue explanatory research into the differences between human and machine language processing. In particular, the contributions of this thesis are threefold:
1. We compile the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset of simultaneous eye tracking and electroencephalography (EEG) recordings from participants reading natural sentences from real-world texts. When we read, our brain processes language and generates cognitive processing signals such as gaze patterns and brain activity. ZuCo includes data from 30 native English speakers, each reading 700-1,100 sentences. This corpus represents a valuable resource for cognitively inspired NLP.
2. We leverage these cognitive signals to augment ML models for NLP. Compared to purely text-based models, the augmented models show consistent improvements across a range of tasks, for both eye tracking and brain activity data. We further explore two of the main challenges in this area: (i) decoding brain activity for language processing and (ii) dealing with limited training data to eliminate the need for recorded cognitive signals at test time.
3. We evaluate the cognitive plausibility of computational language models, the cornerstones of state-of-the-art NLP. We develop CogniVal, the first openly available framework for evaluating English word embeddings based on cognitive lexical semantics. Specifically, embeddings are evaluated by their performance at predicting a wide range of cognitive data sources recorded during language comprehension, including multiple eye tracking datasets and brain activity recordings such as electroencephalography and functional magnetic resonance imaging.
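The evaluation idea in item 3, scoring word embeddings by how well they predict cognitive signals, can be illustrated with a minimal sketch. This is not CogniVal's implementation. It assumes a simple closed-form ridge regressor as the prediction model (the abstract does not specify the model), k-fold cross-validation with mean squared error as the score, and a comparison against random embeddings of the same dimensionality as a baseline. The function names `ridge_fit`, `ridge_predict` and `cv_mse` are hypothetical, and the demo uses synthetic data rather than real eye tracking or brain recordings.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression with a bias column; returns weights of shape (d + 1, k)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # do not penalise the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)


def ridge_predict(X, W):
    """Apply fitted ridge weights (including the bias row) to new inputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W


def cv_mse(embeddings, cognitive, n_folds=5, alpha=1.0, seed=0):
    """Hypothetical score: mean squared error of predicting per-word cognitive
    features from word embeddings, averaged over k cross-validation folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(embeddings))
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        W = ridge_fit(embeddings[train], cognitive[train], alpha)
        pred = ridge_predict(embeddings[test], W)
        errors.append(np.mean((pred - cognitive[test]) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    E = rng.normal(size=(200, 50))               # stand-in word embeddings (200 words, 50 dims)
    A = rng.normal(size=(50, 5))
    Y = E @ A + 0.1 * rng.normal(size=(200, 5))  # synthetic "cognitive" features per word
    R = rng.normal(size=(200, 50))               # random baseline embeddings
    print("embeddings:", cv_mse(E, Y))
    print("random baseline:", cv_mse(R, Y))
```

Under these assumptions, embeddings that encode information relevant to the cognitive signal should yield a clearly lower cross-validated error than the random baseline; the baseline comparison is what makes the raw error value interpretable.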
Permanent link
https://doi.org/10.3929/ethz-b-000472454
Publication status
published
Contributors
Examiner: Zhang, Ce
Examiner: Buhmann, Joachim
Examiner: Langer, Nicolas
Examiner: Beinborn, Lisa
Examiner: Volk, Martin
Publisher
ETH Zurich
Subject
machine learning; natural language processing; cognitive science
Organisational unit
09588 - Zhang, Ce / Zhang, Ce