An information-theoretic analysis of targeted regressions during reading
OPEN ACCESS
Date
2024-08
Publication Type
Journal Article
ETH Bibliography
yes
Abstract
Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. And yet, relatively little is known about what causes them. We provide an information-theoretic operationalization for two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information, or PMI, between a regression's source and target. Intuitively, the PMI between two words measures how much more (or less) likely one word is to be present given the other. On one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high, positive values of PMI. On the other hand, the reanalysis hypothesis predicts that regressions should occur between words that are not associated with each other, implying low, negative values of PMI. As a second theoretical contribution, we expand on previous theories by considering not only PMI but also expected values of PMI, E[PMI], where the expectation is taken over all possible realizations of the regression's target. The rationale for this is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both theories, we use contemporary language models to estimate PMI-based statistics over word pairs in three corpora of eye-tracking data in English, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent across languages and models tested: positive values of PMI and E[PMI] consistently help to predict the patterns of regressions during reading, whereas negative values of PMI and E[PMI] do not.
Our information-theoretic interpretation increases the predictive scope of both theories and our studies present the first systematic crosslinguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, they expand the number of language processing behaviors that can be linked to information-theoretic principles.
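For readers unfamiliar with the quantities named in the abstract, the standard textbook definitions can be sketched as follows. The symbols here are illustrative assumptions and are not taken from the article: s denotes the regression's source word, t its target, and q a hypothetical distribution describing the reader's uncertainty over what the target word was. The article's own formulation may condition on additional context.

```latex
% Pointwise mutual information between source s and target t
% (standard definition; notation is an assumption, not the article's).
\[
  \mathrm{PMI}(s; t)
    = \log \frac{p(s, t)}{p(s)\, p(t)}
    = \log \frac{p(s \mid t)}{p(s)}
\]
% PMI > 0: s and t co-occur more often than under independence.
% PMI < 0: s and t co-occur less often than under independence.

% Expected PMI, with the expectation taken over possible realizations t'
% of the target under a hypothetical belief distribution q.
\[
  \mathbb{E}[\mathrm{PMI}]
    = \sum_{t'} q(t')\, \mathrm{PMI}(s; t')
\]
```

Under these definitions, the reactivation hypothesis corresponds to regressions concentrating where PMI is large and positive, and the reanalysis hypothesis to regressions concentrating where PMI is negative.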
Publication status
published
Volume
249
Pages / Article No.
105765
Publisher
Elsevier
Subject
Regressions; Reading; Language processing; Information theory; Mutual information; Eye tracking
Organisational unit
09682 - Cotterell, Ryan / Cotterell, Ryan
09462 - Hofmann, Thomas / Hofmann, Thomas