An information-theoretic analysis of targeted regressions during reading


Date

2024-08

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. Yet relatively little is known about what causes them. We provide an information-theoretic operationalization of two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information (PMI) between a regression's source and target. Intuitively, the PMI between two words measures how much more (or less) likely one word is to be present given the other. On the one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high positive values of PMI. On the other hand, the reanalysis hypothesis predicts that regressions occur between words that are not associated with each other, implying low or negative values of PMI. As a second theoretical contribution, we extend previous theories by considering not only PMI but also the expected value of PMI, E[PMI], where the expectation is taken over all possible realizations of the regression's target. The rationale is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both hypotheses, we use contemporary language models to estimate PMI-based statistics over word pairs in three English eye-tracking corpora, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent across the languages and models tested: positive values of PMI and E[PMI] consistently help to predict the patterns of regressions during reading, whereas negative values of PMI and E[PMI] do not.
Our information-theoretic interpretation increases the predictive scope of both theories, and our studies present the first systematic cross-linguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, expand the range of language processing behaviors that can be linked to information-theoretic principles.
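To make the two quantities in the abstract concrete, here is a minimal Python sketch of PMI between a regression's source and target, and of an expected PMI taken over candidate realizations of the target. Everything in it is an illustrative assumption: the toy distribution `JOINT`, the example words, the belief weights, and the function names `marginal`, `pmi`, and `expected_pmi` are hypothetical. The study itself estimates these statistics with language models, and the exact distribution its expectation is taken over may differ from the simple belief vector used here.

```python
import math

# Hypothetical toy joint distribution p(source, target). These numbers are
# invented for illustration; the study estimates such probabilities with
# language models instead.
JOINT = {
    ("nurse", "doctor"): 0.30,
    ("nurse", "table"): 0.05,
    ("chair", "doctor"): 0.05,
    ("chair", "table"): 0.60,
}


def marginal(joint, index):
    """Marginalize the joint: index 0 gives p(source), index 1 gives p(target)."""
    probs = {}
    for pair, p in joint.items():
        probs[pair[index]] = probs.get(pair[index], 0.0) + p
    return probs


def pmi(joint, source, target):
    """PMI(source; target) = log2 p(s, t) / (p(s) p(t)), measured in bits."""
    p_s = marginal(joint, 0)[source]
    p_t = marginal(joint, 1)[target]
    return math.log2(joint[(source, target)] / (p_s * p_t))


def expected_pmi(joint, source, target_belief):
    """E[PMI]: PMI(source; t) averaged over candidate targets t, weighted by a
    hypothetical belief q(t) about which word the target actually was."""
    return sum(q * pmi(joint, source, t) for t, q in target_belief.items())


if __name__ == "__main__":
    # Associated pair: positive PMI (the pattern reactivation predicts).
    print(pmi(JOINT, "nurse", "doctor"))
    # Unassociated pair: negative PMI (the pattern reanalysis predicts).
    print(pmi(JOINT, "nurse", "table"))
    # Reader unsure whether the target was "doctor" or "table".
    print(expected_pmi(JOINT, "nurse", {"doctor": 0.8, "table": 0.2}))
```

With these made-up numbers, the associated pair gets a positive PMI and the unassociated pair a negative one, while the expected value falls between the two, pulled toward whichever candidate target the hypothetical reader considers more probable.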

Publication status

published

Volume

249

Pages / Article No.

105765

Publisher

Elsevier

Subject

Regressions; Reading; Language processing; Information theory; Mutual information; Eye tracking

Organisational unit

09682 - Cotterell, Ryan / Cotterell, Ryan
09462 - Hofmann, Thomas / Hofmann, Thomas