Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning for NLP
OPEN ACCESS
Date
2021-11
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
The principle of independent causal mechanisms (ICM) states that the generative processes of real-world data consist of independent modules which do not influence or inform each other. While this idea has led to fruitful developments in the field of causal inference, it is not widely known in the NLP community. In this work, we argue that the causal direction of the data collection process bears nontrivial implications that can explain a number of published NLP findings, such as differences in semi-supervised learning (SSL) and domain adaptation (DA) performance across different settings. We categorize common NLP tasks according to their causal direction and empirically assay the validity of the ICM principle for text data using minimum description length. We conduct an extensive meta-analysis of over 100 published SSL studies and 30 DA studies, and find that the results are consistent with our expectations based on causal insights. This work presents the first attempt to analyze the ICM principle in NLP, and provides constructive suggestions for future modeling choices.
Publication status
published
Book title
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Pages / Article No.
9499–9513
Publisher
Association for Computational Linguistics
Event
2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
Organisational unit
09684 - Sachan, Mrinmaya / Sachan, Mrinmaya