Cross-Domain Topic Classification for Political Texts
OPEN ACCESS
Date
2023-01
Publication Type
Journal Article
ETH Bibliography
yes
Abstract
We introduce and assess the use of supervised learning in cross-domain topic classification. In this approach, an algorithm learns to classify topics in a labeled source corpus and then extrapolates topics in an unlabeled target corpus from another domain. The ability to use existing training data makes this method significantly more efficient than within-domain supervised learning. It also has three advantages over unsupervised topic models: the method can be targeted more specifically to a research question, and the resulting topics are easier to validate and easier to interpret. We demonstrate the method using the case of labeled party platforms (source corpus) and unlabeled parliamentary speeches (target corpus). In addition to the standard within-domain error metrics, we further validate the cross-domain performance by labeling a subset of target-corpus documents. We find that the classifier accurately assigns topics in the parliamentary speeches, although accuracy varies substantially by topic. We also propose tools for diagnosing cross-domain classification. To illustrate the usefulness of the method, we present two case studies on how electoral rules and the gender of parliamentarians influence the choice of speech topics.
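The workflow described in the abstract (train on a labeled source corpus, predict topics in the target corpus, then check accuracy per topic on a hand-labeled target subset) can be illustrated with a minimal sketch. This record does not state which classifier the article uses, so the sketch substitutes a small multinomial Naive Bayes model written in plain Python. The toy documents, the topic labels, and all function names are hypothetical and serve only to show the flow of data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Fit multinomial Naive Bayes counts on labeled source-domain documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable topic for one target-domain document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Laplace (add-one) smoothing over the source vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # words unseen in the source corpus are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_by_topic(gold, predicted):
    """Share of correctly classified documents, computed per gold topic."""
    hits, totals = Counter(), Counter()
    for g, p in zip(gold, predicted):
        totals[g] += 1
        hits[g] += int(g == p)
    return {topic: hits[topic] / totals[topic] for topic in totals}


# Hypothetical labeled source corpus (stands in for party platforms).
source_docs = [
    "we will cut taxes and reduce public debt",
    "lower taxes support growth and jobs",
    "invest in schools teachers and universities",
    "education funding for schools and students",
]
source_labels = ["economy", "economy", "education", "education"]

# Hypothetical target corpus (stands in for parliamentary speeches),
# with hand-assigned labels for the validation subset.
target_docs = [
    "the minister promised lower taxes last year",
    "our schools need more teachers",
]
target_gold = ["economy", "education"]

model = train_naive_bayes(source_docs, source_labels)
predicted = [predict(model, doc) for doc in target_docs]
per_topic = accuracy_by_topic(target_gold, predicted)
print(predicted)
print(per_topic)
```

Reporting accuracy separately for each topic, rather than as a single overall figure, mirrors the abstract's observation that cross-domain accuracy varies substantially by topic.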
Publication status
published
Volume
31 (1)
Pages / Article No.
58 - 80
Publisher
Cambridge University Press
Subject
cross-domain classification; supervised learning; text analysis; manifesto corpus; parliamentary speeches; electoral reform; debate participation
Organisational unit
09627 - Ash, Elliott / Ash, Elliott