Leveraging large amounts of weakly supervised data for multi-language sentiment classification
OPEN ACCESS
Date
2017
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
This paper presents a novel approach for multi-lingual sentiment classification in short texts. This is a challenging task, as the amount of training data in languages other than English is very limited. Previously proposed multi-lingual approaches typically require establishing a correspondence to English, for which powerful classifiers are already available. In contrast, our method does not require such supervision. We leverage large amounts of weakly supervised data in various languages to train a multi-layer convolutional network and demonstrate the importance of pre-training such networks. We thoroughly evaluate our approach on various multi-lingual datasets, including the recent SemEval-2016 sentiment prediction benchmark (Task 4), where we achieved state-of-the-art performance. We also compare the performance of our model trained individually for each language to a variant trained for all languages at once. We show that the latter model reaches slightly worse, but still acceptable, performance compared to the single-language model, while benefiting from better generalization properties across languages.
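The training scheme summarized in the abstract (pre-train a multi-layer convolutional network on a large weakly labelled corpus, then fine-tune it on a smaller human-annotated sentiment dataset) can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes PyTorch, and all sizes, hyperparameters, and names (ConvSentimentNet, train, weak_batches, gold_batches) are illustrative placeholders.

```python
# Minimal sketch of weakly supervised pre-training followed by fine-tuning
# for short-text sentiment classification. Illustrative only; not the paper's code.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, NUM_CLASSES, SEQ_LEN = 50_000, 100, 3, 40  # assumed sizes

class ConvSentimentNet(nn.Module):
    """Multi-layer 1-D convolutional network over word embeddings."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM, padding_idx=0)
        self.conv1 = nn.Conv1d(EMB_DIM, 200, kernel_size=4)
        self.conv2 = nn.Conv1d(200, 200, kernel_size=3)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.fc = nn.Linear(200, NUM_CLASSES)

    def forward(self, token_ids):                   # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)     # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x).squeeze(-1)                # (batch, 200)
        return self.fc(x)                           # class logits

def train(model, batches, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, labels in batches:
            opt.zero_grad()
            loss = loss_fn(model(token_ids), labels)
            loss.backward()
            opt.step()

model = ConvSentimentNet()
# Placeholder data: in practice, weak labels could come from distant signals
# such as emoticons, while gold labels come from manual annotation.
weak_batches = [(torch.randint(1, VOCAB_SIZE, (32, SEQ_LEN)),
                 torch.randint(0, NUM_CLASSES, (32,)))]
gold_batches = [(torch.randint(1, VOCAB_SIZE, (32, SEQ_LEN)),
                 torch.randint(0, NUM_CLASSES, (32,)))]

train(model, weak_batches, epochs=1, lr=1e-3)   # stage 1: weakly supervised pre-training
train(model, gold_batches, epochs=1, lr=1e-4)   # stage 2: fine-tuning on annotated data
```

For the multi-language variant discussed in the abstract, the same network would simply be trained on data pooled across languages with a shared vocabulary; the single-language variant trains one such model per language.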
Publication status
published
Book title
Proceedings of the 26th International Conference on World Wide Web (WWW '17)
Pages / Article No.
1045 - 1052
Publisher
Association for Computing Machinery
Event
26th International World Wide Web Conference (WWW 2017)
Subject
Sentiment classification; multi-language; weak supervision; neural networks
Organisational unit
09462 - Hofmann, Thomas