The Art of Natural Language Processing: Classical, Modern and Contemporary Approaches to Text Document Classification
Metadata only
Date
2020-03-31
Type
Working Paper
ETH Bibliography
yes
Abstract
In this tutorial we introduce three approaches to preprocessing text data with Natural Language Processing (NLP) and performing text document classification using machine learning. The first approach is based on 'bag-of-' models, the second on word embeddings, and the third introduces the two most popular Recurrent Neural Networks (RNNs), i.e. the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures. We apply all approaches to a case study in which we classify movie reviews using Python and TensorFlow 2.0. The results of the case study show that extreme gradient boosting algorithms outperform adaptive boosting and random forests on bag-of-words and word embedding models, as well as LSTM and GRU RNNs, but at a steep computational cost. Finally, we provide the reader with comments on NLP applications for the insurance industry.
Publication status
published
Journal / Series
SSRN
Publisher
Social Science Research Network
Subject
Natural language processing; Bag-of-words models; Word embeddings; Machine learning; Recurrent neural networks; Deep learning; Python; TensorFlow 2.0; Keras
Organisational unit
03995 - von Wangenheim, Florian / von Wangenheim, Florian