An Approach to Geotag a Web Sized Corpus of Documents with Addresses in Randstad, Netherlands
Open access
Author
Date
2018-01-15
Type
- Conference Paper
ETH Bibliography
no
Abstract
This paper describes a cluster computing workflow for geotagging a web-sized corpus of documents (3.6 × 10^9 documents, 260 TiB of data) and how semantic similarities between documents geotagged to the same address could be used to verify these tags.
Permanent link
https://doi.org/10.3929/ethz-b-000225615
Publication status
published
Book title
Adjunct Proceedings of the 14th International Conference on Location Based Services
Pages / Article No.
Publisher
ETH Zurich
Event
Subject
Geotagging; Data Science; Data Mining; Natural Language Processing
Related publications and datasets
Is part of: https://doi.org/10.3929/ethz-b-000224043