Show simple item record

dc.contributor.author
Laumer, Daniel
dc.contributor.author
Gümgümcü, Hasret
dc.contributor.author
Heitzler, Magnus
dc.contributor.author
Hurni, Lorenz
dc.contributor.editor
Irás, Krisztina
dc.date.accessioned
2021-03-19T13:54:59Z
dc.date.available
2020-12-21T16:34:02Z
dc.date.available
2021-03-19T13:54:59Z
dc.date.issued
2020-03
dc.identifier.uri
http://hdl.handle.net/20.500.11850/457895
dc.description.abstract
Automatically digitizing historical maps poses a multitude of challenges. This is particularly true for label extraction, since labels vary strongly in shape, size, orientation and type. In addition, characters may overlap with other features such as roads or hachures, which makes extraction even harder. To tackle this issue, we propose a novel semi-automatic workflow consisting of a sequence of deep learning and conventional text processing steps in conjunction with tailor-made correction software. To demonstrate its efficiency, the workflow is being applied to the Siegfried Map Series (1870-1949), which covers all of Switzerland at scales of 1:25,000 and 1:50,000. The workflow consists of the following steps. First, we decide for each pixel whether its content is text or background. For this purpose, we use a convolutional neural network with the U-Net architecture, which was developed for biomedical image segmentation (Ronneberger, 2015). The weights are trained using four manually annotated map sheets as ground truth. The trained model can then be used to predict the segmentation of any other map sheet. The results are clustered with DBSCAN (Ester, Kriegel, Sander, & Xu, 1996) to aggregate the individual pixels into letters and words. This way, each label can be localized and extracted without background. Since this is still a non-vectorized representation of the labels, we use the Google Vision API to interpret the text of each label and also search for matching entries in the Swiss Names database by Swisstopo for verification. As in most label extraction workflows, the last step consists of manually checking all labels and correcting possible mistakes. For this purpose, we modified the VGG Image Annotator to simplify the selection of the correct entry. Our framework reduces the time needed to digitize labels by a factor of around 5.
The fully automatic part (segmentation, interpretation, matching) takes around 5-10 min per sheet and the manual processing part around 1.5-2 h. Compared to a fully manual digitizing process, time efficiency is not the only benefit. The chance of missing labels also decreases strongly. A human cannot detect labels with the same accuracy as a computer algorithm. Most problems leading to more manual work occur during clustering and during text recognition with the Google Vision API. Since the model is trained on maps of a flat part of German-speaking Switzerland, the algorithm performs worse for other regions. In Alpine regions, rock hachures are often misinterpreted as labels, leading to many false positives. French labels are often composed of several words, which DBSCAN does not cluster into one label. Possible further work could include retraining with more diverse ground truth or extending the U-Net model so that it can also recognize and learn textual information.
en_US
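Illustrative note on the clustering step described in the abstract: the sketch below is not the authors' implementation. It is a minimal pure-Python DBSCAN run on toy pixel coordinates, meant only to show how text pixels could be grouped into labels and why a multi-word label can split when the gap between words exceeds the neighborhood radius. The function name `dbscan`, the parameters `eps` and `min_pts`, and the pixel coordinates are all assumptions made for illustration.

```python
from collections import deque


def dbscan(points, eps, min_pts):
    """Return one cluster id per point (-1 marks noise).

    Minimal DBSCAN with a brute-force O(n^2) neighbor search;
    fine for a toy example, too slow for a full map sheet.
    """
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(j_nbrs)
    return labels


# Hypothetical text pixels: two "words" on one row, separated by a gap.
pixels = [(x, 0) for x in range(0, 5)] + [(x, 0) for x in range(10, 15)]

# Small radius: the two words end up in separate clusters.
print(dbscan(pixels, eps=1.5, min_pts=2))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Radius that bridges the gap: both words merge into one label.
print(dbscan(pixels, eps=6.0, min_pts=2))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

In this toy setting the choice of `eps` alone decides whether a two-word label is kept together, which mirrors the multi-word problem the abstract reports for French labels.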
dc.language.iso
en
en_US
dc.publisher
Department of Cartography and Geoinformatics, ELTE Eötvös Loránd University
en_US
dc.subject
Historical maps
en_US
dc.subject
Vectorization
en_US
dc.subject
Deep learning
en_US
dc.subject
Convolutional neural network
en_US
dc.subject
Label extraction
en_US
dc.title
A Semi-Automatic Label Digitization Workflow for the Siegfried Map
en_US
dc.type
Conference Paper
ethz.book.title
Automatic Vectorisation of Historical Maps
en_US
ethz.pages.start
55
en_US
ethz.pages.end
62
en_US
ethz.event
International Workshop on Automatic Vectorisation of Historical Maps
en_US
ethz.event.location
Budapest, Hungary
en_US
ethz.event.date
March 13, 2020
en_US
ethz.publication.place
Budapest
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02115 - Dep. Bau, Umwelt und Geomatik / Dep. of Civil, Env. and Geomatic Eng.::02648 - Inst. f. Kartografie und Geoinformation / Institute of Cartography&Geoinformation::03466 - Hurni, Lorenz / Hurni, Lorenz
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02115 - Dep. Bau, Umwelt und Geomatik / Dep. of Civil, Env. and Geomatic Eng.::02648 - Inst. f. Kartografie und Geoinformation / Institute of Cartography&Geoinformation::03466 - Hurni, Lorenz / Hurni, Lorenz
en_US
ethz.date.deposited
2020-12-21T16:34:10Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2021-03-19T13:55:08Z
ethz.rosetta.lastUpdated
2021-03-19T13:55:08Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=A%20Semi-Automatic%20Label%20Digitization%20Workflow%20for%20the%20Siegfried%20Map&rft.date=2020-03&rft.spage=55&rft.epage=62&rft.au=Laumer,%20Daniel&G%C3%BCmg%C3%BCmc%C3%BC,%20Hasret&Heitzler,%20Magnus&Hurni,%20Lorenz&rft.genre=proceeding&rft.btitle=Automatic%20Vectorisation%20of%20Historical%20Maps

Files in this item

There are no files associated with this item.
