dc.contributor.author
de Lutio, Riccardo
dc.contributor.supervisor
Schindler, Konrad
dc.contributor.supervisor
Wegner, Jan D.
dc.contributor.supervisor
Mac Aodha, Oisin
dc.date.accessioned
2023-11-16T12:29:14Z
dc.date.available
2023-11-15T18:26:19Z
dc.date.available
2023-11-16T12:29:14Z
dc.date.issued
2023
dc.identifier.uri
http://hdl.handle.net/20.500.11850/642249
dc.identifier.doi
10.3929/ethz-b-000642249
dc.description.abstract
Computer vision systems have made huge leaps forward since the times of classifying hand-written digits. Supervised learning in particular has become a ubiquitous approach for solving tasks beyond scientific research. Such systems are deployed in numerous products across a variety of industries from self-driving cars to automatic medical diagnosis and weather forecasting. These advances can be attributed to the progress in deep learning algorithms, specialized libraries, and dedicated hardware as well as an increased availability of large annotated datasets for model training. However, there remain tasks where the standard paradigm of simply capturing and annotating more data is not a viable solution. Throughout this thesis, we investigate how to best leverage multimodal data for computer vision tasks where data of sufficient quality or completeness is difficult to obtain. We focus on two specific tasks: guided super-resolution and fine-grained classification. Guided super-resolution involves upscaling low-resolution data by combining it with an auxiliary modality, while fine-grained classification requires exploiting side-information to enable classification algorithms to capture the subtle differences in appearance between fine-grained classes. Initially, we provide solutions to guided super-resolution in scenarios where ground truth data is scarce or unavailable. First, we propose a novel unsupervised formulation that views guided super-resolution as learning a pixel-to-pixel map from the guide to the source domain. We use a multi-layer perceptron parametrisation that preserves high-frequency detail. Second, we propose a novel hybrid model to best leverage deep learning methods while maintaining the rigour of solving an optimisation problem at test-time. The key is a differentiable optimisation layer that operates on a learned affinity graph, ensuring a high fidelity of the target to the source and therefore high generalisability to unseen domains. 
Subsequently, we propose a unified methodology for automatically identifying fine-grained plant specimens from community scientist photographs. This method is designed to exploit various priors that are generally available in community scientist observations, including geographical and temporal context, as well as plant taxonomy to learn transferable representations across similar species. Finally, we present the Herbarium 2021 Half-Earth Dataset, a large curated and open-access dataset of herbarium specimens we have created as part of a machine learning competition to encourage further research on the automatic identification of fine-grained plant species from photographs.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Computer Vision
en_US
dc.subject
Image Classification
en_US
dc.subject
Super-Resolution
en_US
dc.subject
Deep Learning
en_US
dc.title
Exploiting Multimodal Data in Computer Vision with Applications to Super-Resolution and Classification
en_US
dc.type
Doctoral Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2023-11-16
ethz.size
133 p.
en_US
ethz.code.ddc
DDC - DDC::0 - Computer science, information & general works::004 - Data processing, computer science
en_US
ethz.identifier.diss
29402
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02115 - Dep. Bau, Umwelt und Geomatik / Dep. of Civil, Env. and Geomatic Eng.::02647 - Inst. f. Geodäsie und Photogrammetrie / Institute of Geodesy and Photogrammetry::03886 - Schindler, Konrad / Schindler, Konrad
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02115 - Dep. Bau, Umwelt und Geomatik / Dep. of Civil, Env. and Geomatic Eng.::02647 - Inst. f. Geodäsie und Photogrammetrie / Institute of Geodesy and Photogrammetry::03886 - Schindler, Konrad / Schindler, Konrad
en_US
ethz.date.deposited
2023-11-15T18:26:19Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-11-16T12:29:16Z
ethz.rosetta.lastUpdated
2024-02-03T06:38:20Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Exploiting%20Multimodal%20Data%20in%20Computer%20Vision%20with%20Applications%20to%20Super-Resolution%20and%20Classification&rft.date=2023&rft.au=de%20Lutio,%20Riccardo&rft.genre=unknown&rft.btitle=Exploiting%20Multimodal%20Data%20in%20Computer%20Vision%20with%20Applications%20to%20Super-Resolution%20and%20Classification