dc.contributor.author
Naeem, Muhammad Ferjad
dc.contributor.author
Khan, Muhammad Gul Zain Ali
dc.contributor.author
Xian, Yongqin
dc.contributor.author
Afzal, Muhammad Zeshan
dc.contributor.author
Stricker, Didier
dc.contributor.author
Van Gool, Luc
dc.contributor.author
Tombari, Federico
dc.date.accessioned
2023-11-16T09:05:28Z
dc.date.available
2023-11-16T04:45:41Z
dc.date.available
2023-11-16T09:05:28Z
dc.date.issued
2023
dc.identifier.isbn
979-8-3503-0129-8
en_US
dc.identifier.issn
1063-6919
dc.identifier.other
10.1109/CVPR52729.2023.01456
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/642272
dc.description.abstract
Recent works have shown that unstructured text (documents) from online sources can serve as useful auxiliary information for zero-shot image classification. However, these methods require access to a high-quality source like Wikipedia and are limited to a single source of information. Large Language Models (LLMs) trained on web-scale text show impressive abilities to repurpose their learned knowledge for a multitude of tasks. In this work, we provide a novel perspective on using an LLM to provide text supervision for a zero-shot image classification model. The LLM is provided with a few text descriptions from different annotators as examples. The LLM is conditioned on these examples to generate multiple text descriptions for each class (referred to as views). Our proposed model, I2MVFormer, learns multi-view semantic embeddings for zero-shot image classification with these class views. We show that each text view of a class provides complementary information, allowing a model to learn a highly discriminative class embedding. Moreover, we show that I2MVFormer is better at consuming the multi-view text supervision from the LLM than baseline models. I2MVFormer establishes a new state-of-the-art on three public benchmark datasets for zero-shot image classification with unsupervised semantic embeddings. Code is available at https://github.com/ferjad/I2DFormer
en_US
dc.language.iso
en
en_US
dc.publisher
IEEE
en_US
dc.title
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
en_US
dc.type
Conference Paper
dc.date.published
2023-08-22
ethz.book.title
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
en_US
ethz.pages.start
15169
en_US
ethz.pages.end
15179
en_US
ethz.event
34th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)
en_US
ethz.event.location
Vancouver, Canada
en_US
ethz.event.date
June 18-22, 2023
en_US
ethz.identifier.wos
ethz.publication.place
Piscataway, NJ
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::03514 - Van Gool, Luc (emeritus) / Van Gool, Luc (emeritus)
ethz.date.deposited
2023-11-16T04:45:45Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2023-11-16T09:05:30Z
ethz.rosetta.lastUpdated
2024-02-03T06:37:53Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=I2MVFormer:%20Large%20Language%20Model%20Generated%20Multi-View%20Document%20Supervision%20for%20Zero-Shot%20Image%20Classification&rft.date=2023&rft.spage=15169&rft.epage=15179&rft.issn=1063-6919&rft.au=Naeem,%20Muhammad%20Ferjad&rft.au=Khan,%20Muhammad%20Gul%20Zain%20Ali&rft.au=Xian,%20Yongqin&rft.au=Afzal,%20Muhammad%20Zeshan&rft.au=Stricker,%20Didier&rft.isbn=979-8-3503-0129-8&rft.genre=proceeding&rft_id=info:doi/10.1109/CVPR52729.2023.01456&rft.btitle=2023%20IEEE/CVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%20(CVPR)