Segment Anything in High Quality
dc.contributor.author
Ke, Lei
dc.contributor.author
Ye, Mingqiao
dc.contributor.author
Danelljan, Martin
dc.contributor.author
Liu, Yifan
dc.contributor.author
Tai, Yu-Wing
dc.contributor.author
Tang, Chi-Keung
dc.contributor.author
Yu, Fisher
dc.contributor.editor
Oh, Alice
dc.contributor.editor
Naumann, Tristan
dc.contributor.editor
Globerson, Amir
dc.contributor.editor
Saenko, Kate
dc.contributor.editor
Hardt, Moritz
dc.contributor.editor
Levine, Sergey
dc.date.accessioned
2024-07-15T08:21:49Z
dc.date.available
2023-12-05T14:47:54Z
dc.date.available
2023-12-05T15:42:23Z
dc.date.available
2023-12-05T15:42:53Z
dc.date.available
2024-07-15T08:21:49Z
dc.date.issued
2024-07
dc.identifier.isbn
978-1-7138-9992-1
dc.identifier.uri
http://hdl.handle.net/20.500.11850/645738
dc.identifier.doi
10.3929/ethz-b-000645738
dc.description.abstract
The recent Segment Anything Model (SAM) represents a big leap in scaling up segmentation models, allowing for powerful zero-shot capabilities and flexible prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction quality falls short in many cases, particularly when dealing with objects that have intricate structures. We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability. Our careful design reuses and preserves the pre-trained model weights of SAM, while only introducing minimal additional parameters and computation. We design a learnable High-Quality Output Token, which is injected into SAM's mask decoder and is responsible for predicting the high-quality mask. Instead of only applying it on mask-decoder features, we first fuse them with early and final ViT features for improved mask details. To train our introduced learnable parameters, we compose a dataset of 44K fine-grained masks from several sources. HQ-SAM is trained only on this dataset of 44K masks, which takes only 4 hours on 8 GPUs. We show the efficacy of HQ-SAM on a suite of 10 diverse segmentation datasets across different downstream tasks, 8 of which are evaluated in a zero-shot transfer protocol.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Curran
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
Segment Anything in High Quality
en_US
dc.type
Conference Paper
dc.rights.license
In Copyright - Non-Commercial Use Permitted
ethz.book.title
Advances in Neural Information Processing Systems 36
en_US
ethz.pages.start
29914
en_US
ethz.pages.end
29934
en_US
ethz.version.deposit
acceptedVersion
en_US
ethz.event
37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023)
en_US
ethz.event.location
New Orleans, LA, USA
en_US
ethz.event.date
December 10-16, 2023
en_US
ethz.notes
Poster presentation on December 13, 2023
en_US
ethz.identifier.wos
ethz.publication.place
Red Hook, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::09688 - Yu, Fisher / Yu, Fisher
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02652 - Institut für Bildverarbeitung / Computer Vision Laboratory::09688 - Yu, Fisher / Yu, Fisher
en_US
ethz.identifier.url
https://papers.nips.cc/paper_files/paper/2023/hash/5f828e38160f31935cfe9f67503ad17c-Abstract-Conference.html
ethz.relation.isSupplementedBy
https://github.com/SysCV/SAM-HQ
ethz.date.deposited
2023-12-05T14:47:54Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2024-07-15T08:21:56Z
ethz.rosetta.lastUpdated
2024-07-15T08:21:56Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Segment%20Anything%20in%20High%20Quality&rft.date=2024-07&rft.spage=29914&rft.epage=29934&rft.au=Ke,%20Lei&rft.au=Ye,%20Mingqiao&rft.au=Danelljan,%20Martin&rft.au=Liu,%20Yifan&rft.au=Tai,%20Yu-Wing&rft.isbn=978-1-7138-9992-1&rft.genre=proceeding&rft.btitle=Advances%20in%20Neural%20Information%20Processing%20Systems%2036