dc.contributor.author: Wu, Zongwei
dc.contributor.author: Wang, Jingjing
dc.contributor.author: Zhou, Zhuyun
dc.contributor.author: An, Zhaochong
dc.contributor.author: Jiang, Qiuping
dc.contributor.author: Demonceaux, Cédric
dc.contributor.author: Sun, Guolei
dc.contributor.author: Timofte, Radu
dc.date.accessioned: 2023-12-22T10:14:58Z
dc.date.available: 2023-12-19T10:41:24Z
dc.date.available: 2023-12-22T10:14:58Z
dc.date.issued: 2023-10
dc.identifier.isbn: 979-8-4007-0108-5 [en_US]
dc.identifier.other: 10.1145/3581783.3611970 [en_US]
dc.identifier.uri: http://hdl.handle.net/20.500.11850/648567
dc.description.abstract: Multi-sensor clues have shown promise for object segmentation, but inherent noise in each sensor, as well as the calibration error in practice, may bias the segmentation accuracy. In this paper, we propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features, with the aim of controlling the modal contribution based on relative entropy. We explore semantics among the multimodal inputs in two aspects: the modality-shared consistency and the modality-specific variation. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision. On the one hand, the AF block explicitly dissociates the shared and specific representation and learns to weight the modal contribution by adjusting the proportion, region, and pattern, depending upon the quality. On the other hand, our CFD initially decodes the shared feature and then refines the output through specificity-aware querying. Further, we enforce semantic consistency across the decoding layers to enable interaction across network hierarchies, improving feature discriminability. Exhaustive comparison on eleven datasets with depth or thermal clues, and on two challenging tasks, namely salient and camouflage object segmentation, validates our effectiveness in terms of both performance and robustness. The source code is publicly available at https://github.com/Zongwei97/XMSNet. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Association for Computing Machinery [en_US]
dc.subject: RGB-X Object Segmentation [en_US]
dc.subject: Cross-Modal Semantics [en_US]
dc.subject: Robustness [en_US]
dc.title: Object Segmentation by Mining Cross-Modal Semantics [en_US]
dc.type: Conference Paper
dc.date.published: 2023-10-27
ethz.book.title: MM '23: Proceedings of the 31st ACM International Conference on Multimedia [en_US]
ethz.pages.start: 3455 [en_US]
ethz.pages.end: 3464 [en_US]
ethz.event: 31st ACM International Conference on Multimedia (MM 2023) [en_US]
ethz.event.location: Ottawa, Canada [en_US]
ethz.event.date: October 29 - November 3, 2023 [en_US]
ethz.identifier.scopus:
ethz.publication.place: New York, NY [en_US]
ethz.publication.status: published [en_US]
ethz.date.deposited: 2023-12-19T10:41:26Z
ethz.source: SCOPUS
ethz.eth: yes [en_US]
ethz.availability: Metadata only [en_US]
ethz.rosetta.installDate: 2023-12-22T10:14:59Z
ethz.rosetta.lastUpdated: 2023-12-22T10:14:59Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Object%20Segmentation%20by%20Mining%20Cross-Modal%20Semantics&rft.date=2023-10&rft.spage=3455&rft.epage=3464&rft.au=Wu,%20Zongwei&Wang,%20Jingjing&Zhou,%20Zhuyun&An,%20Zhaochong&Jiang,%20Qiuping&rft.isbn=979-8-4007-0108-5&rft.genre=proceeding&rft_id=info:doi/10.1145/3581783.3611970&rft.btitle=MM%20'23:%20Proceedings%20of%20the%2031st%20ACM%20International%20Conference%20on%20Multimedia

Files in this item

There are no files associated with this item.

Publication type: Conference Paper
