Mining Relations Among Cross-Frame Affinities for Video Semantic Segmentation
Metadata only
Date: 2022
Type: Conference Paper
ETH Bibliography: yes
Abstract
The essence of video semantic segmentation (VSS) is how to leverage temporal information for prediction. Previous efforts have mainly been devoted to developing new techniques for calculating cross-frame affinities, such as optical flow and attention. Instead, this paper contributes from a different angle: mining relations among cross-frame affinities, upon which better temporal information aggregation can be achieved. We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations. Inspired by traditional feature processing, we propose Single-scale Affinity Refinement (SAR) and Multi-scale Affinity Aggregation (MAA). To make MAA feasible to execute, we propose a Selective Token Masking (STM) strategy that selects a subset of consistent reference tokens across scales when calculating affinities, which also improves the efficiency of our method. Finally, the cross-frame affinities strengthened by SAR and MAA are adopted for adaptively aggregating temporal information. Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods. The code is publicly available at https://github.com/GuoleiSun/VSS-MRCFA.
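The idea summarized above can be illustrated with a short sketch. The snippet below is not the authors' released implementation (see the GitHub link in the abstract for that); it is a minimal PyTorch-style illustration of the general pipeline: computing a dense affinity between current-frame query tokens and reference-frame tokens, retaining only a subset of reference tokens in the spirit of Selective Token Masking, and using the resulting affinities to aggregate temporal information. The function name cross_frame_affinity, the keep_ratio parameter, and the tensor shapes are hypothetical; the SAR and MAA refinement modules themselves are not shown.

import torch
import torch.nn.functional as F


def cross_frame_affinity(query, reference, keep_ratio=0.5):
    """Sketch of cross-frame affinity with a selective-token-masking step.

    Not the authors' implementation (see the GitHub repository above).
    Assumed shapes:
      query:      (B, Nq, C) tokens from the current frame
      reference:  (B, Nr, C) tokens from a reference (past) frame
      keep_ratio: hypothetical fraction of reference tokens to retain
    """
    B, Nq, C = query.shape
    Nr = reference.shape[1]

    # Dense affinity between every query token and every reference token.
    affinity = torch.einsum('bqc,brc->bqr', query, reference) / C ** 0.5

    # Selective token masking (sketch): score each reference token by its
    # total affinity to the current frame and keep only the top-k tokens,
    # so affinities at different scales can share a consistent token subset.
    k = max(1, int(keep_ratio * Nr))
    scores = affinity.sum(dim=1)                          # (B, Nr)
    top_idx = scores.topk(k, dim=1).indices               # (B, k)
    reference_kept = reference.gather(
        1, top_idx.unsqueeze(-1).expand(-1, -1, C))       # (B, k, C)
    affinity_kept = affinity.gather(
        2, top_idx.unsqueeze(1).expand(-1, Nq, -1))       # (B, Nq, k)

    # Normalized affinities adaptively aggregate temporal information
    # from the retained reference tokens into the current frame.
    weights = F.softmax(affinity_kept, dim=-1)            # (B, Nq, k)
    aggregated = torch.einsum('bqk,bkc->bqc', weights, reference_kept)
    return aggregated, affinity_kept

In the actual method described in the abstract, affinities computed in this manner would additionally be refined within each scale (SAR) and fused across scales (MAA) before the final aggregation step.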
Publication status: published
External links
Book title: Computer Vision – ECCV 2022
Journal / series: Lecture Notes in Computer Science
Volume
Pages / Article No.
Publisher: Springer
Event
Subject: Video semantic segmentation; Cross-frame affinities; Single-scale Affinity Refinement; Multi-scale Affinity Aggregation