Local Memory Attention for Fast Video Semantic Segmentation
Date
2021
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
We propose a novel neural network module that transforms an existing single-frame semantic segmentation model into a video semantic segmentation pipeline. In contrast to prior works, we strive towards a simple, fast, and general module that can be integrated into virtually any single-frame architecture. Our approach aggregates a rich representation of the semantic information in past frames into a memory module. Information stored in the memory is then accessed through an attention mechanism. In contrast to previous memory-based approaches, we propose a fast local attention layer, providing temporal appearance cues in the local region of prior frames. We further fuse these cues with an encoding of the current frame through a second attention-based module. The segmentation decoder processes the fused representation to predict the final semantic segmentation. We integrate our approach into two popular semantic segmentation networks: ERFNet and PSPNet. We observe an improvement in segmentation performance on Cityscapes of 1.7% and 2.1% in mIoU, respectively, while increasing the inference time of ERFNet by only 1.5 ms. Source code is available at https://github.com/mattpfr/lmanet.
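The local attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible reading of "attention over the local region of prior frames". The function name `local_memory_attention`, the tensor layouts, the `radius` parameter, and the scaled dot-product scoring are all assumptions made for illustration.

```python
import numpy as np

def local_memory_attention(query, mem_keys, mem_values, radius=1):
    """Hypothetical sketch of local attention over a memory of past frames.

    query:      (H, W, C)    features of the current frame
    mem_keys:   (T, H, W, C) key features of T past frames
    mem_values: (T, H, W, D) value features of T past frames
    Returns:    (H, W, D)    per-pixel read-out from the memory
    """
    T, H, W, C = mem_keys.shape
    k = 2 * radius + 1

    # Zero-pad spatially so every pixel has a full (k x k) neighbourhood,
    # and track which padded positions correspond to real pixels.
    pad = ((0, 0), (radius, radius), (radius, radius), (0, 0))
    keys_p = np.pad(mem_keys, pad)
    vals_p = np.pad(mem_values, pad)
    valid_p = np.pad(np.ones((T, H, W), dtype=bool), pad[:3])

    scores, vals, masks = [], [], []
    for dy in range(k):
        for dx in range(k):
            shifted_keys = keys_p[:, dy:dy + H, dx:dx + W, :]
            # Scaled dot product between each query pixel and the memory
            # key at this spatial offset, for every past frame.
            scores.append(
                np.einsum("hwc,thwc->thw", query, shifted_keys) / np.sqrt(C)
            )
            vals.append(vals_p[:, dy:dy + H, dx:dx + W, :])
            masks.append(valid_p[:, dy:dy + H, dx:dx + W])

    scores = np.stack(scores)   # (K, T, H, W) with K = k * k offsets
    vals = np.stack(vals)       # (K, T, H, W, D)
    masks = np.stack(masks)     # (K, T, H, W)

    # Exclude out-of-image positions, then softmax jointly over all
    # offsets and all past frames for each pixel.
    scores = np.where(masks, scores, -np.inf)
    scores = scores - scores.max(axis=(0, 1), keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=(0, 1), keepdims=True)

    # Weighted sum of the memory values inside the local window.
    return np.einsum("kthw,kthwd->hwd", weights, vals)
```

Under these assumptions, each pixel compares against only T · (2 · radius + 1)² memory positions instead of all T · H · W, which is the kind of saving that makes a local layer cheaper than global memory attention. The second, attention-based fusion step and the decoder mentioned in the abstract are not sketched here.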
Publication status
published
Book title
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Pages / Article No.
1102 - 1109
Publisher
IEEE
Event
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)
Subject
Video Semantic Segmentation; Attention mechanism
Organisational unit
03514 - Van Gool, Luc (emeritus)