Contrastive Learning for Multi-Object Tracking with Transformers
Author / Producer
Date
2024
Publication Type
Conference Paper
ETH Bibliography
yes
Abstract
The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-level contrastive loss, a revised sampling strategy and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities and with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.
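For illustration of the training scheme summarized above, the sketch below shows one common way to formulate an instance-level contrastive (InfoNCE-style) loss over object embeddings, where embeddings matched to the same ground-truth instance (e.g. across two frames) are pulled together and all others pushed apart. This is a minimal sketch of the general technique under assumed tensor shapes, not the authors' implementation; the function name, arguments, and temperature value are hypothetical.

```python
# Minimal PyTorch sketch of an instance-level contrastive loss over DETR-style
# object embeddings. Shapes and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def instance_contrastive_loss(embeddings, instance_ids, temperature=0.1):
    """embeddings:   (N, D) decoder outputs matched to ground-truth objects
       instance_ids: (N,)   track/instance id of each matched embedding"""
    z = F.normalize(embeddings, dim=-1)              # compare in cosine-similarity space
    sim = z @ z.t() / temperature                    # (N, N) similarity logits
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs from the softmax

    # positives: other embeddings carrying the same instance id
    pos_mask = (instance_ids.unsqueeze(0) == instance_ids.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # average negative log-likelihood of positives, over anchors that have positives
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    has_pos = pos_mask.any(dim=1)
    loss = -pos_log_prob[has_pos] / pos_mask.sum(dim=1)[has_pos]
    return loss.mean()
```

In a DETR-based tracker, such a loss would typically be applied only to the decoder embeddings assigned to ground-truth objects by the bipartite matching step, alongside the usual detection losses.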
Publication status
published
Book title
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Pages / Article No.
6853–6863
Publisher
IEEE
Event
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)
Subject
Algorithms; Video recognition and understanding; Image recognition and understanding; Applications; Autonomous Driving