Search Results
Stereo Risk: A Continuous Modeling Approach to Stereo Matching
(2024) Proceedings of Machine Learning Research ~ Proceedings of the 41st International Conference on Machine Learning
We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization often fails to capture the nuanced, continuous ...
Conference Paper
LocalViT: Analyzing Locality in Vision Transformers
(2023) 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between the token embeddings could be well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local ...
Conference Paper
Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
(2023) 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)
The task of predicting 3D eye gaze from eye images can be performed either by (a) end-to-end learning for image-to-gaze mapping or by (b) fitting a 3D eye model onto images. The former case requires 3D gaze labels, while the latter requires eye semantics or landmarks to facilitate the model fitting. Although obtaining eye semantics and landmarks is relatively easy, fitting an accurate 3D eye model on them remains to be very challenging ...
Conference Paper
Optimizing Long-Term Player Tracking and Identification in NAO Robot Soccer by fusing Game-state and External Video
(2023) RoboLetics: Workshop on Robot Learning in Athletics @CoRL 2023
Monitoring a fleet of robots requires stable long-term tracking with re-identification, which is yet an unsolved challenge in many scenarios. One application of this is the analysis of autonomous robotic soccer games at RoboCup. Tracking in these games requires handling of identically looking players, strong occlusions, and non-professional video recordings, but also offers state information estimated by the robots. In order to make ...
Conference Paper
Multi-Domain Referee Dataset: Enabling Recognition of Referee Signals on Robotic Platforms
(2023) RoboLetics: Workshop on Robot Learning in Athletics @CoRL 2023
Recognizing referee signals is crucial in human and RoboCup soccer games, where an emphasis currently lies on full robot autonomy through understanding referee signals. To advance towards this goal, we introduce the Multi-Domain Referee Dataset aimed at high-efficiency action recognition in RoboCup and examine the transfer between simulated and real domains in strongly structured settings. Our dataset includes 3,108 action sequences across ...
Conference Paper
Token-consistent Dropout for Calibrated Vision Transformers
(2023) 2023 IEEE International Conference on Image Processing (ICIP)
We introduce token-consistent dropout in vision transformers, which improves network calibration without causing any severe drop in performance. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are sampled from the uniform distribution, both during training and inference. The applied linear operations ...
Conference Paper
Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency
(2023) International Conference on Learning Representations (ICLR 2019)
Image-to-image translation has recently received significant attention due to advances in deep learning. Most works focus on learning either a one-to-one mapping in an unsupervised way or a many-to-many mapping in a supervised way. However, a more practical setting is many-to-many mapping in an unsupervised way, which is harder due to the lack of supervision and the complex inner- and cross-domain variations. To alleviate these issues, ...
Conference Paper
VA-DepthNet: A Variational Approach to Single Image Depth Prediction
(2023)
We introduce VA-DepthNet, a simple, effective, and accurate deep neural network approach for the single-image depth prediction (SIDP) problem. The proposed approach advocates using classical first-order variational constraints for this problem. While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene ...
Conference Paper
The First Visual Object Tracking Segmentation VOTS2023 Challenge Results
(2023) 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
The Visual Object Tracking Segmentation VOTS2023 challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitting. New performance measures and evaluation ...
Conference Paper
Spatio-Temporal Convolution-Attention Video Network
(2023) 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
In this paper, we present a hierarchical neural network based on convolutional and attention modeling for short and long-range video reasoning, called Spatio-Temporal Convolution-Attention Video Network (STCA). The proposed method is capable of learning appearance and temporal cues in two stages with different temporal depths to maximize engagement of the short-range and long-range video sequences. It has the benefits of convolutional and ...
Conference Paper