Search Results
EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation
(2024) 2023 IEEE/CVF International Conference on Computer Vision (ICCV)
With autonomous industries on the rise, domain adaptation of the visual perception stack is an important research direction due to the cost savings promise. Much prior art was dedicated to domain-adaptive semantic segmentation in the synthetic-to-real context. Despite being a crucial output of the perception stack, panoptic segmentation has been largely overlooked by the domain adaptation community. Therefore, we revisit well-performing ...
Conference Paper

MultiVT: Multiple-Task Framework for Dentistry
(2024) Lecture Notes in Computer Science ~ Domain Adaptation and Representation Transfer
Current image understanding methods on dental data are often trained end-to-end on inputs and labels, with focus on using state-of-the-art neural architectures. Such approaches, however, typically ignore domain specific peculiarities and lack the ability to generalize outside their training dataset. We observe that, in RGB images, teeth display a weak or unremarkable texture while exhibiting strong boundaries; similarly, in panoramic ...
Conference Paper

Replay-Based Online Adaptation for Unsupervised Deep Visual Odometry
(2024) Lecture Notes in Computer Science ~ Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
Online adaptation is a promising paradigm that enables dynamic adaptation to new environments. In recent years, there has been a growing interest in exploring online adaptation for various problems, including visual odometry, a crucial task in robotics, autonomous systems, and driver assistance applications. In this work, we leverage experience replay, a potent technique for enhancing online adaptation, to explore the replay-based online ...
Conference Paper

LocalViT: Analyzing Locality in Vision Transformers
(2023) 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between the token embeddings could be well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local ...
Conference Paper

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions
(2023) 2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)
The task of predicting 3D eye gaze from eye images can be performed either by (a) end-to-end learning for image-to-gaze mapping or by (b) fitting a 3D eye model onto images. The former case requires 3D gaze labels, while the latter requires eye semantics or landmarks to facilitate the model fitting. Although obtaining eye semantics and landmarks is relatively easy, fitting an accurate 3D eye model on them remains very challenging ...
Conference Paper

Optimizing Long-Term Player Tracking and Identification in NAO Robot Soccer by fusing Game-state and External Video
(2023) RoboLetics: Workshop on Robot Learning in Athletics @CoRL 2023
Monitoring a fleet of robots requires stable long-term tracking with re-identification, which remains an unsolved challenge in many scenarios. One application of this is the analysis of autonomous robotic soccer games at RoboCup. Tracking in these games requires handling of identical-looking players, strong occlusions, and non-professional video recordings, but also offers state information estimated by the robots. In order to make ...
Conference Paper

Multi-Domain Referee Dataset: Enabling Recognition of Referee Signals on Robotic Platforms
(2023) RoboLetics: Workshop on Robot Learning in Athletics @CoRL 2023
Recognizing referee signals is crucial in human and RoboCup soccer games, where an emphasis currently lies on full robot autonomy through understanding referee signals. To advance towards this goal, we introduce the Multi-Domain Referee Dataset aimed at high-efficiency action recognition in RoboCup and examine the transfer between simulated and real domains in strongly structured settings. Our dataset includes 3,108 action sequences across ...
Conference Paper

Token-consistent Dropout for Calibrated Vision Transformers
(2023) 2023 IEEE International Conference on Image Processing (ICIP)
We introduce token-consistent dropout in vision transformers, which improves network calibration without causing any severe drop in performance. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are sampled from the uniform distribution, both during training and inference. The applied linear operations ...
Conference Paper

VA-DepthNet: A Variational Approach to Single Image Depth Prediction
(2023)
We introduce VA-DepthNet, a simple, effective, and accurate deep neural network approach for the single-image depth prediction (SIDP) problem. The proposed approach advocates using classical first-order variational constraints for this problem. While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene ...
Conference Paper

CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution
(2023) 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Learning continuous image representations has recently gained popularity for image super-resolution (SR) because of its ability to reconstruct high-resolution images with arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble suffers from some limitations: i) it has no learnable parameters and it neglects the ...
Conference Paper