

Stereo Risk: A Continuous Modeling Approach to Stereo Matching

Liu, Ce; Kumar, Suryansh; Gu, Shuhang; et al. (2024)

Proceedings of Machine Learning Research ~ Proceedings of the 41st International Conference on Machine Learning

We introduce Stereo Risk, a new deep-learning approach to the classical stereo-matching problem in computer vision. Because stereo matching is well known to reduce to a per-pixel disparity estimation problem, popular state-of-the-art stereo-matching approaches widely rely on regressing scene disparity values, but only after discretizing them. Such discretization often fails to capture the nuanced, continuous ...

Conference Paper
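The abstract contrasts Stereo Risk with approaches that regress disparity over a discrete set of disparity levels. A common form of that discrete baseline is the soft-argmin: the disparity estimate is the expectation of the sampled levels under a softmax of the negated matching costs. The Python sketch below shows that baseline for a single pixel; it is not the Stereo Risk method, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmin_disparity(costs):
    """Regress one pixel's disparity from matching costs sampled at the
    discrete disparity levels 0, 1, ..., len(costs) - 1 (lower cost = better).

    Returns the expected disparity under softmax(-cost), i.e. a weighted
    average of the fixed discrete levels.
    """
    if not costs:
        raise ValueError("costs must contain at least one disparity level")
    probs = softmax([-c for c in costs])
    return sum(d * p for d, p in enumerate(probs))
```

Because the estimate is a weighted average over fixed bins, equally low costs at levels 1 and 2 (for example `soft_argmin_disparity([5.0, 1.0, 1.0, 5.0])`) yield a value of about 1.5, while a single confident match at level 0 yields a value close to 0.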

LocalViT: Analyzing Locality in Vision Transformers

Li, Yawei; Zhang, Kai; Cao, Jiezhang; et al. (2023)

2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between the token embeddings could be well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local ...

Conference Paper
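To make the global-versus-local distinction in this abstract concrete, the sketch below compares two ways of mixing scalar token features on a 2D token grid: a uniform global average (what self-attention reduces to when all attention scores are equal) and a neighborhood average that exchanges information only within a local window. This is an illustrative assumption, not the locality mechanism LocalViT proposes (the abstract is cut off before describing it); `global_mix` and `local_mix` are hypothetical names.

```python
def global_mix(grid):
    """Uniform global mixing: every token receives the mean of all tokens."""
    flat = [v for row in grid for v in row]
    mean = sum(flat) / len(flat)
    return [[mean] * len(grid[0]) for _ in grid]


def local_mix(grid, radius=1):
    """Average each token with its neighbors inside a (2*radius+1) x (2*radius+1)
    window; at the borders only in-bounds neighbors are averaged."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                grid[a][b]
                for a in range(max(0, i - radius), min(h, i + radius + 1))
                for b in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            out[i][j] = sum(vals) / len(vals)
    return out
```

On a 5x5 grid of zeros with a single nonzero token in one corner, every output of `global_mix` changes, whereas `local_mix` leaves the outputs in the opposite corner at zero.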

Model-aware 3D Eye Gaze from Weak and Few-shot Supervisions

Popovic, Nikola; Christodoulou, Dimitrios; Paudel, Danda Pani; et al. (2023)

2023 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct)

The task of predicting 3D eye gaze from eye images can be performed either by (a) end-to-end learning for image-to-gaze mapping or by (b) fitting a 3D eye model onto images. The former case requires 3D gaze labels, while the latter requires eye semantics or landmarks to facilitate the model fitting. Although obtaining eye semantics and landmarks is relatively easy, fitting an accurate 3D eye model on them remains very challenging ...

Conference Paper

Optimizing Long-Term Player Tracking and Identification in NAO Robot Soccer by fusing Game-state and External Video

Albanese, Giuliano; Mitra, Arka; Zaech, Jan-Nico; et al. (2023)

RoboLetics: Workshop on Robot Learning in Athletics @CoRL 2023

Monitoring a fleet of robots requires stable long-term tracking with re-identification, which remains an unsolved challenge in many scenarios. One application of this is the analysis of autonomous robotic soccer games at RoboCup. Tracking in these games requires handling identical-looking players, strong occlusions, and non-professional video recordings, but also offers state information estimated by the robots. In order to make ...

Conference Paper

Multi-Domain Referee Dataset: Enabling Recognition of Referee Signals on Robotic Platforms

Mitra, Arka; Molnar, Lukas; Zaech, Jan-Nico; et al. (2023)

RoboLetics: Workshop on Robot Learning in Athletics @CoRL 2023

Recognizing referee signals is crucial in human and RoboCup soccer games, where the current emphasis is on full robot autonomy through understanding referee signals. To advance towards this goal, we introduce the Multi-Domain Referee Dataset aimed at high-efficiency action recognition in RoboCup and examine the transfer between simulated and real domains in strongly structured settings. Our dataset includes 3,108 action sequences across ...

Conference Paper

Token-consistent Dropout for Calibrated Vision Transformers

Popovic, Nikola; Paudel, Danda Pani; Probst, Thomas; et al. (2023)

2023 IEEE International Conference on Image Processing (ICIP)

We introduce token-consistent dropout in vision transformers, which improves network calibration without causing any severe drop in performance. We use linear layers with token-consistent stochastic parameters inside the multilayer perceptron blocks, without altering the architecture of the transformer. The stochastic parameters are sampled from the uniform distribution, both during training and inference. The applied linear operations ...

Conference Paper
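The abstract describes linear layers whose stochastic parameters are shared across tokens and sampled from a uniform distribution at both training and inference time. The Python sketch below shows one way to read that, under assumptions not stated in the abstract: the stochastic parameters are per-output-feature multiplicative scales drawn from U(1 - alpha, 1 + alpha), drawn once per forward pass and reused for every token. The function names and the `alpha` parameter are hypothetical.

```python
import random


def linear(x, weight, bias):
    """Dense layer for one token: y = W x + b, with weight given as out_dim rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def token_consistent_stochastic_linear(tokens, weight, bias, alpha=0.1, rng=random):
    """Apply one linear layer to every token in a sequence, then scale each
    output feature by a factor drawn from U(1 - alpha, 1 + alpha).

    Assumption of this sketch: the factors are drawn once per call and shared
    by all tokens (token-consistent), and they are drawn on every call, so the
    layer stays stochastic at inference as well as during training.
    """
    scales = [rng.uniform(1.0 - alpha, 1.0 + alpha) for _ in bias]
    return [[s * y for s, y in zip(scales, linear(t, weight, bias))] for t in tokens]
```

Drawing the scales once per call is what makes the perturbation token-consistent in this sketch: two identical tokens in the same sequence always receive identical outputs, while repeated calls on the same input generally differ.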

Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency

Ma, Liqian; Jia, Xu; Georgoulis, Stamatios; et al. (2023)

International Conference on Learning Representations (ICLR 2019)

Image-to-image translation has recently received significant attention due to advances in deep learning. Most works focus on learning either a one-to-one mapping in an unsupervised way or a many-to-many mapping in a supervised way. However, a more practical setting is many-to-many mapping in an unsupervised way, which is harder due to the lack of supervision and the complex inner- and cross-domain variations. To alleviate these issues, ...

Conference Paper

VA-DepthNet: A Variational Approach to Single Image Depth Prediction

Liu, Ce; Kumar, Suryansh; Gu, Shuhang; et al. (2023)

We introduce VA-DepthNet, a simple, effective, and accurate deep neural network approach for the single-image depth prediction (SIDP) problem. The proposed approach advocates using classical first-order variational constraints for this problem. While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene ...

Conference Paper
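As a generic illustration of a first-order constraint on depth (not VA-DepthNet's variational formulation, which the truncated abstract does not spell out), the sketch below computes forward finite-difference gradients of a depth map and a loss that compares predicted and target gradients. The function names are hypothetical.

```python
def spatial_gradients(depth):
    """Forward finite differences of a 2D depth map given as a list of rows.

    gx[i][j] = d[i][j+1] - d[i][j] and gy[i][j] = d[i+1][j] - d[i][j];
    the last column of gx and the last row of gy are set to 0.
    """
    h, w = len(depth), len(depth[0])
    gx = [[(depth[i][j + 1] - depth[i][j]) if j + 1 < w else 0.0 for j in range(w)]
          for i in range(h)]
    gy = [[(depth[i + 1][j] - depth[i][j]) if i + 1 < h else 0.0 for j in range(w)]
          for i in range(h)]
    return gx, gy


def gradient_matching_loss(pred, target):
    """Mean absolute difference between the first-order gradients of pred and target."""
    pgx, pgy = spatial_gradients(pred)
    tgx, tgy = spatial_gradients(target)
    h, w = len(pred), len(pred[0])
    total = sum(
        abs(pgx[i][j] - tgx[i][j]) + abs(pgy[i][j] - tgy[i][j])
        for i in range(h)
        for j in range(w)
    )
    return total / (h * w)
```

Because only differences between neighboring pixels enter this loss, adding a constant offset to the prediction leaves it unchanged.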

The First Visual Object Tracking Segmentation VOTS2023 Challenge Results

Kristan, Matej; Matas, Jiří; Danelljan, Martin; et al. (2023)

2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

The Visual Object Tracking Segmentation VOTS2023 challenge is the eleventh annual tracker benchmarking activity of the VOT initiative. This challenge is the first to merge short-term and long-term as well as single-target and multiple-target tracking with segmentation masks as the only target location specification. A new dataset was created; the ground truth has been withheld to prevent overfitting. New performance measures and evaluation ...

Conference Paper
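Since VOTS2023 specifies target location only through segmentation masks, the overlap between a predicted mask and the ground-truth mask is a natural building block for evaluation. The sketch below computes plain intersection-over-union for two binary masks; it does not implement the new VOTS2023 performance measures, and treating two empty masks as a perfect match (IoU 1.0) is an assumption of this sketch. The function name is hypothetical.

```python
def mask_iou(pred, gt):
    """Intersection-over-union of two equal-size binary masks (lists of rows of 0/1)."""
    inter = 0
    union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union
```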

Spatio-Temporal Convolution-Attention Video Network

Diba, Ali; Sharma, Vivek; Arzani, Mohammad.M; et al. (2023)

2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

In this paper, we present a hierarchical neural network based on convolutional and attention modeling for short and long-range video reasoning, called Spatio-Temporal Convolution-Attention Video Network (STCA). The proposed method is capable of learning appearance and temporal cues in two stages with different temporal depths to maximize engagement of the short-range and long-range video sequences. It has the benefits of convolutional and ...

Conference Paper
