Search Results
-
Facial Emotion Recognition with Noisy Multi-task Annotations
(2021) 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
Human emotions can be inferred from facial expressions. However, the annotations of facial expressions are often highly noisy in common emotion coding models, including categorical and dimensional ones. To reduce human labelling effort on multi-task labels, we introduce a new problem of facial emotion recognition with noisy multi-task annotations. For this new problem, we suggest a formulation from the point of joint distribution match ...
Conference Paper
-
SRFlow: Learning the Super-Resolution Space with Normalizing Flow
(2020) Lecture Notes in Computer Science ~ Computer Vision – ECCV 2020 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V
Conference Paper
-
GANmut: Learning Interpretable Conditional Space for Gamut of Emotions
(2021) 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Humans can communicate emotions through a plethora of facial expressions, each with its own intensity, nuances and ambiguities. The generation of such variety by means of conditional GANs is limited to the expressions encoded in the used label system. These limitations are caused either due to burdensome labelling demand or the confounded label space. On the other hand, learning from inexpensive and intuitive basic categorical emotion ...
Conference Paper
-
3D CNNs with Adaptive Temporal Feature Resolutions
(2021) 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
While state-of-the-art 3D Convolutional Neural Networks (CNN) achieve very good results on action recognition datasets, they are computationally very expensive and require many GFLOPs. While the GFLOPs of a 3D CNN can be decreased by reducing the temporal feature resolution within the network, there is no setting that is optimal for all input clips. In this work, we therefore introduce a differentiable Similarity Guided Sampling (SGS) ...
Conference Paper
-
The Heterogeneity Hypothesis: Finding Layer-Wise Differentiated Network Architectures
(2021) 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we tackle the problem of convolutional neural network design. Instead of focusing on the design of the overall architecture, we investigate a design space that is usually overlooked, i.e. adjusting the channel configurations of predefined networks. We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance. Based on that, we articulate the "heterogeneity hypothesis": ...
Conference Paper
-
Depth Estimation from Monocular Images and Sparse Radar Data
(2020) 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
In this paper, we explore the possibility of achieving a more accurate depth estimation by fusing monocular images and Radar points using a deep neural network. We give a comprehensive study of the fusion between RGB images and Radar measurements from different aspects and proposed a working solution based on the observations. We find that the noise existing in Radar measurements is one of the main key reasons that prevents one from ...
Conference Paper
-
Learning Accurate and Human-Like Driving using Semantic Maps and Attention
(2020) 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
This paper investigates how end-to-end driving models can be improved to drive more accurately and human-like. To tackle the first issue we exploit semantic and visual maps from HERE Technologies and augment the existing Drive360 dataset with such. The maps are used in an attention mechanism that promotes segmentation confidence masks, thus focusing the network on semantic classes in the image that are important for the current driving ...
Conference Paper
-
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
(2021) 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. It is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection ...
Conference Paper
-
Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces
(2021) 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
This paper presents an uncalibrated deep neural network framework for the photometric stereo problem. For training models to solve the problem, existing neural network-based methods either require exact light directions or ground-truth surface normals of the object or both. However, in practice, it is challenging to procure both of this information precisely, which restricts the broader adoption of photometric stereo algorithms for vision ...
Conference Paper
-
VisDrone-MOT2021: The Vision Meets Drone Multiple Object Tracking Challenge Results
(2021) 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Vision Meets Drone: Multiple Object Tracking (VisDrone-MOT2021) challenge - the fourth annual activity organized by the VisDrone team - focuses on benchmarking UAV MOT algorithms in realistic challenging environments. It is held in conjunction with ICCV 2021. VisDrone-MOT2021 contains 96 video sequences in total, including 56 sequences (~24K frames) for training, 7 sequences (~3K frames) for validation and 33 sequences ...
Conference Paper