Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Abbreviation
IEEE Trans. Pattern Anal. Mach. Intell.
Publisher
IEEE
162 results
Search Results
Publications 1 - 10 of 162
- What Is Optimized in Convex Relaxations for Multilabel Problems
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Zach, Christopher; Häne, Christian; Pollefeys, Marc (2014)
- GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Zheng, Peng; Fu, Huazhu; Fan, Deng-Ping; et al. (2023)
  In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. GCoNet+ achieves new state-of-the-art performance for co-salient object detection (CoSOD) by mining consensus representations based on two essential criteria: 1) intra-group compactness, which better formulates the consistency among co-salient objects by capturing their inherent shared attributes with our novel group affinity module (GAM); and 2) inter-group separability, which effectively suppresses the influence of noisy objects on the output by introducing our new group collaborating module (GCM), conditioned on the inconsistent consensus. To further improve accuracy, we design a series of simple yet effective components: i) a recurrent auxiliary classification module (RACM) that promotes model learning at the semantic level; ii) a confidence enhancement module (CEM) that helps the model improve the quality of its final predictions; and iii) a group-based symmetric triplet (GST) loss that guides the model to learn more discriminative features. Extensive experiments on three challenging benchmarks, i.e., CoCA, CoSOD3k, and CoSal2015, demonstrate that GCoNet+ outperforms 12 existing cutting-edge models. Code has been released at https://github.com/ZhengPeng7/GCoNet_plus.
- Semantic Hierarchy-Aware Segmentation
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Li, Liulei; Wang, Wenguan; Zhou, Tianfei; et al. (2024)
  Humans are able to recognize structured relations in what they observe, allowing them to decompose complex scenes into simpler parts and abstract the visual world at multiple levels. However, this hierarchical reasoning ability of human perception remains largely unexplored in the current semantic segmentation literature. Existing works typically assume a flat label space and distinguish all semantic categories exclusively for each pixel. In this work, we instead address hierarchical semantic segmentation (HSS), which aims to provide a structured, pixel-wise description of visual observations in terms of a class hierarchy. We devise Hssn, a general HSS framework that tackles two critical issues in this task: i) how to efficiently adapt existing hierarchy-agnostic segmentation networks to the HSS setting, and ii) how to leverage the class hierarchy to regularize HSS network learning. To address i), Hssn directly casts HSS as a pixel-wise multi-label classification task, requiring only minimal architectural changes to current segmentation models. To solve ii), Hssn first exploits inherent properties of the hierarchy as a training objective, which enforces segmentation predictions to obey the hierarchy structure. Furthermore, with a set of hierarchy-induced margin constraints, Hssn efficiently reshapes the learned pixel embedding space so as to generate hierarchy-aware pixel representations and ultimately facilitate structured segmentation. Building upon Hssn, we further exploit the mutual exclusion relation between semantic labels and strengthen the margin-based regularization strategy with more meaningful constraints, leading to Hssn+, a more effective framework for HSS.
  We conduct extensive experiments on six semantic segmentation datasets (i.e., Mapillary Vistas 2.0, Cityscapes, LIP, PASCAL-Person-Part, PASCAL-Part-58, and PASCAL-Part-108) with different class hierarchies, network architectures, and backbones, and the results confirm the generality and superiority of our algorithms.
- Multi-Channel Attention Selection GANs for Guided Image-to-Image Translation
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Tang, Hao; Torr, Philip H.S.; Sebe, Nicu (2023)
  We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation, where an input image is translated into another while respecting external semantic guidance. The proposed SelectionGAN explicitly exploits the semantic guidance information and consists of two stages. In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results using the proposed multi-scale spatial pooling & channel selection module and the multi-channel attention selection module. Moreover, uncertainty maps automatically learned from attention maps are used to guide the pixel loss for better network optimization. Exhaustive experiments on four challenging guided image-to-image translation tasks (face, hand, body, and street view) demonstrate that SelectionGAN generates significantly better results than state-of-the-art methods. Meanwhile, the proposed framework and modules are unified solutions that can be applied to other generation tasks such as semantic image synthesis. The code is available at https://github.com/Ha0Tang/SelectionGAN.
- Gating Revisited: Deep Multi-Layer RNNs That Can Be Trained
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Türkoglu, Mehmet Ö.; D'Aronco, Stefano; Wegner, Jan Dirk; et al. (2022)
  We propose a new stackable recurrent cell (STAR) for recurrent neural networks (RNNs) that has significantly fewer parameters than the widely used LSTM and GRU while being more robust against vanishing or exploding gradients. Stacking multiple layers of recurrent units has two major drawbacks: i) many recurrent cells (e.g., LSTM cells) are extremely demanding in terms of parameters and computational resources, and ii) deep RNNs are prone to vanishing or exploding gradients during training. We investigate the training of multi-layer RNNs and examine the magnitude of the gradients as they propagate through the network in the "vertical" direction. We show that, depending on the structure of the basic recurrent unit, the gradients are systematically attenuated or amplified. Based on our analysis, we design a new type of gated cell that better preserves gradient magnitude. We validate our design on a large number of sequence modelling tasks and demonstrate that the proposed STAR cell makes it possible to build and train deeper recurrent architectures, ultimately leading to improved performance while remaining computationally efficient.
- Globally Optimal Hand-Eye Calibration Using Branch-and-Bound
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Heller, Jan; Havlena, Michal; Pajdla, Tomas (2016)
- Eye Movement Analysis for Activity Recognition Using Electrooculography
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Bulling, Andreas; Ward, Jamie A.; Gellersen, Hans; et al. (2011)
- Monocular Visual Scene Understanding
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Wojek, C.; Walk, S.; Roth, S.; et al. (2013)
- Towards Lightweight Super-Resolution With Dual Regression Learning
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Guo, Yong; Tan, Mingkui; Deng, Zeshuai; et al. (2024)
  Deep neural networks have exhibited remarkable performance in image super-resolution (SR) tasks by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods come with several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images may be super-resolved from the same LR image; as a result, it is hard to directly learn a promising SR mapping from such a large space. Second, it is often inevitable to develop very large models with extremely high computational cost to achieve promising SR performance. In practice, one can use model compression techniques to obtain compact models by reducing model redundancy; nevertheless, it is hard for existing compression methods to accurately identify the redundant components, again because of the extremely large SR mapping space. To alleviate the first challenge, we propose a dual regression learning scheme that reduces the space of possible SR mappings: in addition to the mapping from LR to HR images, we learn a dual regression mapping that estimates the downsampling kernel and reconstructs LR images, so that the dual mapping acts as a constraint on the space of possible mappings. To address the second challenge, we propose a dual regression compression (DRC) method that reduces model redundancy at both the layer level and the channel level via channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer; given the searched channel numbers, we further exploit the dual regression scheme to evaluate the importance of channels and prune the redundant ones.
  Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.
- Learning-Based Multi-View Stereo: A Survey
  Item type: Journal Article
  IEEE Transactions on Pattern Analysis and Machine Intelligence
  Wang, Fangjinhua; Zhu, Qingtian; Chang, Di; et al. (2026)
  3D reconstruction aims to recover the dense 3D structure of a scene and plays an essential role in applications such as Augmented/Virtual Reality (AR/VR), autonomous driving, and robotics. By leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments. Due to its efficiency and effectiveness, MVS has become a pivotal method for image-based 3D reconstruction. Recently, with the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance compared with traditional methods. We categorize these learning-based methods as depth map-based, voxel-based, NeRF-based, 3D Gaussian Splatting-based, and large feed-forward methods. Among these, we focus in particular on depth map-based methods, the main family of MVS methods, owing to their conciseness, flexibility, and scalability. In this survey, we provide a comprehensive review of the literature at the time of this writing, investigate these learning-based methods, summarize their performance on popular benchmarks, and discuss promising future research directions in this area.
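The group affinity module (GAM) in the GCoNet+ entry above is only named, not specified. As a rough, hypothetical sketch of the underlying consensus-mining idea (the function name, feature shapes, and softmax weighting are our own assumptions, not the paper's implementation), one could aggregate a group consensus by weighting each image's features by its affinity to the rest of the group:

```python
import math

def group_consensus(features):
    """Toy consensus mining over a group of per-image feature vectors.

    features: list of equal-length lists of floats (one vector per image).
    Returns one consensus vector: each image's features weighted by the
    softmax of its total dot-product affinity to the rest of the group.
    """
    n = len(features)
    # Total pairwise affinity of each image to the others (self excluded).
    totals = []
    for i in range(n):
        t = sum(
            sum(a * b for a, b in zip(features[i], features[j]))
            for j in range(n) if j != i
        )
        totals.append(t)
    # Softmax over affinities: images consistent with the group dominate.
    m = max(totals)
    exps = [math.exp(t - m) for t in totals]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Affinity-weighted average serves as the group consensus representation.
    dim = len(features[0])
    return [sum(weights[i] * features[i][d] for i in range(n)) for d in range(dim)]
```

With this weighting, an image whose features disagree with the group (a candidate noisy object) contributes less to the consensus.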
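The hierarchy constraint that Hssn (Semantic Hierarchy-Aware Segmentation, above) enforces can be illustrated with a deliberately small scoring rule. The min-rule and the toy class tree below are our own simplification for illustration, not the paper's exact training objective:

```python
def hierarchy_consistent_scores(scores, parent):
    """Clamp per-class scores so no class scores higher than its ancestors.

    scores: dict class -> raw score in [0, 1] (e.g. per-pixel sigmoid outputs).
    parent: dict class -> parent class, with root classes mapped to None.
    Returns hierarchy-consistent scores: s'(v) = min(s(v), s'(parent(v))).
    """
    out = {}

    def resolve(c):
        if c in out:
            return out[c]
        p = parent[c]
        # A root keeps its score; a child is capped by its parent's score.
        s = scores[c] if p is None else min(scores[c], resolve(p))
        out[c] = s
        return s

    for c in scores:
        resolve(c)
    return out
```

For example, if a hypothetical class "rider" scores 0.95 but its parent "person" only 0.9, the rider score is clamped to 0.9, so predictions cannot violate the class tree.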
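The SelectionGAN abstract above mentions uncertainty maps that guide the pixel loss but gives no formula. A common form for such uncertainty-weighted losses (a hedged sketch, not necessarily SelectionGAN's exact formulation) divides the per-pixel error by the predicted uncertainty and adds a log penalty so that uncertainty cannot grow without cost:

```python
import math

def uncertainty_weighted_l1(pred, target, uncertainty, eps=1e-6):
    """Mean over pixels of |pred - target| / u + log(u).

    pred, target: flat lists of pixel values.
    uncertainty: flat list of positive per-pixel uncertainty estimates
    (assumed given here; in SelectionGAN they are learned from attention maps).
    """
    total = 0.0
    for p, t, u in zip(pred, target, uncertainty):
        u = max(u, eps)                      # guard against division by zero
        total += abs(p - t) / u + math.log(u)
    return total / len(pred)
```

With u = 1 everywhere this reduces to a plain mean L1 loss; larger u down-weights the error at uncertain pixels at the cost of the log term.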
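The "Gating Revisited" abstract above does not spell out the STAR cell's equations. As a purely illustrative single-gate cell in the same spirit (scalar state and made-up parameter names for clarity; not the published STAR equations), the key design point is a convex blend of old state and new candidate controlled by one gate:

```python
import math

def single_gate_cell_step(x, h, wx_k, wh_k, b_k, wx_z):
    """One step of a toy single-gate recurrent cell (scalar state for clarity).

    x: input, h: previous hidden state; the w*/b scalars are the cell's only
    parameters (one gate plus one candidate, versus an LSTM's four gates).
    """
    k = 1.0 / (1.0 + math.exp(-(wx_k * x + wh_k * h + b_k)))  # gate in (0, 1)
    z = math.tanh(wx_z * x)                                   # input candidate
    # Convex combination: the gate decides how much old state to overwrite.
    return (1.0 - k) * h + k * z
```

When the gate k is near 0 the previous state passes through unchanged, which is the kind of behaviour that helps keep gradient magnitudes stable in deep stacks.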
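The dual regression scheme in "Towards Lightweight Super-Resolution" above can be illustrated on 1-D signals. The nearest-neighbour "SR" model and the average-pooling "dual" model below are hypothetical stand-ins for the learned networks; the objective combines a primal term against the HR target with a dual term that reconstructs the LR input:

```python
def primal_dual_loss(lr, hr, upsample, downsample, lam=0.1):
    """Dual regression objective on 1-D signals:
    L1(up(lr), hr) + lam * L1(down(up(lr)), lr)."""
    sr = upsample(lr)
    down = downsample(sr)
    primal = sum(abs(a - b) for a, b in zip(sr, hr)) / len(hr)
    dual = sum(abs(a - b) for a, b in zip(down, lr)) / len(lr)
    return primal + lam * dual

# Hypothetical stand-ins for the learned primal and dual mappings:
def nn_upsample_2x(x):          # nearest-neighbour 2x "super-resolution"
    return [v for v in x for _ in (0, 1)]

def avg_downsample_2x(x):       # average-pool 2x, the dual (downsampling) model
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x), 2)]
```

The dual term penalizes SR outputs that, once downsampled, no longer match the observed LR input, shrinking the space of admissible mappings exactly as the abstract describes.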