Journal: Computational Visual Media

Abbreviation: Comp. Visual Media
Publisher: Springer

ISSN: 2096-0433 (print); 2096-0662 (electronic)

Search Results

Publications 1–5 of 5
  • Lin, Zheng; Zhang, Zhao; Zhu, Zi-Yue; et al. (2023)
    Computational Visual Media
    Interactive image segmentation (IIS) is an important technique for obtaining pixel-level annotations. In many cases, target objects share similar semantics. However, IIS methods neglect this connection, and in particular the cues provided by representations of previously segmented objects, previous user interactions, and previous prediction masks, all of which can provide suitable priors for the current annotation. In this paper, we formulate a sequential interactive image segmentation (SIIS) task for minimizing user interaction when segmenting sequences of related images, and we provide a practical approach to this task using two pertinent designs. The first is a novel interaction mode: when annotating a new sample, our method can automatically propose an initial click based on the previous annotations, dramatically reducing the interaction burden on the user (a toy sketch of this click-proposal idea appears after this list). The second is an online optimization strategy, whose goal is to provide semantic information when annotating specific targets by optimizing the model with dense supervision from previously labeled samples. Experiments demonstrate the effectiveness of treating SIIS as a distinct task, and of our methods for addressing it.
  • Ji, Ge-Peng; Fan, Deng-Ping; Fu, Keren; et al. (2023)
    Computational Visual Media
    Previous video object segmentation approaches mainly focus on simplex solutions linking appearance and motion, limiting effective feature collaboration between these two cues. In this work, we propose a novel and efficient full-duplex strategy network (FSNet) to address this issue, using a mutual-restraint scheme between motion and appearance that allows cross-modal features to be exploited during the fusion and decoding stages. Specifically, we introduce a relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces (a simplified sketch of such bidirectional cross-attention appears after this list). To improve the model's robustness and update inconsistent features in the spatiotemporal embeddings, we adopt a bidirectional purification module after the RCAM. Extensive experiments on five popular benchmarks show that FSNet is robust to various challenging scenarios (e.g., motion blur and occlusion) and compares well to leading methods for both video object segmentation and video salient object detection. The project is publicly available at https://github.com/GewelsJI/FSNet.
  • Wang, Wenhai; Xie, Enze; Li, Xiang; et al. (2022)
    Computational Visual Media
    Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) overlapping patch embedding, and (iii) a convolutional feed-forward network (toy versions of these three components are sketched after this list). With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and yields significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT.
  • Zhou, Tao; Fan, Deng-Ping; Chen, Geng; et al. (2023)
    Computational Visual Media
    Salient object detection (SOD) in RGB and depth images has attracted increasing research interest. Existing RGB-D SOD models usually adopt fusion strategies to learn a shared representation from the RGB and depth modalities, while few methods explicitly consider how to preserve modality-specific characteristics. In this study, we propose a novel framework, the specificity-preserving network (SPNet), which improves SOD performance by exploring both the shared information and the modality-specific properties. Specifically, we use two modality-specific networks and a shared learning network to generate individual and shared saliency prediction maps. To effectively fuse cross-modal features in the shared learning network, we propose a cross-enhanced integration module (CIM) and propagate the fused feature to the next layer to integrate cross-level information (a minimal sketch of such cross-enhanced fusion appears after this list). Moreover, to capture rich complementary multi-modal information and boost SOD performance, we use a multi-modal feature aggregation (MFA) module to integrate the modality-specific features from each individual decoder into the shared decoder. Skip connections between encoder and decoder layers allow hierarchical features to be fully combined. Extensive experiments demonstrate that SPNet outperforms cutting-edge approaches on six popular RGB-D SOD benchmarks and three camouflaged object detection benchmarks. The project is publicly available at https://github.com/taozh2017/SPNet.
  • Zhang, Qian; Laffont, Pierre-Yves; Sim, Terence (2017)
    Computational Visual Media
    We present a method for transferring lighting between photographs of a static scene. Our method takes as input a photo collection depicting a scene under varying viewpoints and lighting conditions. We cast lighting transfer as an edit propagation problem, in which the transfer of local illumination across images is guided by sparse correspondences obtained through multi-view stereo. Instead of directly propagating color, we learn local color transforms from corresponding patches in pairs of images and propagate these transforms in an edge-aware manner to regions with no correspondences (fitting one such local transform is sketched after this list). Our color transforms model the large variability of appearance changes in local regions of the scene and are robust to missing or inaccurate correspondences. The method is fully automatic and can transfer strong shadows between images. We show applications of our image relighting method for enhancing photographs, browsing photo collections with harmonized lighting, and generating synthetic time-lapse sequences.
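
The sketches below are toy illustrations written against the abstracts alone; all function and module names, shapes, and layer choices are assumptions rather than the authors' implementations.

For Lin et al.'s SIIS entry, one plausible form of the automatic click proposal is to place the first positive click at the point deepest inside the mask carried over from the previously annotated, related image. propose_initial_click is a hypothetical helper, not the paper's method:

```python
# Hypothetical first-click proposal for sequential interactive segmentation:
# click the point of the prior mask farthest from the object boundary.
import numpy as np
from scipy.ndimage import distance_transform_edt

def propose_initial_click(prior_mask: np.ndarray) -> tuple:
    """Return (row, col) of the foreground point farthest from the boundary.

    prior_mask: binary H x W array, e.g. the prediction mask from the
    previously annotated sample in the sequence.
    """
    if prior_mask.sum() == 0:
        # No prior object available: fall back to the image centre.
        h, w = prior_mask.shape
        return (h // 2, w // 2)
    # Distance from each foreground pixel to the nearest background pixel;
    # its maximum lies well inside the object, a safe first click.
    dist = distance_transform_edt(prior_mask)
    return tuple(int(i) for i in np.unravel_index(dist.argmax(), dist.shape))

# Toy usage: an 8x8 mask containing a 4x4 square object.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1
print(propose_initial_click(mask))  # a point near the square's centre
```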
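For Ji et al.'s FSNet entry, bidirectional message propagation between appearance and motion features can be mimicked with two cross-attention passes, one per direction. This is a generic stand-in for the paper's RCAM, not its actual design:

```python
# A generic bidirectional cross-attention block between an appearance and a
# motion feature map; the layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # One attention module per direction, so messages flow both ways.
        self.app_from_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_from_app = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, app: torch.Tensor, motion: torch.Tensor):
        # app, motion: (B, C, H, W) feature maps from the two branches.
        b, c, h, w = app.shape
        a = app.flatten(2).transpose(1, 2)     # (B, HW, C) token sequence
        m = motion.flatten(2).transpose(1, 2)
        # Each modality queries the other; the residual keeps its own signal.
        a2, _ = self.app_from_motion(a, m, m)
        m2, _ = self.motion_from_app(m, a, a)
        a, m = a + a2, m + m2
        to_map = lambda t: t.transpose(1, 2).reshape(b, c, h, w)
        return to_map(a), to_map(m)

# Toy usage on random features.
rcam = BidirectionalCrossAttention(dim=64)
fa, fm = rcam(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
print(fa.shape, fm.shape)  # both (2, 64, 16, 16)
```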
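For Wang et al.'s PVT v2 entry, the three named designs can each be shown in a few lines: an overlapping patch embedding (a strided convolution whose kernel exceeds its stride), an attention layer made linear in the number of query tokens by average-pooling keys and values to a fixed spatial size, and a feed-forward network with a depthwise 3x3 convolution. All hyperparameters here are illustrative:

```python
# Toy versions of the three PVT v2 ingredients named in the abstract.
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    def __init__(self, in_ch: int = 3, dim: int = 64):
        super().__init__()
        # Stride < kernel size, so neighbouring patches overlap.
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=7, stride=4, padding=3)

    def forward(self, x):
        return self.proj(x)  # (B, dim, H/4, W/4)

class LinearAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 2, pool: int = 7):
        super().__init__()
        # Pooling keys/values to pool x pool tokens makes the attention
        # cost linear in the number of query tokens.
        self.pool = nn.AdaptiveAvgPool2d(pool)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)              # (B, HW, C)
        kv = self.pool(x).flatten(2).transpose(1, 2)  # (B, pool*pool, C)
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)

class ConvFFN(nn.Module):
    def __init__(self, dim: int = 64, hidden: int = 256):
        super().__init__()
        self.fc1 = nn.Conv2d(dim, hidden, 1)
        # Depthwise conv injects local positional information.
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Conv2d(hidden, dim, 1)

    def forward(self, x):
        return self.fc2(self.act(self.dw(self.fc1(x))))

# Toy usage: one block over a dummy image.
x = torch.randn(1, 3, 224, 224)
feat = OverlapPatchEmbed()(x)
feat = feat + LinearAttention()(feat)
feat = feat + ConvFFN()(feat)
print(feat.shape)  # (1, 64, 56, 56)
```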
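For Zhou et al.'s SPNet entry, cross-enhanced fusion of RGB and depth features can be sketched as each modality being gated by a per-pixel attention map derived from the other before the two are fused. The exact layer layout is an assumption for illustration:

```python
# A minimal cross-enhanced fusion block in the spirit of SPNet's CIM.
import torch
import torch.nn as nn

class CrossEnhancedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # 1x1 convs produce per-pixel gates from the opposite modality.
        self.gate_rgb = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        self.gate_depth = nn.Sequential(nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * dim, dim, 3, padding=1)

    def forward(self, f_rgb, f_depth):
        # Cross enhancement: RGB gated by depth cues and vice versa,
        # with residuals so each modality keeps its own signal.
        r = f_rgb * self.gate_rgb(f_depth) + f_rgb
        d = f_depth * self.gate_depth(f_rgb) + f_depth
        # Shared representation to be propagated to the next layer.
        return self.fuse(torch.cat([r, d], dim=1))

# Toy usage on random RGB/depth features.
cim = CrossEnhancedFusion(dim=32)
out = cim(torch.randn(2, 32, 20, 20), torch.randn(2, 32, 20, 20))
print(out.shape)  # (2, 32, 20, 20)
```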
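For Zhang et al.'s relighting entry, the core local step, fitting an affine color transform from a pair of corresponding patches, reduces to a least-squares solve. The edge-aware propagation of these transforms, which is the paper's contribution, is not reproduced here:

```python
# Fitting one local affine colour transform dst ~ A @ src + b by least squares.
import numpy as np

def fit_affine_color_transform(src_patch: np.ndarray, dst_patch: np.ndarray):
    """src_patch, dst_patch: (N, 3) RGB values from matched patches.

    Returns the 3x4 matrix [A | b] minimising the squared error.
    """
    n = src_patch.shape[0]
    src_h = np.hstack([src_patch, np.ones((n, 1))])  # homogeneous coordinates
    # lstsq solves src_h @ M = dst_patch for the 4x3 matrix M.
    m, *_ = np.linalg.lstsq(src_h, dst_patch, rcond=None)
    return m.T  # transpose to the usual 3x4 [A | b] form

def apply_transform(m: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    n = pixels.shape[0]
    return np.hstack([pixels, np.ones((n, 1))]) @ m.T

# Toy usage: recover a known brightening transform from noisy samples.
rng = np.random.default_rng(0)
src = rng.random((200, 3))
dst = src * 1.2 + 0.05 + rng.normal(0, 0.01, src.shape)
m = fit_affine_color_transform(src, dst)
print(np.round(m, 2))               # A is close to 1.2 * I, b close to 0.05
print(apply_transform(m, src[:2]))  # transformed pixels match dst[:2]
```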