Journal: International Journal of Computer Vision

Abbreviation

Int J Comput Vis

Publisher

Springer

ISSN

0920-5691
1573-1405

Search Results

Publications 1 - 10 of 96
  • Zia, M. Zeeshan; Stark, Michael; Schindler, Konrad (2015)
    International Journal of Computer Vision
  • Xu, Shuang; Zhao, Zixiang; Cao, Xiangyong; et al. (2025)
    International Journal of Computer Vision
    Factorization models and nuclear norms, two prominent methods for characterizing the low-rank prior, encounter challenges in accurately retrieving low-rank data under severe degradation and lack generalization capabilities. To mitigate these limitations, we propose a Parameterized Low-Rank Regularizer (PLRR), which models low-rank visual data through matrix factorization by utilizing neural networks to parameterize the factor matrices, whose feasible domains are essentially constrained. This approach can be interpreted as imposing an automatically learned penalty on the factor matrices. More significantly, the knowledge encoded in the network parameters enhances generalization. As a versatile low-rank modeling tool, PLRR exhibits superior performance in various inverse problems, including video foreground extraction, hyperspectral image (HSI) denoising, HSI inpainting, multi-temporal multispectral image (MSI) declouding, and MSI-guided blind HSI super-resolution. Furthermore, PLRR demonstrates robust generalization capabilities for images with diverse degradations, temporal variations, and scene contexts. (A minimal sketch of the parameterized factorization idea appears after this list.)
  • Sakaridis, Christos; Dai, Dengxin; Van Gool, Luc (2018)
    International Journal of Computer Vision
    This work addresses the problem of semantic foggy scene understanding (SFSU). Although extensive research has been performed on image dehazing and on semantic scene understanding with clear-weather images, little attention has been paid to SFSU. Due to the difficulty of collecting and annotating foggy images, we choose to generate synthetic fog on real images that depict clear-weather outdoor scenes, and then leverage these partially synthetic data for SFSU by employing state-of-the-art convolutional neural networks (CNN). In particular, a complete pipeline to add synthetic fog to real, clear-weather images using incomplete depth information is developed. We apply our fog synthesis on the Cityscapes dataset and generate Foggy Cityscapes with 20,550 images. SFSU is tackled in two ways: (1) with typical supervised learning, and (2) with a novel type of semi-supervised learning, which combines (1) with an unsupervised supervision transfer from clear-weather images to their synthetic foggy counterparts. In addition, we carefully study the usefulness of image dehazing for SFSU. For evaluation, we present Foggy Driving, a dataset with 101 real-world images depicting foggy driving scenes, which come with ground truth annotations for semantic segmentation and object detection. Extensive experiments show that (1) supervised learning with our synthetic data significantly improves the performance of state-of-the-art CNN for SFSU on Foggy Driving; (2) our semi-supervised learning strategy further improves performance; and (3) image dehazing marginally advances SFSU with our learning strategy. The datasets, models and code are made publicly available. (A sketch of the optical fog model underlying the synthesis appears after this list.)
  • Pritts, James; Kukelova, Zuzana; Larsson, Viktor; et al. (2020)
    International Journal of Computer Vision
  • Fast PRISM
    Item type: Journal Article
    Lehmann, Alain; Leibe, Bastian; Van Gool, Luc (2011)
    International Journal of Computer Vision
  • Wang, Limin; Wang, Zhe; Qiao, Yu; et al. (2018)
    International Journal of Computer Vision
    This paper addresses the problem of image-based event recognition by transferring deep representations learned from object and scene datasets. First we empirically investigate the correlation of the concepts of object, scene, and event, thus motivating our representation transfer methods. Based on this empirical study, we propose an iterative selection method to identify a subset of object and scene classes deemed most relevant for representation transfer. Afterwards, we develop three transfer techniques: (1) initialization-based transfer, (2) knowledge-based transfer, and (3) data-based transfer. These newly designed transfer techniques exploit multitask learning frameworks to incorporate extra knowledge from other networks or additional datasets into the fine-tuning procedure of event CNNs. These multitask learning frameworks prove effective in reducing over-fitting and improving the generalization ability of the learned CNNs. We perform experiments on four event recognition benchmarks: the ChaLearn LAP Cultural Event Recognition dataset, the Web Image Dataset for Event Recognition, the UIUC Sports Event dataset, and the Photo Event Collection dataset. The experimental results show that our proposed algorithm successfully transfers object and scene representations towards the event dataset and achieves the current state-of-the-art performance on all considered datasets. (A sketch of the transfer setup appears after this list.)
  • Naeem, Muhammad Ferjad; Xian, Yongqin; Van Gool, Luc; et al. (2024)
    International Journal of Computer Vision
    Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing methods still rely on human-annotated attributes, which are difficult to annotate and scale. An unsupervised alternative is to represent each class using the word embedding associated with its semantic class name. However, word embeddings extracted from pre-trained language models do not necessarily capture visual similarities, resulting in poor zero-shot performance. In this work, we argue that online textual documents, e.g., Wikipedia, contain rich visual descriptions about object classes, and can therefore be used as powerful unsupervised side information for ZSL. To this end, we propose I2DFormer+, a novel transformer-based ZSL framework that jointly learns to encode images and documents by aligning both modalities in a shared embedding space. I2DFormer+ utilizes our novel Document Summary Transformer (DSTransformer), a text transformer that learns to encode a sequence of text into a fixed set of summary tokens. These summary tokens are utilized by a cross-modal attention module that learns fine-grained interactions between image patches and the summary of the document. Consequently, our I2DFormer+ not only learns highly discriminative document embeddings that capture visual similarities but also gains the ability to explain which regions of the image are important for the decision. Quantitatively, we demonstrate that I2DFormer+ significantly outperforms previous unsupervised semantic embeddings under both zero-shot and generalized zero-shot learning settings on three public datasets. Qualitatively, we show that our methods lead to highly interpretable results. Furthermore, we scale our model to the large-scale zero-shot learning setting and show state-of-the-art performance on two challenging ImageNet benchmarks. (A sketch of the cross-modal attention step appears after this list.)
  • Baatz, Georges; Köser, Kevin; Chen, David; et al. (2012)
    International Journal of Computer Vision
  • Radial multi-focal tensors
    Item type: Journal Article
    Thirthala, SriRam; Pollefeys, Marc (2012)
    International Journal of Computer Vision
  • Scaramuzza, Davide (2011)
    International Journal of Computer Vision
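
Method Sketches

A minimal sketch of the parameterized factorization idea behind PLRR (Xu et al., 2025), assuming a PyTorch setup: small networks output the two factor matrices of an implicit low-rank estimate, and their parameters are optimized to fit only the observed entries. `FactorNet`, the latent codes, and the training loop are illustrative assumptions, not the authors' implementation.

```python
# Low-rank recovery with neural-network-parameterized factor matrices,
# in the spirit of PLRR. All names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class FactorNet(nn.Module):
    """Maps a fixed latent code to one factor matrix of the decomposition."""
    def __init__(self, latent_dim: int, rows: int, rank: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, rows * rank),
        )
        self.rows, self.rank = rows, rank

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z).view(self.rows, self.rank)

def recover_low_rank(y: torch.Tensor, mask: torch.Tensor, rank: int = 5,
                     steps: int = 2000, lr: float = 1e-3) -> torch.Tensor:
    """Recover X ~ U @ V.T from the observed entries y[mask]."""
    m, n = y.shape
    z_u, z_v = torch.randn(64), torch.randn(64)      # fixed latent codes
    f_u, f_v = FactorNet(64, m, rank), FactorNet(64, n, rank)
    opt = torch.optim.Adam(list(f_u.parameters()) + list(f_v.parameters()), lr=lr)
    for _ in range(steps):
        x = f_u(z_u) @ f_v(z_v).T                    # implicit low-rank estimate
        loss = ((x - y)[mask] ** 2).mean()           # fit observed pixels only
        opt.zero_grad(); loss.backward(); opt.step()
    return (f_u(z_u) @ f_v(z_v).T).detach()
```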
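
The fog synthesis in Sakaridis et al. (2018) builds on the standard optical model I(x) = R(x) t(x) + L (1 - t(x)) with transmittance t(x) = exp(-beta * d(x)). Below is a minimal sketch that assumes an already-dense depth map; the paper's full pipeline additionally completes and refines incomplete depth.

```python
# Standard optical fog model: attenuate the clear image by the
# transmittance and blend in the atmospheric light (airlight).
import numpy as np

def add_synthetic_fog(clear: np.ndarray, depth: np.ndarray,
                      beta: float = 0.01, airlight: float = 0.8) -> np.ndarray:
    """clear: HxWx3 image in [0, 1]; depth: HxW scene distance in meters."""
    t = np.exp(-beta * depth)[..., None]         # per-pixel transmittance
    foggy = clear * t + airlight * (1.0 - t)     # attenuation + airlight
    return np.clip(foggy, 0.0, 1.0)
```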
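
A rough sketch of the transfer setup in the spirit of Wang et al. (2018): the event CNN is initialized from object-pretrained weights (initialization-based transfer), and an auxiliary head is trained to match a frozen object network's soft predictions as a stand-in for their knowledge-based transfer. The weighting `alpha` and the distillation-style auxiliary loss are assumptions for illustration, not the published training recipe.

```python
import torch
import torch.nn.functional as F
from torchvision import models

num_events = 100                                    # e.g. ChaLearn LAP classes

# Initialization-based transfer: start from object-pretrained weights.
backbone = models.resnet50(weights="IMAGENET1K_V2")
feat_dim = backbone.fc.in_features
backbone.fc = torch.nn.Identity()                   # expose pooled features
event_head = torch.nn.Linear(feat_dim, num_events)  # main event task
object_head = torch.nn.Linear(feat_dim, 1000)       # auxiliary object task

# A frozen teacher supplies soft object targets (knowledge transfer).
teacher = models.resnet50(weights="IMAGENET1K_V2").eval()
for p in teacher.parameters():
    p.requires_grad_(False)

def multitask_loss(images, event_labels, alpha=0.5):
    feats = backbone(images)
    ce = F.cross_entropy(event_head(feats), event_labels)
    with torch.no_grad():
        soft = teacher(images).softmax(dim=1)       # teacher's object posteriors
    kd = F.kl_div(object_head(feats).log_softmax(dim=1), soft,
                  reduction="batchmean")
    return ce + alpha * kd                          # joint multitask objective
```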
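
A minimal sketch of the cross-modal attention step described for I2DFormer+ (Naeem et al., 2024): image patch embeddings attend to a fixed set of document summary tokens, and the attention weights are what make per-region decisions inspectable. The dimensions, pooling, and scoring head are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class CrossModalScorer(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, patches: torch.Tensor, summary: torch.Tensor) -> torch.Tensor:
        # patches: (B, P, dim) image patch embeddings
        # summary: (B, S, dim) document summary tokens from the text encoder
        attended, weights = self.attn(query=patches, key=summary, value=summary)
        # `weights` (B, P, S) indicate which patches ground which summary
        # tokens; this is the source of the model's interpretability.
        return self.score(attended.mean(dim=1)).squeeze(-1)   # (B,) compatibility

# Usage: score a batch of 2 images (196 patches) against one class document.
scorer = CrossModalScorer()
logit = scorer(torch.randn(2, 196, 512), torch.randn(2, 16, 512))
```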