Search - Research Collection

Results

KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation

Xing, Yiran; Shi, Zai; Meng, Zhao; et al. (2021)

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing

We present Knowledge Enhanced Multimodal BART (KM-BART), which is a Transformer-based sequence-to-sequence model capable of reasoning about commonsense knowledge from multimodal inputs of images and texts. We adapt the generative BART architecture (Lewis et al., 2020) to a multimodal model with visual and textual inputs. We further develop novel pretraining tasks to improve the model performance on the Visual Commonsense Generation (VCG) ...

Conference Paper