error
Kurzer Serviceunterbruch am Donnerstag, 15. Januar 2026, 12 bis 13 Uhr. Sie können in diesem Zeitraum keine neuen Dokumente hochladen oder bestehende Einträge bearbeiten. Das Login wird in diesem Zeitraum deaktiviert. Grund: Wartungsarbeiten // Short service interruption on Thursday, January 15, 2026, 12.00 – 13.00. During this time, you won’t be able to upload new documents or edit existing records. The login will be deactivated during this time. Reason: maintenance work
 

KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation


Loading...

Date

2021-08

Publication Type

Conference Paper

ETH Bibliography

yes

Citations

Altmetric

Data

Abstract

We present Knowledge Enhanced Multimodal BART (KM-BART), which is a Transformer-based sequence-to-sequence model capable of reasoning about commonsense knowledge from multimodal inputs of images and texts. We adapt the generative BART architecture (Lewis et al., 2020) to a multimodal model with visual and textual inputs. We further develop novel pretraining tasks to improve the model performance on the Visual Commonsense Generation (VCG) task. In particular, our pretraining task of Knowledge-based Commonsense Generation (KCG) boosts model performance on the VCG task by leveraging commonsense knowledge from a large language model pretrained on external commonsense knowledge graphs. To the best of our knowledge, we are the first to propose a dedicated task for improving model performance on the VCG task. Experimental results show that our model reaches state-of-the-art performance on the VCG task (Park et al., 2020) by applying these novel pretraining tasks.

Publication status

published

Book title

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing

Journal / series

Volume

1

Pages / Article No.

525 - 535

Publisher

Association for Computational Linguistics

Event

59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Organisational unit

03604 - Wattenhofer, Roger / Wattenhofer, Roger check_circle

Notes

Funding

Related publications and datasets