dc.contributor.author
Rimle, Philipp
dc.contributor.author
Dogan-Schönberger, Pelin
dc.contributor.author
Gross, Markus
dc.date.accessioned
2021-08-30T11:56:29Z
dc.date.available
2021-08-26T10:38:16Z
dc.date.available
2021-08-30T11:56:29Z
dc.date.issued
2021
dc.identifier.isbn
978-1-7281-8808-9
en_US
dc.identifier.isbn
978-1-7281-8809-6
en_US
dc.identifier.other
10.1109/ICPR48806.2021.9412008
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/502284
dc.description.abstract
Understanding video content and generating captions with context is an important and challenging task. Unlike prior methods that typically attempt to generate generic video captions without context, our architecture contextualizes captioning by infusing extracted information from relevant text data. We propose an end-to-end sequence-to-sequence model which generates video captions based on visual input, and mines relevant knowledge such as names and locations from contextual text. In contrast to previous approaches, we do not preprocess the text further, and let the model directly learn to attend over it. Guided by the visual input, the model is able to copy words from the contextual text via a pointer-generator network, allowing it to produce more specific video captions. We show competitive performance on the News Video Dataset and, through ablation studies, validate the efficacy of contextual video captioning as well as individual design choices in our model architecture.
en_US
dc.language.iso
en
en_US
dc.publisher
IEEE
en_US
dc.title
Enriching Video Captions With Contextual Text
en_US
dc.type
Conference Paper
dc.date.published
2021-05-05
ethz.book.title
2020 25th International Conference on Pattern Recognition (ICPR)
en_US
ethz.pages.start
5474
en_US
ethz.pages.end
5481
en_US
ethz.event
25th International Conference on Pattern Recognition (ICPR 2020) (virtual)
en_US
ethz.event.location
Milan, Italy
en_US
ethz.event.date
January 10-15, 2021
en_US
ethz.notes
Due to the Coronavirus (COVID-19) the conference was conducted virtually.
en_US
ethz.identifier.wos
ethz.publication.place
Piscataway, NJ
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02154 - Media Technology Center (MTC) / Media Technology Center (MTC)
ethz.date.deposited
2021-08-26T10:38:34Z
ethz.source
WOS
ethz.eth
yes
en_US
ethz.availability
Metadata only
en_US
ethz.rosetta.installDate
2021-08-30T11:56:37Z
ethz.rosetta.lastUpdated
2021-08-30T11:56:37Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Enriching%20Video%20Captions%20With%20Contextual%20Text&rft.date=2021&rft.spage=5474&rft.epage=5481&rft.au=Rimle,%20Philipp&Dogan-Sch%C3%B6nberger,%20Pelin&Gross,%20Markus&rft.isbn=978-1-7281-8808-9&978-1-7281-8809-6&rft.genre=proceeding&rft_id=info:doi/10.1109/ICPR48806.2021.9412008&rft.btitle=2020%2025th%20International%20Conference%20on%20Pattern%20Recognition%20(ICPR)

Files in this item

There are no files associated with this item.
