ABSTRACT
In this paper, we propose the Narrative dataset, a work in progress towards generating video and text narratives of complex daily events from long videos captured by multiple cameras. Because most existing datasets are collected from publicly available videos such as YouTube videos, no dataset targets the task of narrative summarization of complex videos that contain multiple narratives. We therefore create story plots and shoot videos with hired actors, producing complex video sets in which 3 to 4 narratives unfold in each video. In a story plot, a narrative is composed of multiple events, each corresponding to a video clip of a key human activity. On top of the shot video sets and the story plots, the Narrative dataset contains dense annotations of actors, objects, and their relationships for each frame as the facts of the narratives. The dataset thus captures a rich, holistic, and hierarchical structure of facts, events, and narratives. Moreover, we introduce the Narrative Graph, a collection of scene graphs of narrative events together with their causal relationships, to bridge the gap between the collection of facts and the generation of summary sentences for a narrative. Beyond related subtasks such as scene graph generation, the Narrative dataset potentially offers challenging subtasks that bridge human event clips to narratives.
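The facts → events → narratives hierarchy and the Narrative Graph described above can be sketched as a small data model. This is a minimal illustration only: all class names, fields, and the `narrative_graph` helper are assumptions for exposition, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fact:
    """A per-frame annotation: an actor/object relationship (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    frame: int

@dataclass
class Event:
    """A key human activity: a video clip with its scene graph of facts."""
    clip_id: str
    facts: list = field(default_factory=list)

@dataclass
class Narrative:
    """An ordered sequence of events plus causal links between them."""
    events: list = field(default_factory=list)
    # causal edges as (cause_index, effect_index) pairs over `events`
    causal_edges: list = field(default_factory=list)

def narrative_graph(narrative):
    """Collect per-event scene graphs with their causal relationships,
    mirroring the Narrative Graph idea from the abstract (illustrative only)."""
    return {
        "scene_graphs": {e.clip_id: e.facts for e in narrative.events},
        "causal_edges": [
            (narrative.events[i].clip_id, narrative.events[j].clip_id)
            for i, j in narrative.causal_edges
        ],
    }

# Tiny usage example: two events linked by one causal edge
cook = Event("clip_01", [Fact("person_A", "holds", "pan", frame=12)])
eat = Event("clip_02", [Fact("person_A", "eats", "meal", frame=340)])
story = Narrative(events=[cook, eat], causal_edges=[(0, 1)])
graph = narrative_graph(story)
```

The point of the sketch is the hierarchy itself: frame-level facts aggregate into event-level scene graphs, and causal edges over events are what distinguish a Narrative Graph from an unordered collection of scene graphs.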
Index Terms
- Narrative Dataset: Towards Goal-Driven Narrative Generation