ABSTRACT
Figures are human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning: automatically generating a natural language description of a given figure. We create a new dataset for figure captioning, FigCAP. To generate the labels that appear in figures accurately, we propose the Label Maps Attention Model. Extensive experiments show that our method outperforms the baselines. A successful solution to this task makes figure content accessible to visually impaired readers by providing input to text-to-speech systems, and enables automatic parsing of the vast repositories of documents in which figures are pervasive.
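The Label Maps Attention Model builds on the standard visual-attention decoder for captioning: at each decoding step, the model scores every spatial feature location against the decoder state and forms a weighted context vector. The abstract does not give the model's internals, so the sketch below shows only the generic additive (Bahdanau-style) attention step that such decoders share; all names and dimensions (`W_f`, `W_h`, `v`, the toy shapes) are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(features, hidden, W_f, W_h, v):
    """One additive-attention step over figure feature maps (illustrative).

    features: (L, D) -- L spatial (or label-map) feature vectors
    hidden:   (H,)   -- current decoder hidden state
    Returns the attention weights over locations and the context vector.
    """
    # score each feature location against the decoder state
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # shape (L,)
    alpha = softmax(scores)                               # attention weights
    context = alpha @ features                            # (D,) weighted sum
    return alpha, context

# toy example: 4 feature locations, feature dim 3, hidden dim 2
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
hidden = rng.normal(size=2)
W_f = rng.normal(size=(3, 5))   # hypothetical projection of features
W_h = rng.normal(size=(2, 5))   # hypothetical projection of hidden state
v = rng.normal(size=5)          # hypothetical scoring vector

alpha, context = attention_context(features, hidden, W_f, W_h, v)
```

The context vector would then be fed, together with the previous word embedding, into an LSTM decoder that emits the next caption token; attending over label maps (rather than raw CNN features) is what lets the model copy figure labels into the caption.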
Index Terms
- Neural caption generation over figures