DOI: 10.1145/3474085.3475692
research-article

Exploiting BERT for Multimodal Target Sentiment Classification through Input Space Translation

Published: 17 October 2021

ABSTRACT

Multimodal target/aspect sentiment classification combines multimodal sentiment analysis and aspect/target sentiment classification. The goal of the task is to combine vision and language to understand the sentiment towards a target entity in a sentence. Twitter is an ideal setting for the task because it is inherently multimodal, highly emotional, and affects real-world events. However, multimodal tweets are short and accompanied by complex, possibly irrelevant images. We introduce a two-stream model that translates images in input space using an object-aware transformer followed by a single-pass non-autoregressive text generation approach. We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model. Our approach increases the amount of text available to the language model and distills the object-level information in complex images. We achieve state-of-the-art performance on two multimodal Twitter datasets without modifying the internals of the language model to accept multimodal data, demonstrating the effectiveness of our translation. In addition, we explain a failure mode of a popular approach for aspect sentiment analysis when applied to tweets. Our code is available at https://github.com/codezakh/exploiting-BERT-thru-translation.
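To make the pipeline described in the abstract concrete, below is a minimal sketch (in Python, using the Hugging Face transformers library) of how text translated from an image could be packed into an auxiliary sentence and passed to an unmodified BERT classifier as an ordinary sentence pair. This is not the authors' released implementation (see the linked repository for that); the helper name build_auxiliary_sentence, the example caption, and the object tags are hypothetical placeholders standing in for the output of the object-aware translation stage.

```python
# A minimal sketch of the auxiliary-sentence idea described above, NOT the
# authors' released implementation. The helper build_auxiliary_sentence, the
# example caption, and the object tags are illustrative assumptions standing
# in for the output of the paper's image-to-text translation stage.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # negative / neutral / positive
)

def build_auxiliary_sentence(target: str, caption: str, tags: list[str]) -> str:
    """Fold the target entity and the image translation into a second sentence."""
    return f"{target} - {caption} " + " ".join(tags)

tweet = "Great showing from the home team tonight!"
aux = build_auxiliary_sentence(
    target="home team",
    caption="a group of players celebrating on a field",  # hypothetical caption
    tags=["person", "sports ball"],                        # hypothetical detector tags
)

# BERT consumes the tweet and the auxiliary sentence as an ordinary sentence
# pair, so the language model itself needs no multimodal modifications.
inputs = tokenizer(tweet, aux, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted sentiment class for the target entity
```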


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%

