Research article · DOI: 10.1145/3240508.3240633

A Unified Framework for Multimodal Domain Adaptation

Published: 15 October 2018

ABSTRACT

Domain adaptation aims to train a model on labeled data from a source domain while minimizing test error on a target domain. Most existing domain adaptation methods focus only on reducing the domain shift of single-modal data. In this paper, we consider the new problem of multimodal domain adaptation and propose a unified framework to solve it. The proposed multimodal domain adaptation neural network (MDANN) consists of three modules: (1) a covariant multimodal attention module that learns a common feature representation for multiple modalities; (2) a fusion module that adaptively fuses the attended features of the different modalities; and (3) hybrid domain constraints that comprehensively learn domain-invariant features by constraining the single-modal features, the fused features, and the attention scores. By jointly attending and fusing under an adversarial objective, the network adaptively combines the most discriminative and domain-adaptive parts of the features. Extensive experiments on two real-world cross-domain applications (emotion recognition and cross-media retrieval) demonstrate the effectiveness of the proposed method.
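To make the architecture sketched in the abstract concrete, below is a minimal PyTorch illustration of a domain-adversarial multimodal network: two modality encoders, softmax attention over the modalities, attention-weighted fusion, and gradient-reversal domain discriminators on both the single-modal and the fused features. Every name, layer size, and the specific use of gradient reversal (in the style of Ganin and Lempitsky's DANN) is an illustrative assumption here, not the authors' actual MDANN implementation.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the
    backward pass, so the encoders learn to confuse the domain discriminators."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class MultimodalDANN(nn.Module):
    # dim_a/dim_v are hypothetical input feature sizes for the two modalities.
    def __init__(self, dim_a=128, dim_v=128, dim_h=64, n_classes=4):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, dim_h), nn.ReLU())  # e.g. audio
        self.enc_v = nn.Sequential(nn.Linear(dim_v, dim_h), nn.ReLU())  # e.g. visual
        self.attn = nn.Linear(2 * dim_h, 2)            # attention score per modality
        self.classifier = nn.Linear(dim_h, n_classes)  # task head (source labels)
        # Domain discriminators on single-modal and fused features, echoing the
        # "hybrid domain constraints" described in the abstract.
        self.dom_a = nn.Linear(dim_h, 2)
        self.dom_v = nn.Linear(dim_h, 2)
        self.dom_f = nn.Linear(dim_h, 2)

    def forward(self, x_a, x_v, lambd=1.0):
        h_a, h_v = self.enc_a(x_a), self.enc_v(x_v)
        w = torch.softmax(self.attn(torch.cat([h_a, h_v], dim=1)), dim=1)
        fused = w[:, :1] * h_a + w[:, 1:] * h_v        # attention-weighted fusion
        logits = self.classifier(fused)                # class prediction
        rev = lambda h: GradReverse.apply(h, lambd)    # adversarial branch
        doms = self.dom_a(rev(h_a)), self.dom_v(rev(h_v)), self.dom_f(rev(fused))
        return logits, doms
```

In a training loop, one would apply cross-entropy on `logits` for labeled source samples and cross-entropy on the three domain outputs (source vs. target) for samples from both domains; the gradient reversal turns the domain losses into an adversarial objective that pushes the single-modal and fused features toward domain invariance.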

Published in

MM '18: Proceedings of the 26th ACM International Conference on Multimedia
October 2018, 2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions, 28%
Overall acceptance rate: 995 of 4,171 submissions, 24%
