ABSTRACT
Domain adaptation aims to train a model on labeled data from a source domain while minimizing test error on a target domain. Most existing domain adaptation methods focus only on reducing the domain shift of single-modal data. In this paper, we consider the new problem of multimodal domain adaptation and propose a unified framework to solve it. The proposed multimodal domain adaptation neural network (MDANN) consists of three modules. (1) A covariant multimodal attention module learns a common feature representation for multiple modalities. (2) A fusion module adaptively fuses the attended features of the different modalities. (3) Hybrid domain constraints learn domain-invariant features comprehensively by constraining the single-modal features, the fused features, and the attention scores. By jointly attending and fusing under an adversarial objective, the most discriminative and domain-adaptive parts of the features are adaptively fused together. Experimental results on two real-world cross-domain applications (emotion recognition and cross-media retrieval) demonstrate the effectiveness of the proposed method.
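The attend-then-fuse idea in modules (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the MDANN architecture: the tanh scoring function, the scoring vector `w_score`, the function name `attend_and_fuse`, and the assumption that every modality has already been projected to a common dimension are all hypothetical choices made for illustration. The adversarial domain constraints of module (3) are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attend_and_fuse(feats, w_score):
    """Fuse per-modality features with attention weights (illustrative only).

    feats:   list of (batch, dim) arrays, one per modality, assumed to be
             already projected to a common dimension.
    w_score: (dim,) hypothetical scoring vector; in a real model this
             would be a learned parameter.
    Returns the fused features (batch, dim) and the attention scores
    (batch, n_modalities).
    """
    stacked = np.stack(feats, axis=1)          # (batch, n_modalities, dim)
    scores = np.tanh(stacked) @ w_score        # (batch, n_modalities)
    attn = softmax(scores, axis=1)             # weights sum to 1 per sample
    fused = (attn[..., None] * stacked).sum(axis=1)
    return fused, attn


# Toy inputs standing in for two modalities (e.g. audio and visual).
rng = np.random.default_rng(0)
audio = rng.normal(size=(4, 8))
visual = rng.normal(size=(4, 8))
w = rng.normal(size=8)

fused, attn = attend_and_fuse([audio, visual], w)
```

In a domain-adversarial setting, one would additionally train domain classifiers on quantities such as `audio`, `visual`, `fused`, and `attn`, which is where the hybrid constraints described in the abstract would apply.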
Index Terms
- A Unified Framework for Multimodal Domain Adaptation