research-article
DOI: 10.1145/3474085.3475601

TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network

Published: 17 October 2021

ABSTRACT

Salient object detection is a pixel-level dense prediction task that highlights the prominent objects in a scene. The U-Net framework is now widely used for it: successive convolution and pooling operations generate multi-level features that complement one another. Since high-level features contribute more to performance, we propose a triplet transformer embedding module that enhances them by learning long-range dependencies across layers. To our knowledge, this is the first work to use three transformer encoders with shared weights to enhance multi-level features. By further designing a scale adjustment module to process the input, devising a three-stream decoder to process the output, and attaching depth features to color features for multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection and pushes the performance to a new level. Experimental results demonstrate the effectiveness of the proposed modules and the competitiveness of TriTransNet.
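The shared-weight idea behind the triplet transformer embedding module can be sketched as follows. This is an illustrative, single-head NumPy approximation, not the paper's implementation: the level names (f3, f4, f5), token counts, channel dimension, and residual connection are all assumptions made for the example.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a (tokens, channels) matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # rows sum to 1
    return attn @ v

rng = np.random.default_rng(0)
d = 64
# One shared set of projection weights, reused for every feature level --
# this is the "three encoders with shared weights" idea in miniature.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Three high-level feature maps, flattened to (tokens, channels);
# different levels have different spatial resolutions, hence token counts.
levels = {"f3": rng.standard_normal((196, d)),
          "f4": rng.standard_normal((49, d)),
          "f5": rng.standard_normal((16, d))}

# Enhance each level with the same weight-shared attention, residual style,
# so long-range dependencies are modeled consistently across layers.
enhanced = {name: x + self_attention(x, Wq, Wk, Wv) for name, x in levels.items()}
for name, y in enhanced.items():
    print(name, y.shape)
```

Because the projections are shared, each level is refined by the same learned dependency model while keeping its own spatial resolution, which is what makes the three enhanced streams directly comparable in a decoder.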


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
