ABSTRACT
Salient object detection is a pixel-level dense prediction task that highlights the prominent objects in a scene. Recently, the U-Net framework has been widely adopted: successive convolution and pooling operations generate multi-level features that complement one another. Since high-level features contribute more to final performance, we propose a triplet transformer embedding module that enhances them by learning long-range dependencies across layers. To our knowledge, this is the first work to use three transformer encoders with shared weights to enhance multi-level features. By further designing a scale adjustment module to process the input, devising a three-stream decoder to process the output, and attaching depth features to color features for multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection and pushes the results to a new level. Experimental results demonstrate the effectiveness of the proposed modules and the competitiveness of TriTransNet.
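The core idea of the triplet transformer embedding module can be sketched conceptually: one transformer encoder layer, whose weights are shared, is applied to each of three feature levels so that all levels are enhanced by the same learned long-range attention. The minimal NumPy sketch below is a hypothetical illustration of that weight-sharing pattern (single-head attention, toy dimensions, no layer normalization), not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SharedTransformerEncoder:
    """Toy single-head transformer encoder layer (illustrative only).

    The same weight matrices are reused for every feature level,
    mirroring the shared-weight triplet design described in the abstract.
    """
    def __init__(self, dim, rng):
        s = 1.0 / np.sqrt(dim)
        self.wq = rng.normal(0, s, (dim, dim))  # query projection
        self.wk = rng.normal(0, s, (dim, dim))  # key projection
        self.wv = rng.normal(0, s, (dim, dim))  # value projection
        self.w1 = rng.normal(0, s, (dim, dim))  # feed-forward, layer 1
        self.w2 = rng.normal(0, s, (dim, dim))  # feed-forward, layer 2

    def __call__(self, tokens):
        # tokens: (n, dim). Self-attention relates every token to every
        # other token, i.e. it models long-range dependencies.
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        attn = softmax(q @ k.T / np.sqrt(tokens.shape[1]))
        x = tokens + attn @ v                            # residual add
        return x + np.maximum(x @ self.w1, 0.0) @ self.w2  # ReLU MLP

rng = np.random.default_rng(0)
encoder = SharedTransformerEncoder(dim=32, rng=rng)

# Three feature levels (e.g. flattened CNN feature maps of decreasing
# spatial size); the *same* encoder instance enhances each of them.
levels = [rng.normal(size=(n, 32)) for n in (64, 16, 4)]
enhanced = [encoder(f) for f in levels]
```

Because a single `encoder` object is applied to all three levels, the three "encoders" share every parameter, which is the property the abstract emphasizes; in the real network each level would of course be a high-level backbone feature map rather than random tokens.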
Index Terms
- TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network