research-article
DOI: 10.1145/3474085.3475601

TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network

Published: 17 October 2021

ABSTRACT

Salient object detection is a pixel-level dense prediction task that highlights the prominent objects in a scene. The U-Net framework is now widely used for it: successive convolution and pooling operations generate multi-level features that complement one another. Since high-level features contribute more to performance, we propose a triplet transformer embedding module that enhances them by learning long-range dependencies across layers. To our knowledge, this is the first work to use three transformer encoders with shared weights to enhance multi-level features. By further designing a scale adjustment module to process the input, devising a three-stream decoder to process the output, and attaching depth features to color features for multi-modal fusion, the proposed triplet transformer embedding network (TriTransNet) achieves state-of-the-art performance in RGB-D salient object detection and pushes the performance to a new level. Experimental results demonstrate the effectiveness of the proposed modules and the competitiveness of TriTransNet.
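The shared-weight idea behind the triplet transformer embedding module can be sketched as follows. This is an illustrative, single-head NumPy approximation, not the paper's implementation: the level names (f3, f4, f5), token counts, channel dimension, and residual connection are all assumptions made for the example.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a (tokens, channels) matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # rows sum to 1
    return attn @ v

rng = np.random.default_rng(0)
d = 64
# One shared set of projection weights, reused for every feature level --
# this is the "three encoders with shared weights" idea in miniature.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# Three high-level feature maps, flattened to (tokens, channels);
# different levels have different spatial resolutions, hence token counts.
levels = {"f3": rng.standard_normal((196, d)),
          "f4": rng.standard_normal((49, d)),
          "f5": rng.standard_normal((16, d))}

# Enhance each level with the same weight-shared attention, residual style,
# so long-range dependencies are modeled consistently across layers.
enhanced = {name: x + self_attention(x, Wq, Wk, Wv) for name, x in levels.items()}
for name, y in enhanced.items():
    print(name, y.shape)
```

Because the projections are shared, each level is refined by the same learned dependency model while keeping its own spatial resolution, which is what makes the three enhanced streams directly comparable in a decoder.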


Published in

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085

Copyright © 2021 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
