Abstract
In 6D pose estimation from RGB-D images, the crucial problem is how to make the most of the two types of features extracted from the RGB and depth inputs. To the best of our knowledge, prior approaches treat the two sources equally, overlooking the fact that different combinations of the two modalities can have varying degrees of impact. We therefore propose a Feature Selecting Mechanism (FSM) that finds the most suitable ratio of feature dimensions drawn from the RGB image and the point cloud (converted from the depth image), so as to predict the 6D pose more effectively. We first perform manual selection within the FSM to demonstrate the potential of weighting the RGB-D features. A neural network is then deployed in the FSM to adaptively pick out features from the RGB-D input. Experiments on the LINEMOD dataset, the YCB-Video dataset, and our multi-pose synthetic image dataset show an accuracy improvement of up to 2% over the state-of-the-art method.
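The adaptive weighting the abstract describes can be sketched as a small learned gate that scores the two modalities and rescales each before fusion. The function and weight names below are illustrative assumptions for a minimal sketch, not the paper's actual FSM implementation:

```python
import numpy as np

def feature_selecting_gate(rgb_feat, geo_feat, w_gate, b_gate):
    """Hypothetical FSM-style gate: compute per-source weights from the
    concatenated features via a softmax, then rescale each modality
    before fusing them for the pose head."""
    fused = np.concatenate([rgb_feat, geo_feat], axis=-1)
    logits = fused @ w_gate + b_gate                     # shape (..., 2)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    alpha = exp / exp.sum(axis=-1, keepdims=True)        # softmax weights
    # weight each modality by its gate value, then concatenate
    return np.concatenate([alpha[..., :1] * rgb_feat,
                           alpha[..., 1:] * geo_feat], axis=-1)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 32))    # per-pixel RGB embeddings
geo = rng.standard_normal((4, 32))    # per-point geometry embeddings
w = rng.standard_normal((64, 2)) * 0.1
out = feature_selecting_gate(rgb, geo, w, np.zeros(2))
print(out.shape)  # (4, 64)
```

In a trained network, `w_gate` and `b_gate` would be learned end-to-end, letting the gate shift the effective feature ratio between the RGB and point-cloud branches per sample.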
This work is supported by NSFC 12071460 and Shenzhen research grants (KQJSCX20180330170311901, JCYJ20180305180840138, and GGFW2017073114031767).
© 2021 Springer Nature Switzerland AG
Cite this paper
Zhang, G., Ning, L., Feng, L. (2021). 6D Pose Estimation Based on the Adaptive Weight of RGB-D Feature. In: Zhang, Y., Xu, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2020. Lecture Notes in Computer Science(), vol 12606. Springer, Cham. https://doi.org/10.1007/978-3-030-69244-5_12
Print ISBN: 978-3-030-69243-8
Online ISBN: 978-3-030-69244-5
eBook Packages: Computer Science (R0)