Abstract
In 6D pose estimation from RGB-D images, the crucial problem is how to make the most of the two types of features extracted from the RGB and depth inputs. To the best of our knowledge, prior approaches treat the two sources equally, overlooking the fact that different combinations of the two modalities can have varying degrees of impact. We therefore propose a Feature Selecting Mechanism (FSM) that finds the most suitable ratio of feature dimensions drawn from the RGB image and the point cloud (converted from the depth image), so as to predict the 6D pose more effectively. We first perform manual selection within the FSM to demonstrate the potential of weighting the RGB-D features. A neural network is then deployed in the FSM to adaptively pick out features from the RGB-D input. Experiments on the LINEMOD dataset, the YCB-Video dataset, and our multi-pose synthetic image dataset show an accuracy improvement of up to 2% over the state-of-the-art method.
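The adaptive weighting the abstract describes can be sketched as a small learned gate that scores the two modalities and rescales each before fusion. The function and weight names below are illustrative assumptions for a minimal sketch, not the paper's actual FSM implementation:

```python
import numpy as np

def feature_selecting_gate(rgb_feat, geo_feat, w_gate, b_gate):
    """Hypothetical FSM-style gate: compute per-source weights from the
    concatenated features via a softmax, then rescale each modality
    before fusing them for the pose head."""
    fused = np.concatenate([rgb_feat, geo_feat], axis=-1)
    logits = fused @ w_gate + b_gate                     # shape (..., 2)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    alpha = exp / exp.sum(axis=-1, keepdims=True)        # softmax weights
    # weight each modality by its gate value, then concatenate
    return np.concatenate([alpha[..., :1] * rgb_feat,
                           alpha[..., 1:] * geo_feat], axis=-1)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 32))    # per-pixel RGB embeddings
geo = rng.standard_normal((4, 32))    # per-point geometry embeddings
w = rng.standard_normal((64, 2)) * 0.1
out = feature_selecting_gate(rgb, geo, w, np.zeros(2))
print(out.shape)  # (4, 64)
```

In a trained network, `w_gate` and `b_gate` would be learned end-to-end, letting the gate shift the effective feature ratio between the RGB and point-cloud branches per sample.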
This work is supported by NSFC 12071460 and Shenzhen research grants (KQJSCX20180330170311901, JCYJ20180305180840138, and GGFW2017073114031767).
© 2021 Springer Nature Switzerland AG
Cite this paper
Zhang, G., Ning, L., Feng, L. (2021). 6D Pose Estimation Based on the Adaptive Weight of RGB-D Feature. In: Zhang, Y., Xu, Y., Tian, H. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2020. Lecture Notes in Computer Science(), vol 12606. Springer, Cham. https://doi.org/10.1007/978-3-030-69244-5_12
Print ISBN: 978-3-030-69243-8
Online ISBN: 978-3-030-69244-5
eBook Packages: Computer Science (R0)