Quaternion Equivariant Capsule Networks for 3D Point Clouds

Zhao, Yongheng; Birdal, Tolga; Lenssen, Jan Eric; Menegatti, Emanuele; Guibas, Leonidas; Tombari, Federico

doi:10.1007/978-3-030-58452-8_1

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12346)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations and invariant to permutations of the input points. The operator receives a sparse set of local reference frames computed from an input point cloud and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iteratively re-weighted least squares (IRLS) problems with provable convergence properties. We show that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
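
To make the link between routing and IRLS rotation averaging concrete, the following is a minimal sketch and not the authors' implementation. It assumes capsule votes are given as unit quaternions, uses the eigenvector-based weighted quaternion mean as the inner least-squares step, and applies Weiszfeld-style weights (inverse distance to the current mean) as an illustrative stand-in for inlier scores. The function names, the chordal distance, and the parameters `num_iters` and `eps` are assumptions made for this example.

```python
import numpy as np


def weighted_quaternion_mean(quats, weights):
    """Weighted mean of unit quaternions (rows of `quats`).

    Returns the eigenvector of sum_i w_i q_i q_i^T with the largest
    eigenvalue, so q and -q (the same rotation) are treated identically.
    """
    outer = quats[:, :, None] * quats[:, None, :]        # (n, 4, 4)
    M = (weights[:, None, None] * outer).sum(axis=0)     # (4, 4), symmetric
    _, eigvecs = np.linalg.eigh(M)                       # ascending eigenvalues
    return eigvecs[:, -1]


def irls_quaternion_average(votes, num_iters=10, eps=1e-6):
    """Weiszfeld-style IRLS averaging of quaternion votes (illustrative).

    Each iteration solves a weighted least-squares mean, then re-weights
    every vote by the inverse of its distance to that mean, which
    down-weights outlying votes.
    """
    n = len(votes)
    weights = np.full(n, 1.0 / n)
    for _ in range(num_iters):
        mean = weighted_quaternion_mean(votes, weights)
        # Sign-invariant chordal distance: sqrt(1 - <q_i, mean>^2).
        dots = np.clip(np.abs(votes @ mean), 0.0, 1.0)
        dist = np.sqrt(1.0 - dots ** 2)
        weights = 1.0 / np.maximum(dist, eps)            # assumed weighting rule
        weights /= weights.sum()
    return weighted_quaternion_mean(votes, weights), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Eight noisy votes near the identity rotation plus three outliers.
    inliers = np.array([1.0, 0.0, 0.0, 0.0]) + 0.02 * rng.standard_normal((8, 4))
    inliers /= np.linalg.norm(inliers, axis=1, keepdims=True)
    outliers = np.eye(4)[1:]
    votes = np.vstack([inliers, outliers])
    mean, weights = irls_quaternion_average(votes)
    print("robust mean:", mean)
    print("final weights:", weights)
```

In this sketch the final weights play the role that the abstract assigns to inlier scores: votes that agree with the consensus rotation keep a large weight, while outlying votes contribute little to the average.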

Author information

Authors and Affiliations

University of Padova, Padova, Italy
Yongheng Zhao & Emanuele Menegatti
Stanford University, Stanford, USA
Tolga Birdal & Leonidas Guibas
TU Munich, Munich, Germany
Yongheng Zhao & Federico Tombari
TU Dortmund, Dortmund, Germany
Jan Eric Lenssen
Google, Mountain View, USA
Federico Tombari

Corresponding author

Correspondence to Tolga Birdal.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Zhao, Y., Birdal, T., Lenssen, J.E., Menegatti, E., Guibas, L., Tombari, F. (2020). Quaternion Equivariant Capsule Networks for 3D Point Clouds. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham. https://doi.org/10.1007/978-3-030-58452-8_1

DOI: https://doi.org/10.1007/978-3-030-58452-8_1
Published: 03 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58451-1
Online ISBN: 978-3-030-58452-8
eBook Packages: Computer Science, Computer Science (R0)
