Quaternion Equivariant Capsule Networks for 3D Point Clouds

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Abstract

We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations and invariant to permutations of the input points. The operator receives a sparse set of local reference frames computed from an input point cloud and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iteratively re-weighted least squares (IRLS) problems with provable convergence properties. We show that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
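
To make the link between routing and robust rotation averaging more concrete, the following is a minimal sketch of a Weiszfeld-style IRLS average over unit quaternions. It is not the paper's routing procedure: the function names, the sign-invariant residual sqrt(1 - (q·mean)^2), and the 1/residual re-weighting are assumptions chosen for illustration, and the weighted mean uses the standard eigenvector method for averaging quaternions.

```python
import numpy as np


def weighted_quaternion_mean(quats, weights):
    """Weighted mean of unit quaternions (rows of `quats`, order w, x, y, z).

    Uses the principal eigenvector of sum_i w_i * q_i q_i^T, which treats
    q and -q as the same rotation.
    """
    scatter = (quats * weights[:, None]).T @ quats  # 4x4 symmetric matrix
    _, eigvecs = np.linalg.eigh(scatter)            # eigenvalues ascending
    mean = eigvecs[:, -1]                           # largest eigenvalue
    return mean if mean[0] >= 0 else -mean          # fix the sign ambiguity


def irls_quaternion_average(quats, iters=10, eps=1e-6):
    """Hypothetical helper: robust average of quaternion 'votes' via IRLS.

    Returns the estimated mean and the final normalized weights, which can
    be read as inlier scores (small weight = likely outlier).
    """
    quats = np.asarray(quats, dtype=float)
    quats = quats / np.linalg.norm(quats, axis=1, keepdims=True)
    weights = np.full(len(quats), 1.0 / len(quats))
    mean = weighted_quaternion_mean(quats, weights)
    for _ in range(iters):
        # Assumed residual: sine of the angle between q_i and the mean in R^4.
        residual = np.sqrt(np.maximum(1.0 - (quats @ mean) ** 2, 0.0))
        # Weiszfeld-style re-weighting, clipped to avoid division by zero.
        weights = 1.0 / np.maximum(residual, eps)
        weights /= weights.sum()
        mean = weighted_quaternion_mean(quats, weights)
    return mean, weights


if __name__ == "__main__":
    # Four votes near the identity rotation and one gross outlier.
    votes = [
        [1.0, 0.0, 0.0, 0.0],
        [0.999, 0.03, 0.0, 0.0],
        [0.999, 0.0, 0.03, 0.0],
        [0.999, 0.0, 0.0, -0.03],
        [0.0, 1.0, 0.0, 0.0],
    ]
    mean, scores = irls_quaternion_average(votes)
    print("mean quaternion:", mean)
    print("inlier scores:  ", scores)
```

In this toy setup the outlier vote receives the smallest weight after re-weighting, which mirrors the idea of routing information according to inlier scores.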

Corresponding author

Correspondence to Tolga Birdal.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, Y., Birdal, T., Lenssen, J.E., Menegatti, E., Guibas, L., Tombari, F. (2020). Quaternion Equivariant Capsule Networks for 3D Point Clouds. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham. https://doi.org/10.1007/978-3-030-58452-8_1

  • DOI: https://doi.org/10.1007/978-3-030-58452-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58451-1

  • Online ISBN: 978-3-030-58452-8

  • eBook Packages: Computer Science, Computer Science (R0)
