Abstract
We present a 3D capsule module for processing point clouds that is equivariant to 3D rotations and translations, as well as invariant to permutations of the input points. The operator receives a sparse set of local reference frames computed from an input point cloud, and establishes end-to-end transformation equivariance through a novel dynamic routing procedure on quaternions. Further, we theoretically connect dynamic routing between capsules to the well-known Weiszfeld algorithm, a scheme for solving iteratively re-weighted least squares (IRLS) problems with provable convergence properties. We show that such group dynamic routing can be interpreted as robust IRLS rotation averaging on capsule votes, where information is routed based on the final inlier scores. Based on our operator, we build a capsule network that disentangles geometry from pose, paving the way for more informative descriptors and a structured latent space. Our architecture allows joint object classification and orientation estimation without explicit supervision of rotations. We validate our algorithm empirically on common benchmark datasets.
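To make the IRLS view of rotation averaging concrete, the following is a minimal NumPy sketch and not the routing procedure of the paper. It assumes unit quaternions stored as (w, x, y, z), uses the eigenvector-based weighted quaternion mean, and re-weights each vote by the inverse of an illustrative sign-invariant distance 1 - |<q_i, mean>|. The function names, the distance, the iteration count, and the `eps` floor are all assumptions chosen for illustration.

```python
import numpy as np


def weighted_quaternion_mean(quats, weights):
    """Weighted mean of unit quaternions.

    Returns the dominant eigenvector of sum_i w_i * q_i q_i^T, which is
    insensitive to the sign ambiguity q ~ -q.
    """
    scatter = (quats * weights[:, None]).T @ quats  # 4x4 symmetric matrix
    _, eigvecs = np.linalg.eigh(scatter)            # eigenvalues in ascending order
    mean = eigvecs[:, -1]                           # eigenvector of the largest one
    return mean if mean[0] >= 0 else -mean          # fix the overall sign


def irls_quaternion_mean(quats, num_iters=10, eps=1e-6):
    """Robust quaternion averaging by iteratively re-weighted least squares.

    Each iteration solves a weighted averaging problem, then down-weights
    votes that lie far from the current estimate (Weiszfeld-style update).
    Returns the estimate and the normalized weights, which act as
    inlier scores.
    """
    quats = np.asarray(quats, dtype=float)
    quats = quats / np.linalg.norm(quats, axis=1, keepdims=True)
    weights = np.full(len(quats), 1.0 / len(quats))  # start from uniform weights
    for _ in range(num_iters):
        mean = weighted_quaternion_mean(quats, weights)
        dist = 1.0 - np.abs(quats @ mean)            # 0 when q_i equals +/- mean
        weights = 1.0 / np.maximum(dist, eps)        # inverse-distance re-weighting
        weights /= weights.sum()
    return mean, weights
```

Given an array `votes` of shape (N, 4), `mean, scores = irls_quaternion_mean(votes)` returns a robust average together with per-vote scores. Under the assumptions above, votes that agree with the consensus receive larger scores than outlying votes, which mirrors the idea of routing information based on inlier scores.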
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, Y., Birdal, T., Lenssen, J.E., Menegatti, E., Guibas, L., Tombari, F. (2020). Quaternion Equivariant Capsule Networks for 3D Point Clouds. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. Lecture Notes in Computer Science, vol 12346. Springer, Cham. https://doi.org/10.1007/978-3-030-58452-8_1
DOI: https://doi.org/10.1007/978-3-030-58452-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58451-1
Online ISBN: 978-3-030-58452-8
eBook Packages: Computer Science (R0)