Abstract
We present a new local descriptor for 3D shapes that is directly applicable to a wide range of shape analysis problems, such as point correspondence, semantic segmentation, affordance prediction, and shape-to-scan matching. The descriptor is produced by a convolutional network trained to embed geometrically and semantically similar points close to one another in descriptor space. The network processes surface neighborhoods around points on a shape, captured at multiple scales by a succession of progressively zoomed-out views taken from carefully selected camera positions. We leverage two extremely large sources of data to train the network. First, because the network processes rendered views in the form of 2D images, we repurpose architectures pretrained on massive image datasets. Second, we automatically generate a synthetic dense point correspondence dataset by nonrigid alignment of corresponding shape parts in a large collection of segmented 3D models. As a result of these design choices, the network effectively encodes multiscale local context and fine-grained surface detail. It can be trained to produce either category-specific descriptors or, by learning from multiple shape categories, more generic descriptors. At test time, the trained network extracts local descriptors for shapes without requiring any part segmentation as input. Our method can produce effective local descriptors even for shapes whose category is unknown or differs from those used during training. We demonstrate through several experiments that our learned local descriptors are more discriminative than state-of-the-art alternatives and are effective in a variety of shape analysis applications.
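The abstract states only that the network is trained to embed similar points close together in descriptor space; it does not name the training objective. As an illustration, one common objective with that effect is a pairwise contrastive loss. The sketch below is a minimal NumPy version under that assumption: the function name `contrastive_loss`, the `margin` value, and the array layout are hypothetical choices for this example, not details taken from the abstract.

```python
import numpy as np


def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Pairwise contrastive loss over N descriptor pairs (illustrative only).

    desc_a, desc_b: (N, D) arrays of point descriptors.
    is_match: length-N booleans, True where the two points correspond.
    margin: assumed hyperparameter; non-matching pairs farther apart
            than this contribute no loss.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    is_match = np.asarray(is_match, dtype=bool)

    # Euclidean distance between the two descriptors of each pair.
    dist = np.linalg.norm(desc_a - desc_b, axis=1)

    # Matching pairs are pulled together; non-matching pairs are pushed
    # apart until they are at least `margin` away from each other.
    pull = dist ** 2
    push = np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(np.where(is_match, pull, push)))
```

With this form, a matching pair is penalized by its squared distance, while a non-matching pair is penalized only when its descriptors lie closer than `margin`, so minimizing the loss clusters corresponding points and separates the rest.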
Index Terms
- Learning Local Shape Descriptors from Part Correspondences with Multiview Convolutional Networks