
Deep Learning Advances in Computer Vision with 3D Data: A Survey

Published: 06 April 2017


Deep learning has recently gained popularity, achieving state-of-the-art performance in tasks involving text, sound, or image processing. Because of this performance, there have been efforts to apply it in more challenging scenarios, such as 3D data processing. This article surveys methods that apply deep learning to 3D data and classifies them according to how they exploit the data. From the results of the examined works, we conclude that systems employing 2D views of 3D data typically surpass voxel-based (3D) deep models, which, however, can perform better with more layers and heavy data augmentation. Larger-scale datasets and increased resolutions are therefore required.

  147. M. Muja and D. G. Lowe. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Application (VISSAPP’09). 331--340.Google ScholarGoogle Scholar
  148. V. Nair and G. E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 807--814.Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. A. Nguyen and B. Le. 2013. 3D point cloud segmentation: A survey. In Proceedings of the 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM). 225--230. Google ScholarGoogle ScholarCross RefCross Ref
  150. M. Niepert, M. Ahmed, and K. Kutzkov. 2016. Learning convolutional neural networks for graphs. CoRR abs/1605.05273 (2016).Google ScholarGoogle Scholar
  151. W. Ouyang, P. Luo, X. Zeng, S. Qiu, Y. Tian, H. Li, S. Yang, and others. 2014. DeepID-Net: Multi-stage and deformable deep convolutional neural networks for object detection. CoRR abs/1409.3505 (2014).Google ScholarGoogle Scholar
  152. J. Papon, A. Abramov, M. Schoeler, and F. Worgotter. 2013. Voxel cloud connectivity segmentation—Supervoxels for point clouds. In IEEE Conference on Computer Vision and Pattern Recognition. 2027--2034. Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. R. Pascanu, C. Gülçehre, K. Cho, and Y. Bengio. 2013a. How to construct deep recurrent neural networks. CoRR abs/1312.6026 (2013).Google ScholarGoogle Scholar
  154. R. Pascanu, T. Mikolov, and Y. Bengio. 2013b. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 1310--1318.Google ScholarGoogle Scholar
  155. C. R. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. J. Guibas. 2016. Volumetric and multi-view CNNs for object classification on 3D data. arXiv preprint arXiv:1604.03265v2 (2016).Google ScholarGoogle Scholar
  156. T. Rabbani, F. Van Den Heuvel, and G. Vosselmann. 2006. Segmentation of point clouds using smoothness constraint. ISPRS Archives 36, 5 (2006), 248--253.Google ScholarGoogle Scholar
  157. M. Ranzato, Y. Boureau, and Y. LeCun. 2008. Sparse feature learning for deep belief networks. In Advances in Neural Information Processing Systems 20. 1185--1192.Google ScholarGoogle Scholar
  158. S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. 91--99.Google ScholarGoogle Scholar
  159. S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. 2011. Contracting auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th ICML. 833--840.Google ScholarGoogle Scholar
  160. A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. 2014. FitNets: Hints for thin deep nets. CoRR abs/1412.6550 (2014).Google ScholarGoogle Scholar
  161. F. Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386--408. Google ScholarGoogle ScholarCross RefCross Ref
  162. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-propagating errors. Nature 323 (1986), 533--536. Google ScholarGoogle ScholarCross RefCross Ref
  163. R. B. Rusu, N. Blodow, Z. C. Marton, and M. Beetz. 2008. Aligning point cloud views using persistent feature histograms. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 3384--3391. Google ScholarGoogle ScholarCross RefCross Ref
  164. R. B. Rusu, G. Bradski, R. Thibaux, and J. Hsu. 2010. Fast 3D recognition and pose using the viewpoint feature histogram. In IEEE/RSJ International Conference on IROS. 2155--2162.Google ScholarGoogle Scholar
  165. R. B. Rusu and S. Cousins. 2011. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA’11). 1--4. Google ScholarGoogle ScholarCross RefCross Ref
  166. S. Salti, A. Petrelli, F. Tombari, and L. Di Stefano. 2012. On the affinity between 3D detectors and descriptors. In Proceedings of the 2nd International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. 424--431.Google ScholarGoogle Scholar
  167. J. Sanchez-Riera, K.-L. Hua, Y.-S. Hsiao, T. Lim, S. C. Hidayati, and W.-H. Cheng. 2016. A comparative study of data fusion for RGB-D based visual recognition. Pattern Recognition Letters 73 (2016), 1--6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. D. Scherer, A. Muller, and S. Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. In 20th ICANN. Vol. 6354. 92--101. Google ScholarGoogle ScholarCross RefCross Ref
  169. G. Schindler and F. Dellaert. 2012. 4D cities: Analyzing visualizing and interacting with historical urban photo collections. Journal of Multimedia (2012).Google ScholarGoogle Scholar
  170. C. Schmid, R. Mohr, and C. Bauckhage. 2000. Evaluation of interest point detectors. International Journal of Computer Vision 37, 2 (2000), 151--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  171. J. Schmidhuber. 1992. Learning complex, extended sequences using the principle of history compression. Neural Computation 4, 2 (1992), 234--242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  172. J. Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  173. R. Schnabel, R. Wahl, and R. Klein. 2007. Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum 26, 2 (2007), 214--226. Google ScholarGoogle ScholarCross RefCross Ref
  174. M. Schwarz, H. Schulz, and S. Behnke. 2015. RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. In IEEE ICRA. 1329--1335. Google ScholarGoogle ScholarCross RefCross Ref
  175. N. Sedaghat, M. Zolfaghari, and Th. Brox. 2016. Orientation-boosted voxel nets for 3D object recognition. CoRR abs/1604.03351 (2016).Google ScholarGoogle Scholar
  176. P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. 2004. The Princeton shape benchmark. In Shape Modeling International. Google ScholarGoogle ScholarCross RefCross Ref
  177. K. Siddiqi, J. Zhang, D. Macrini, A. Shokoufandeh, S. Bouix, and S. Dickinson. 2008. Retrieving articulated 3-D models using medial surfaces. Machine Vision and Applications 19, 4 (2008), 261--275. Google ScholarGoogle ScholarDigital LibraryDigital Library
  178. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. 2012. Indoor segmentation and support inference from RGBD images. In ECCV. Google ScholarGoogle ScholarDigital LibraryDigital Library
  179. M.-C. Sima and A. Nuchter. 2013. An extension of the Felzenszwalb-Huttenlocher segmentation to 3D point clouds. 5th ICMV: Computer Vision, Image Analysis and Processing 8783 (2013).Google ScholarGoogle Scholar
  180. D. Smeets, Th. Fabry, J. Hermans, D. Vandermeulen, and P. Suetens. 2009. Isometric deformation modelling for object recognition. In Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns. 757--765. Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. R. Socher, B. Huval, B. Bhat, C. D. Manning, and A. Y. Ng. 2012. Convolutional-recursive deep learning for 3D object classification. In Advances in Neural Information Processing Systems 25. 656--664.Google ScholarGoogle Scholar
  182. S. Song and J. Xiao. 2014. Sliding shapes for 3D object detection in depth images. In Proceedings of the 13th European Conference on Computer Vision (ECCV’14). 634--651. Google ScholarGoogle ScholarCross RefCross Ref
  183. S. Song and J. Xiao. 2016. Deep sliding shapes for amodal 3D object detection in RGB-D images. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition. Google ScholarGoogle ScholarCross RefCross Ref
  184. H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. 2015. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the International Conference on Computer Vision (ICCV’15). Google ScholarGoogle ScholarDigital LibraryDigital Library
  185. I. Sutskever. 2012. Training Recurrent Neural Networks. Ph.D. dissertation. University of Toronto.Google ScholarGoogle ScholarDigital LibraryDigital Library
  186. I. Sutskever, J. Martens, and G. E. Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 1017--1024.Google ScholarGoogle Scholar
  187. C. Szegedy, S. Ioffe, and V. Vanhoucke. 2016. Inception-v4, inception-ResNet and the impact of residual connections on learning. CoRR abs/1602.07261 (2016).Google ScholarGoogle Scholar
  188. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2014. Going deeper with convolutions. CoRR abs/1409.4842 (2014).Google ScholarGoogle Scholar
  189. H. Tabia, H. Laga, D. Picard, and P.-H. Gosselin. 2014. Covariance descriptors for 3D shape matching and retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 4185--4192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  190. J. Tang, S. Miller, A. Singh, and P. Abbeel. 2012. A textured object recognition pipeline for color and depth image data. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’12). Google ScholarGoogle ScholarCross RefCross Ref
  191. J. W. H. Tangelder and R. C. Veltkamp. 2007. A survey of content based 3D shape retrieval methods. Multimedia Tools and Applications 39, 3 (2007), 441.Google ScholarGoogle ScholarDigital LibraryDigital Library
  192. L. Theis and M. Bethge. 2015. Generative image modeling using spatial LSTMs. In Advances in Neural Information Processing Systems 28.Google ScholarGoogle Scholar
  193. F. Tombari and L. Di Stefano. 2012. Hough voting for 3D object recognition under occlusion and clutter. IPSJ Transactions on Computer Vision and Applications 4 (2012), 20--29. Google ScholarGoogle ScholarCross RefCross Ref
  194. F. Tombari, S. Salti, and L. Di Stefano. 2010. Unique signatures of histograms for local surface description. In Proceedings of the 11th European Conference on Computer Vision: Part III (ECCV’10). 356--369. Google ScholarGoogle ScholarCross RefCross Ref
  195. F. Tombari, S. Salti, and L. Di Stefano. 2013. Performance evaluation of 3D keypoint detectors. International Journal of Computer Vision 102, 1--3 (2013), 198--220.Google ScholarGoogle ScholarDigital LibraryDigital Library
  196. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision (2013).Google ScholarGoogle ScholarDigital LibraryDigital Library
  197. J. P. C. Valentin, S. Sengupta, J. Warrell, A. Shahrokni, and P. H. S. Torr. 2013. Mesh based semantic modelling for indoor and outdoor scenes. In IEEE CVPR. 2067--2074. Google ScholarGoogle ScholarDigital LibraryDigital Library
  198. V. Vanhoucke, A. Senior, and M. Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.Google ScholarGoogle Scholar
  199. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (2010), 3371--3408.Google ScholarGoogle ScholarDigital LibraryDigital Library
  200. F. Visin, K. Kastner, K. Cho, M. Matteucci, A. C. Courville, and Y. Bengio. 2015. ReNet: A recurrent neural network based alternative to convolutional networks. CoRR abs/1505.00393 (2015).Google ScholarGoogle Scholar
  201. A.-V. Vo, L. Truong-Hong, D. F. Laefer, and M. Bertolotto. 2015. Octree-based region growing for point cloud segmentation. ISPRS Journal of Photogrammetry and Remote Sensing 104 (2015), 88--100. Google ScholarGoogle ScholarCross RefCross Ref
  202. L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. 2013. Regularization of neural networks using dropconnect. In Proceedings of the 30th ICML, Vol. 28. 1058--1066.Google ScholarGoogle Scholar
  203. F. Wang, L. Kang, and Y. Li. 2015. Sketch-based 3D shape retrieval using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition. Google ScholarGoogle ScholarCross RefCross Ref
  204. W. Wang, L. Chen, Z. Liu, K. Kühnlenz, and D. Burschka. 2013. Textured/textureless object recognition and pose estimation using RGB-D image. Journal of Real-Time Image Processing 10, 4 (2013), 667--682. Google ScholarGoogle ScholarDigital LibraryDigital Library
  205. Y. Wang, Z. Xie, K. Xu, Y. Dou, and Y. Lei. 2016. An efficient and effective convolutional auto-encoder extreme learning machine network for 3D feature learning. Neurocomputing 174 (2016), 988--998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  206. D. Weikersdorfer, D. Gossow, and M. Beetz. 2012. Depth-adaptive superpixels. In 21st International Conference on Pattern Recognition (ICPR’12). 2087--2090.Google ScholarGoogle Scholar
  207. P. J. Werbos. 1990. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78, 10 (1990), 1550--1560. Google ScholarGoogle ScholarCross RefCross Ref
  208. W. Wohlkinger and M. Vincze. 2011. Ensemble of shape functions for 3D object classification. In IEEE International Conference on Robotics and Biomimetics (ROBIO’11). 2987--2992. Google ScholarGoogle ScholarCross RefCross Ref
  209. H. Wu and X. Gu. 2015. Towards dropout training for convolutional neural networks. Neural Networks 71 (2015), 1--10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  210. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 2015. 3D shapenets: A deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition. 1912--1920.Google ScholarGoogle Scholar
  211. J. Xie, Y. Fang, F. Zhu, and E. Wong. 2015a. DeepShape: Deep learned shape descriptor for 3D shape matching and retrieval. In Proceedings of the IEEE Conference on CVPR. 1275--1283.Google ScholarGoogle Scholar
  212. Z. Xie, K. Xu, W. Shan, L. Liu, Y. Xiong, and H. Huang. 2015b. Projective feature learning for 3D shapes with multi-view depth images. Computer Graphics Forum (Proceedings of Pacific Graphics 2015) 34, 6 (2015).Google ScholarGoogle Scholar
  213. B. Xu, N. Wang, T. Chen, and M. Li. 2015b. Empirical evaluation of rectified activations in convolutional network. CoRR abs/1505.00853 (2015).Google ScholarGoogle Scholar
  214. Q. Xu, S. Jiang, W. Huang, F. Ye, and S. Xu. 2015a. Feature fusion based image retrieval using deep learning. Journal of Information and Computational Science 12, 6 (2015), 2361--2373. Google ScholarGoogle ScholarCross RefCross Ref
  215. Z. Yan, H. Zhang, Y. Jia, Th. Breuel, and Y. Yu. 2016. Combining the best of convolutional layers and recurrent layers: A hybrid network for semantic segmentation. CoRR abs/1603.04871 (2016).Google ScholarGoogle Scholar
  216. J. Yue, S. Mao, and M. Li. 2016. A deep learning framework for hyperspectral image classification using spatial pyramid pooling. Remote Sensing Letters 7, 9 (2016), 875--884. Google ScholarGoogle ScholarCross RefCross Ref
  217. A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. 2009. Surface feature detection and description with applications to mesh matching. In IEEE Conference on CVPR. 373--380. Google ScholarGoogle ScholarCross RefCross Ref
  218. H. F. M. Zaki, F. Shafait, and A. Mian. 2016. Convolutional hypercube pyramid for accurate RGB-D object category and instance recognition. In IEEE ICRA. 1685--1692. Google ScholarGoogle ScholarDigital LibraryDigital Library
  219. D. Zarpalas, P. Daras, A. Axenopoulos, D. Tzovaras, and M. G. Strintzis. 2006. 3D model search and retrieval using the spherical trace transform. EURASIP Journal on Advances in Signal Processing 2007 (2006).Google ScholarGoogle Scholar
  220. M. D. Zeiler and R. Fergus. 2013. Stochastic pooling for regularization of deep convolutional neural networks. CoRR abs/1301.3557 (2013).Google ScholarGoogle Scholar
  221. A. Zelener. 2015. Survey of object classification in 3D range scans. (2015).Google ScholarGoogle Scholar
  222. L. Zhang, L. Zhang, and B. Du. 2016a. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine 4, 2 (2016), 22--40. Google ScholarGoogle ScholarCross RefCross Ref
  223. X. Zhang, H. Zhang, Y. Zhang, Y. Yang, M. Wang, H. Luan, J. Li, and T. S. Chua. 2016b. Deep fusion of multiple semantic cues for complex event recognition. IEEE TIP 25, 3 (2016), 1033--1046.Google ScholarGoogle Scholar
  224. X. Zhang, J. Zou, X. Ming, K. He, and J. Sun. 2014. Efficient and accurate approximations of nonlinear convolutional networks. CoRR abs/1411.4229 (2014).Google ScholarGoogle Scholar
  225. W. Zhao and S. Du. 2016. Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS Journal of Photogrammetry and Remote Sensing 113 (2016), 155--165. Google ScholarGoogle ScholarCross RefCross Ref
  226. Y. Zhong. 2009. Intrinsic shape signatures: A shape descriptor for 3D object recognition. In 12th IEEE International Conference on Computer Vision Workshops (ICCV Workshops). 689--696.Google ScholarGoogle ScholarCross RefCross Ref
  227. Y. Zhou and Y. Wei. 2016. Learning hierarchical spectral-spatial features for hyperspectral image classification. IEEE Transactions on Cybernetics 46, 7 (2016), 1667--1678. Google ScholarGoogle ScholarCross RefCross Ref
  228. Z. Zhu, X. Wang, S. Bai, C. Yao, and X. Bai. 2014. Deep learning representation using autoencoder for 3D shape retrieval. CoRR abs/1409.7164 (2014).Google ScholarGoogle Scholar


          • Published in

            ACM Computing Surveys, Volume 50, Issue 2
            March 2018
            567 pages
            • Editor: Sartaj Sahni

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 6 April 2017
            • Accepted: 1 January 2017
            • Revised: 1 December 2016
            • Received: 1 June 2016

            • survey
            • Research
            • Refereed

