Abstract
Deep learning has recently gained popularity, achieving state-of-the-art performance in tasks involving text, sound, or image processing. Owing to this outstanding performance, there have been efforts to apply it to more challenging scenarios, such as 3D data processing. This article surveys methods that apply deep learning to 3D data and classifies them according to how they exploit the data. From the results of the examined works, we conclude that systems employing 2D views of 3D data typically surpass voxel-based (3D) deep models, which, however, can perform better with more layers and heavy data augmentation. Therefore, larger-scale datasets and increased resolutions are required.
Supplemental Material
Supplemental movie, appendix, image, and software files for Deep Learning Advances in Computer Vision with 3D Data: A Survey.
- M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, R. Jozefowicz, Y. Jia, L. Kaiser, M. Kudlur, J. Levenberg, D. Man, M. Schuster, R. Monga, S. Moore, D. Murray, C. Olah, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Vigas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org.Google Scholar
- R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 11 (2012), 2274--2282. Google ScholarDigital Library
- A. Agarwal, E. Akchurin, C. Basoglu, G. Chen, S. Cyphers, J. Droppo, A. Eversole, B. Guenter, M. Hillebrand, R. Hoens, X. Huang, Z. Huang, V. Ivanov, A. Kamenev, P. Kranen, O. Kuchaiev, W. Manousek, A. May, B. Mitra, O. Nano, G. Navarro, A. Orlov, M. Padmilac, H. Parthasarathi, B. Peng, A. Reznichenko, F. Seide, M. L. Seltzer, M. Slaney, A. Stolcke, Y. Wang, H. Wang, K. Yao, D. Yu, Y. Zhang, and G. Zweig. 2014. An Introduction to Computational Networks and the Computational Network Toolkit. Technical Report MSR-TR-2014-112. Microsoft Research.Google Scholar
- A. K. Aijazi, P. Checchin, and L. Trassoudaine. 2013. Segmentation based classification of 3D urban point clouds: A super-voxel based approach with evaluation. Remote Sensing 5, 4 (2013), 1624--1650. Google ScholarCross Ref
- A. Aldoma, F. Tombari, L. Di Stefano, and M. Vincze. 2012a. A global hypotheses verification method for 3D object recognition. In Proceedings of the 12th European Conference on Computer Vision. 511--524. Google ScholarDigital Library
- A. Aldoma, F. Tombari, R. B. Rusu, and M. Vincze. 2012b. Pattern Recognition: Joint 34th DAGM and 36th OAGM Symposium. Chapter OUR-CVFH -- Oriented, Unique and Repeatable Clustered Viewpoint Feature Histogram for Object Recognition and 6DOF Pose Estimation, 113--122.Google Scholar
- A. Aldoma, M. Vincze, N. Blodow, D. Gossow, S. Gedikli, R. B. Rusu, and G. Bradski. 2011. CAD-model recognition and 6DOF pose estimation using 3D cues. In IEEE ICCV Workshops. 585--592.Google Scholar
- L. A. Alexandre. 2012. 3D descriptors for object and category recognition: A comparative evaluation. In Workshop on Color-Depth Camera Fusion in Robotics at the IEEE/RSJ IROS.Google Scholar
- L. A. Alexandre. 2014. 3D Object recognition using convolutional neural networks with transfer learning between input channels. In 13th International Conference on Intelligent Autonomous Systems, Vol. 301.Google Scholar
- S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah. 2015. Comparative study of Caffe, Neon, Theano, and Torch for deep learning. CoRR abs/1511.06435 (2015).Google Scholar
- S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. Jan Latecki. 2016. GIFT: A real-time and scalable 3D shape search engine. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Google ScholarCross Ref
- P. Baldi and P. J. Sadowski. 2013. Understanding dropout. In Advances in Neural Information Processing Systems 26. 2814--2822.Google Scholar
- F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio. 2012. Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.Google Scholar
- S. Bell, C. L. Zitnick, K. Bala, and R. B. Girshick. 2015. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. CoRR abs/1512.04143 (2015).Google Scholar
- J. A. Benediktsson, J. A. Palmason, and J. Sveinsson. 2005. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE TGRS 43, 3 (2005), 480--491. Google ScholarCross Ref
- Y. Bengio. 2012. Neural Networks: Tricks of the Trade: Second Edition. Chapter: Practical Recommendations for Gradient-Based Training of Deep Architectures, 437--478.Google Scholar
- Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. 2007. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems 19. 153--160.Google Scholar
- J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl. 2011. Algorithms for hyper-parameter optimization. In 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), Vol. 24.Google Scholar
- J. Bergstra and Y. Bengio. 2012. Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13 (2012), 281--305.Google ScholarDigital Library
- J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. 2010. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral Presentation.Google Scholar
- P. J. Besl and N. D. McKay. 1992. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2 (1992), 239--256. Google ScholarDigital Library
- J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot. 2013. Hyperspectral remote sensing data analysis and future challenges. IEEE Geoscience and Remote Sensing Magazine 1, 2 (2013), 6--36. Google ScholarCross Ref
- L. Bo, X. Ren, and D. Fox. 2013. Unsupervised feature learning for RGB-D based object recognition. In Experimental Robotics: The 13th International Symposium on Experimental Robotics. 387--402. Google ScholarCross Ref
- D. Borrmann, J. Elseberg, K. Lingemann, and A. Nüchter. 2011. The 3D Hough transform for plane detection in point clouds: A review and a new accumulator design. 3D Research 2, 2 (2011), 1--13.Google Scholar
- F. Bosche, Y. Turkan, C. Haas, and R. Haas. 2010. Fusing 4D modeling and laser scanning for automated construction progress control. 26th ARCOM Annual Conference and Annual General Meeting (2010).Google Scholar
- Y.-L. Boureau, J. Ponce, and Y. LeCun. 2010. A theoretical analysis of feature pooling in vision algorithms. In Proceedings of the International Conference on Machine learning (ICML’10).Google Scholar
- A. Brock, Th. Lim, J. M. Ritchie, and N. Weston. 2016. Generative and discriminative voxel modeling with convolutional neural networks. CoRR abs/1608.04236 (2016).Google Scholar
- M. M. Bronstein and I. Kokkinos. 2010. Scale-invariant heat kernel signatures for non-rigid shape recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’10). 1704--1711. Google ScholarCross Ref
- S. Bu, P. Han, Z. Liu, J. Han, and H. Lin. 2015. Local deep feature learning framework for 3D shape. Computers 8 Graphics 46 (2015), 117--129. Shape Modeling International 2014.Google Scholar
- S. Bu, Z. Liu, J. Han, J. Wu, and R. Ji. 2014. Learning high-level feature by deep belief networks for 3-D model retrieval and recognition. IEEE Transactions on Multimedia 16, 8 (2014), 2154--2167. Google ScholarCross Ref
- B. Bustos, D. Keim, D. Saupe, and T. Schreck. 2007. Content-based 3D object retrieval. IEEE Computer Graphics and Applications 27, 4 (2007), 22--27. Google ScholarDigital Library
- W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki. 2015. Scene labeling with LSTM recurrent neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 3547--3555. Google ScholarCross Ref
- Z. Cai, J. Han, L. Liu, and L. Shao. 2016. RGB-D datasets using microsoft kinect or similar sensors: A survey. Multimedia Tools and Applications (2016), 1--43.Google Scholar
- N. Charbonneau, J. Burgess, and L. Robichaud. 2015. Using 4D modelling in a university-museum research partnership. In 2015 Digital Heritage, Vol. 2. 603--610.Google Scholar
- K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. 2014. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference. Google ScholarCross Ref
- D.-Y. Chen, X. P. Tian, Y.-T. Shen, and M. Ouhyoung. 2003. On visual similarity based 3D model retrieval. Computer Graphics Forum (EUROGRAPHICS’03) 22, 3 (2003), 223--232.Google Scholar
- H. Chen and B. Bhanu. 2007. 3D free-form object recognition in range images using local surface patches. Pattern Recognition Letters 28, 10 (2007), 1252--1262. Google ScholarDigital Library
- W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen. 2015a. Compressing neural networks with the hashing trick. CoRR abs/1504.04788 (2015).Google Scholar
- Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi. 2016. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE TGRS 54, 10 (2016), 6232--6251. Google ScholarCross Ref
- Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu. 2014. Deep learning-based classification of hyperspectral data. IEEE J-STARS 7, 6 (2014), 2094--2107. Google ScholarCross Ref
- Y. Chen, X. Zhao, and X. Jia. 2015b. Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J-STARS 8, 6 (2015), 2381--2392.Google Scholar
- R. Collobert, K. Kavukcuoglu, and C. Farabet. 2011. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop.Google Scholar
- R. Collobert, K. Kavukcuoglu, and C. Farabet. 2012. Neural Networks: Tricks of the Trade: Second Edition. Chapter: Implementing Neural Networks Efficiently, 537--557.Google Scholar
- C. Cortes and V. Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273--297. Google ScholarCross Ref
- C. Couprie, C. Farabet, L. Najman, and Y. Lecun. 2013. Indoor semantic segmentation using depth information. CoRR abs/1301.3572 (2013).Google Scholar
- P. Daras and A. Axenopoulos. 2010. A 3D shape retrieval framework supporting multimodal queries. International Journal of Computer Vision 89, 2 (2010), 229--247. Google ScholarDigital Library
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In IEEE Computer Vision and Pattern Recognition (CVPR’09).Google Scholar
- L. Deng. 2014. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing 3 (2014), e5. Google ScholarCross Ref
- M. Denil, B. Shakibi, L. Dinh, M. A. Ranzato, and N. de Freitas. 2013. Predicting parameters in deep learning. CoRR abs/1306.0543 (2013).Google Scholar
- E. Denton, E. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. CoRR abs/1404.0736 (2014).Google Scholar
- B. Douillard, J. Underwood, N. Kuntz, V. Vlaskine, A. Quadros, P. Morton, and A. Frenkel. 2011. On the segmentation of 3D LIDAR point clouds. In IEEE ICRA. 2798--2805.Google Scholar
- A. Doulamis, M. Ioannides, N. Doulamis, A. Hadjiprocopis, D. Fritsch, O. Balet, M. Julien, E. Protopapadakis, and others. 2013. 4D reconstruction of the past. Proceedings of SPIE 8795 (2013), 87950J-1--87950J-11. Google ScholarCross Ref
- A. Doulamis, S. Soile, N. Doulamis, C. Chrisouli, N. Grammalidis, K. Dimitropoulos, C. Manesis, C. Potsiou, and C. Ioannidis. 2015. Selective 4D modelling framework for spatial-temporal land information management system. Proceedings of SPIE 9535, 3rd RSCy (2015).Google Scholar
- N. Doulamis and A. Doulamis. 2012. Fast and adaptive deep fusion learning for detecting visual objects. In Proceedings of ECCV 2012. Workshops and Demonstrations. 345--354. Google ScholarDigital Library
- A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard. 2015. Multimodal deep learning for robust RGB-D object recognition. In IEEE/RSJ International Conference on IROS. Google ScholarCross Ref
- Y. Fang, J. Xie, G. Dai, M. Wang, F. Zhu, T. Xu, and E. Wong. 2015. 3D deep shape descriptor. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 2319--2328. Google ScholarCross Ref
- M. Fauvel, J. Chanussot, and J. A. Benediktsson. 2012. A spatial--spectral kernel-based approach for the classification of remote-sensing images. Pattern Recognition 45, 1 (2012), 381--392. Google ScholarDigital Library
- J. Feng, Y. Wang, and S.-F. Chang. 2016. 3D shape retrieval using single depth image from low-cost sensors. In IEEE Winter Conference on Applications of Computer Vision (WACV’16). Google ScholarCross Ref
- S. Filipe and L. A. Alexandre. 2014. A comparative evaluation of 3D keypoint detectors in a RGB-D object dataset. In 9th International Conference on Computer Vision Theory and Applications. 476--483.Google Scholar
- S. Filipe, L. Itti, and L. A. Alexandre. 2015. BIK-BUS: Biologically motivated 3D keypoint based on bottom-up saliency. IEEE Transactions on Image Processing 24, 1 (2015), 163--175. Google ScholarCross Ref
- A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik. 2004. Recognizing objects in range data using regional point descriptors. In ECCV 2004. Lecture Notes in Computer Science, Vol. 3023. 224--237. Google ScholarCross Ref
- Y. Gao and Q. Dai. 2014. View-based 3D object retrieval: Challenges and approaches. IEEE MultiMedia 21, 3 (2014), 52--57. Google ScholarCross Ref
- Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai. 2012. 3-D object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing 21, 9 (2012), 4290--4303. Google ScholarDigital Library
- Y. Gao, M. Wang, Z. J. Zha, Q. Tian, Q. Dai, and N. Zhang. 2011. Less is more: Efficient 3-D object retrieval with query view selection. IEEE Transactions on Multimedia 13, 5 (2011), 1007--1018. Google ScholarDigital Library
- D. Giorgi, S. Biasotti, and L. Paraboschi. 2007. Shape retrieval contest 2007: Watertight models track. SHREC Competition 8 (2007).Google Scholar
- A. Godil, H. Dutagaci, C. Akgul, A. Axenopoulos, B. Bustos, M. Chaouch, P. Daras, and others. 2009. SHREC’09 track: Generic shape retrieval. In Proceedings of the Eurographics Workshop on 3D Object Retrieval. 61--68.Google Scholar
- Y. Gong, L. Liu, M. Yang, and L. D. Bourdev. 2014. Compressing deep convolutional networks using vector quantization. CoRR abs/1412.6115 (2014).Google Scholar
- I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. 2013. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 1319--1327.Google Scholar
- A. Graves, A. Mohamed, and G. E. Hinton. 2013. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’13). 6645--6649. Google ScholarCross Ref
- K. Gregor, I. Danihelka, A. Graves, and D. Wierstra. 2015. DRAW: A recurrent neural network for image generation. CoRR abs/1502.04623 (2015).Google Scholar
- Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan. 2014. 3D object recognition in cluttered scenes with local surface features: A survey. IEEE TPAMI 36, 11 (2014), 2270--2287. Google ScholarCross Ref
- Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. Kwok. 2016a. A comprehensive performance evaluation of 3D local feature descriptors. IJCV 116, 1 (2016), 66--89. Google ScholarDigital Library
- Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew. 2016b. Deep learning for visual understanding: A review. Neurocomputing 187 (2016), 27--48. Recent Developments on Deep Big Vision. Google ScholarDigital Library
- Y. Guo, F. A. Sohel, M. Bennamoun, M. Lu, and J. Wan. 2013. Rotational projection statistics for 3D local surface description and object recognition. CoRR abs/1304.3192 (2013).Google Scholar
- Y. Guo, F. A. Sohel, M. Bennamoun, J. Wan, and M. Lu. 2015. A novel local surface feature for 3D object recognition under clutter and occlusion. Information Sciences 293 (2015), 196--213. Google ScholarCross Ref
- Y. Guo, J. Zhang, M. Lu, J. Wan, and Y. Ma. 2014. Benchmark datasets for 3D computer vision. In 9th IEEE Conference on Industrial Electronics and Applications (ICIEA’14). 1846--1851. Google ScholarCross Ref
- S. Gupta, R. Girshick, P. Arbelaez, and J. Malik. 2014. Learning rich features from RGB-D images for object detection and segmentation. In Proceedings of the 13th European Conference on Computer Vision. Google ScholarCross Ref
- Z. Han, Z. Liu, J. Han, C. M. Vong, S. Bu, and C. L. P. Chen. 2016. Mesh convolutional restricted Boltzmann machines for unsupervised learning of features with structure preservation on 3-D meshes. IEEE Transactions on Neural Networks and Learning Systems PP, 99 (2016), 1--14. Google ScholarCross Ref
- K. He, X. Zhang, S. Ren, and J. Sun. 2014. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR abs/1406.4729 (2014).Google Scholar
- K. He, X. Zhang, S. Ren, and J. Sun. 2015a. Deep residual learning for image recognition. CoRR abs/1512.03385 (2015).Google Scholar
- K. He, X. Zhang, S. Ren, and J. Sun. 2015b. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. CoRR abs/1502.01852 (2015).Google Scholar
- K. He, X. Zhang, S. Ren, and J. Sun. 2016. Identity mappings in deep residual networks. CoRR abs/1603.05027 (2016).Google Scholar
- V. Hegde and R. Zadeh. 2016. FusionNet: 3D object classification using multiple data representations. CoRR abs/1607.05695 (2016).Google Scholar
- M. Hilaga, Y. Shinagawa, T. Kohmura, and T. L. Kunii. 2001. Topology matching for fully automatic similarity estimation of 3D shapes. In Proceedings of the 28th SIGGRAPH. 203--212. Google ScholarDigital Library
- G. E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Computation 14, 8 (2002), 1771--1800. Google ScholarDigital Library
- G. E. Hinton, P. Dayan, B. Frey, and R. M. Neal. 1995. The wake-sleep algorithm for self-organizing neural networks. Science 268, 5124 (1995), 1158--1161. Google ScholarCross Ref
- G. E. Hinton, S. Osindero, and Y.-W. Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, 7 (2006), 1527--1554. Google ScholarDigital Library
- G. E. Hinton and R. R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504--507. Google ScholarCross Ref
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580 (2012).Google Scholar
- G. E. Hinton, O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. CoRR abs/1503.02531 (2015).Google Scholar
- S. Hochreiter. 1991. Untersuchungen Zu Dynamischen Neuronalen Netzen. Diploma thesis. Technical University Munich, Institute of Computer Science.Google Scholar
- S. Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. IJUFKS 6, 2 (1998), 107--116. Google ScholarDigital Library
- S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780. Google ScholarDigital Library
- W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li. 2015. Deep convolutional neural networks for hyperspectral image classification. Journal of Sensors 2015, Article 258619 (2015). Google ScholarCross Ref
- G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. 2016. Deep networks with stochastic depth. CoRR abs/1603.09382 (2016).Google Scholar
- G. B. Huang, H. Zhou, X. Ding, and R. Zhang. 2012. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, 2 (2012), 513--529. Google ScholarDigital Library
- G. B. Huang, Q.-Y. Zhu, and C.-K. Siew. 2006. Extreme learning machine: Theory and applications. Neurocomputing 70, 13 (2006), 489--501. Google ScholarCross Ref
- M. Ioannides, A. Hadjiprocopis, N. Doulamis, A. Doulamis, E. Protopapadakis, K. Makantasis, and others. 2013. Online 4D reconstruction using multi-images available under open access. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 1 (2013), 169--174.Google Scholar
- S. Ioffe and C. Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR abs/1502.03167 (2015).Google Scholar
- M. Jaderberg, A. Vedaldi, and A. Zisserman. 2014. Speeding up convolutional neural networks with low rank expansions. CoRR abs/1405.3866 (2014).Google Scholar
- K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. 2009. What is the best multi-stage architecture for object recognition? In 12th IEEE International Conference on Computer Vision. 2146--2153. Google ScholarCross Ref
- S. Jayanti, Y. Kalyanaraman, N. Iyer, and K. Ramani. 2006. Developing an engineering shape benchmark for CAD models. Computer-Aided Design 38, 9 (2006), 939--953. Google ScholarCross Ref
- H. Jegou, M. Douze, and C. Schmid. 2011. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (2011), 117--128. Google ScholarDigital Library
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014).Google Scholar
- E. Johns, S. Leutenegger, and A. J. Davison. 2016. Pairwise decomposition of image sequences for active multi-view recognition. In Proceedings of the IEEE Conference on CVPR. 3183--3822. Google ScholarCross Ref
- A. E. Johnson and M. Hebert. 1999. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 21, 5 (1999), 433--449. Google ScholarDigital Library
- N. Kalchbrenner, E. Grefenstette, and P. Blunsom. 2014. A convolutional neural network for modelling sentences. CoRR abs/1404.2188 (2014).Google Scholar
- L. L. C. Kasun, H. Zhou, G.-B. Huang, and C. M. Vong. 2013. Representational learning with extreme learning machine for big data. IEEE Intelligent Systems 28, 6 (2013), 31--34.Google ScholarDigital Library
- M. Kazhdan, Th. Funkhouser, and S. Rusinkiewicz. 2003. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing.Google ScholarDigital Library
- J. M. Khatib, N. Chileshe, and S. Sloan. 2007. Antecedents and benefits of 3D and 4D modelling for construction planners. Journal of Engineering, Design and Technology 5, 2 (2007), 159--172. Google ScholarCross Ref
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. 1097--1105.Google Scholar
- A. Krogh and J. A. Hertz. 1992. A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems, Vol. 4. 950--957.Google Scholar
- G. Kyriakaki, A. Doulamis, N. Doulamis, M. Ioannides, K. Makantasis, E. Protopapadakis, A. Hadjiprocopis, K. Wenzel, and others. 2014. 4D reconstruction of tangible cultural heritage objects from web-retrieved images. International Journal of Heritage in the Digital Era 3, 2 (2014), 431--451. Google ScholarCross Ref
- L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. 2009. Associative hierarchical CRFs for object class image segmentation. Proceedings of the IEEE 12th International Conference on Computer Vision (2009).Google ScholarCross Ref
- K. Lai, L. Bo, X. Ren, and D. Fox. 2011. A large-scale hierarchical multi-view RGB-D object dataset. In IEEE International Conference on on Robotics and Automation. Google ScholarCross Ref
- G. Lavoué. 2012. Combination of bag-of-words descriptors for robust partial shape retrieval. The Visual Computer 28, 9 (2012), 931--942. Google ScholarDigital Library
- V. Lebedev, Y. Ganin, M. Rakhuba, I. V. Oseledets, and V. S. Lempitsky. 2014. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. CoRR abs/1412.6553 (2014).Google Scholar
- Y. LeCun, Y. Bengio, and G. E. Hinton. 2015. Deep learning. Nature 521 (2015), 436--444. Google ScholarCross Ref
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of IEEE 86, 11 (1998), 2278--2324. Google ScholarCross Ref
- Y. LeCun, K. Kavukcuoglu, and C. Farabet. 2010. Convolutional networks and applications in vision. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS’10). 253--256. Google ScholarCross Ref
- H. Lee, E. Chaitanya, and A. Y. Ng. 2008. Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems 20. 873--880.Google Scholar
- B. Leng, S. Guo, X. Zhang, and Z. Xiong. 2015. 3D object retrieval with stacked local convolutional autoencoder. Signal Processing 112, C (2015), 119--128. Google ScholarDigital Library
- B. Leng, Y. Liu, K. Yu, X. Zhang, and Z. Xiong. 2016. 3D object understanding with 3D convolutional neural networks. Information Sciences 336, C (Oct. 2016), 188--201. Google ScholarDigital Library
- B. Leng, X. Zhang, M. Yao, and Z. Xiong. 2014. MultiMedia Modeling: 20th Anniversary International Conference, Part II. Chapter: 3D Object Classification Using Deep Belief Networks, 128--139.Google Scholar
- B. Li, Y. Lu, A. Godil, T. Schreck, B. Bustos, A. Ferreira, and others. 2014a. A comparison of methods for sketch-based 3D shape retrieval. Computer Vision and Image Understanding 119 (2014), 57--80. Google ScholarDigital Library
- B. Li, Y. Lu, C. Li, and others. 2015. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding 131 (2015). Google ScholarDigital Library
- B. Li, E. Zhou, B. Huang, J. Duan, Y. Wang, N. Xu, J. Zhang, and H. Yang. 2014b. Large scale recurrent neural network on GPU. In International Joint Conference on Neural Networks (IJCNN’14). 4062--4069. Google ScholarCross Ref
- Z. Lian, A. Godil, B. Bustos, M. Daoudi, and others. 2011. SHREC’11 track: Shape retrieval on non-rigid 3D watertight meshes. In Proceedings of the 4th Eurographics Conference on 3D Object Retrieval. 79--88.Google Scholar
- M. Lin, Q. Chen, and S. Yan. 2013. Network in network. CoRR abs/1312.4400 (2013).Google Scholar
- Q. Liu. 2012. A survey of recent view-based 3D model retrieval methods. CoRR abs/1208.3670 (2012).Google Scholar
- Z. Liu, S. Chen, S. Bu, and K. Li. 2014. High-level semantic feature for 3D shape based on deep belief networks. In IEEE International Conference on Multimedia and Expo (ICME’14). 1--6. Google ScholarCross Ref
- D. G. Lowe. 1999. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision (ICCV’99), Vol. 2. 1150--1157. Google ScholarCross Ref
- A. Maas, A. Hannun, and A. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In ICML Workshop on Deep Learning for Audio, Speech, and Language Processing.Google Scholar
- A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the ACL. 142--150.Google Scholar
- A. Mademlis, P. Daras, D. Tzovaras, and M. G. Strintzis. 2009. 3D object retrieval using the 3D shape impact descriptor. Pattern Recognition 42, 11 (2009), 2447--2459. Google ScholarDigital Library
- K. Makantasis, A. Doulamis, N. Doulamis, and M. Ioannides. 2016. In the wild image retrieval and clustering for 3D cultural heritage landmarks reconstruction. MTAP 75, 7 (2016), 3593--3629. Google ScholarDigital Library
- K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis. 2015. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In IEEE IGARSS. 4959--4962. Google ScholarCross Ref
- J. Martens and I. Sutskever. 2011. Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 1033--1040.Google Scholar
- H. P. Martínez and G. N. Yannakakis. 2014. Deep multimodal fusion: Combining discrete events and continuous signals. In Proceedings of the 16th International Conference on Multimodal Interaction. 34--41. Google ScholarDigital Library
- M. Mathieu, M. Henaff, and Y. LeCun. 2013. Fast training of convolutional networks through FFTs. CoRR abs/1312.5851 (2013).Google Scholar
- D. Maturana and S. Scherer. 2015. VoxNet: A 3D convolutional neural network for real-time object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 922--928. Google ScholarCross Ref
- W. McCulloch and W. Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5, 4 (1943), 115--133. Google ScholarCross Ref
- A. Merentitis and C. Debes. 2015. Automatic fusion and classification using random forests and features extracted with deep learning. In International Geoscience and Remote Sensing Symposium. 2943--2946. Google ScholarCross Ref
- A. Mian, M. Bennamoun, and R. Owens. 2010. On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. IJCV 89, 2--3 (2010), 348--361.Google ScholarDigital Library
- K. Mikolajczyk and C. Schmid. 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 10 (2005), 1615--1630. Google ScholarDigital Library
- K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. 2005. A comparison of affine region detectors. IJCV 65, 1 (2005), 43--72. Google ScholarDigital Library
- M. Muja and D. G. Lowe. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Application (VISSAPP’09). 331--340.Google Scholar
- V. Nair and G. E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 807--814.Google ScholarDigital Library
- A. Nguyen and B. Le. 2013. 3D point cloud segmentation: A survey. In Proceedings of the 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM). 225--230. Google ScholarCross Ref
- M. Niepert, M. Ahmed, and K. Kutzkov. 2016. Learning convolutional neural networks for graphs. CoRR abs/1605.05273 (2016).Google Scholar
- W. Ouyang, P. Luo, X. Zeng, S. Qiu, Y. Tian, H. Li, S. Yang, and others. 2014. DeepID-Net: Multi-stage and deformable deep convolutional neural networks for object detection. CoRR abs/1409.3505 (2014).Google Scholar
- J. Papon, A. Abramov, M. Schoeler, and F. Worgotter. 2013. Voxel cloud connectivity segmentation—Supervoxels for point clouds. In IEEE Conference on Computer Vision and Pattern Recognition. 2027--2034. Google ScholarDigital Library
- R. Pascanu, C. Gülçehre, K. Cho, and Y. Bengio. 2013a. How to construct deep recurrent neural networks. CoRR abs/1312.6026 (2013).
- R. Pascanu, T. Mikolov, and Y. Bengio. 2013b. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 1310--1318.
- C. R. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. J. Guibas. 2016. Volumetric and multi-view CNNs for object classification on 3D data. arXiv preprint arXiv:1604.03265v2 (2016).
- T. Rabbani, F. Van Den Heuvel, and G. Vosselman. 2006. Segmentation of point clouds using smoothness constraint. ISPRS Archives 36, 5 (2006), 248--253.
- M. Ranzato, Y. Boureau, and Y. LeCun. 2008. Sparse feature learning for deep belief networks. In Advances in Neural Information Processing Systems 20. 1185--1192.
- S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. 91--99.
- S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. 2011. Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th ICML. 833--840.
- A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. 2014. FitNets: Hints for thin deep nets. CoRR abs/1412.6550 (2014).
- F. Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386--408.
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-propagating errors. Nature 323 (1986), 533--536.
- R. B. Rusu, N. Blodow, Z. C. Marton, and M. Beetz. 2008. Aligning point cloud views using persistent feature histograms. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 3384--3391.
- R. B. Rusu, G. Bradski, R. Thibaux, and J. Hsu. 2010. Fast 3D recognition and pose using the viewpoint feature histogram. In IEEE/RSJ International Conference on IROS. 2155--2162.
- R. B. Rusu and S. Cousins. 2011. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA’11). 1--4.
- S. Salti, A. Petrelli, F. Tombari, and L. Di Stefano. 2012. On the affinity between 3D detectors and descriptors. In Proceedings of the 2nd International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. 424--431.
- J. Sanchez-Riera, K.-L. Hua, Y.-S. Hsiao, T. Lim, S. C. Hidayati, and W.-H. Cheng. 2016. A comparative study of data fusion for RGB-D based visual recognition. Pattern Recognition Letters 73 (2016), 1--6.
- D. Scherer, A. Müller, and S. Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. In 20th ICANN. Vol. 6354. 92--101.
- G. Schindler and F. Dellaert. 2012. 4D cities: Analyzing, visualizing, and interacting with historical urban photo collections. Journal of Multimedia (2012).
- C. Schmid, R. Mohr, and C. Bauckhage. 2000. Evaluation of interest point detectors. International Journal of Computer Vision 37, 2 (2000), 151--172.
- J. Schmidhuber. 1992. Learning complex, extended sequences using the principle of history compression. Neural Computation 4, 2 (1992), 234--242.
- J. Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85--117.
- R. Schnabel, R. Wahl, and R. Klein. 2007. Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum 26, 2 (2007), 214--226.
- M. Schwarz, H. Schulz, and S. Behnke. 2015. RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. In IEEE ICRA. 1329--1335.
- N. Sedaghat, M. Zolfaghari, and Th. Brox. 2016. Orientation-boosted voxel nets for 3D object recognition. CoRR abs/1604.03351 (2016).
- P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. 2004. The Princeton shape benchmark. In Shape Modeling International.
- K. Siddiqi, J. Zhang, D. Macrini, A. Shokoufandeh, S. Bouix, and S. Dickinson. 2008. Retrieving articulated 3-D models using medial surfaces. Machine Vision and Applications 19, 4 (2008), 261--275.
- N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. 2012. Indoor segmentation and support inference from RGBD images. In ECCV.
- M.-C. Sima and A. Nüchter. 2013. An extension of the Felzenszwalb-Huttenlocher segmentation to 3D point clouds. 5th ICMV: Computer Vision, Image Analysis and Processing 8783 (2013).
- D. Smeets, Th. Fabry, J. Hermans, D. Vandermeulen, and P. Suetens. 2009. Isometric deformation modelling for object recognition. In Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns. 757--765.
- R. Socher, B. Huval, B. Bhat, C. D. Manning, and A. Y. Ng. 2012. Convolutional-recursive deep learning for 3D object classification. In Advances in Neural Information Processing Systems 25. 656--664.
- S. Song and J. Xiao. 2014. Sliding shapes for 3D object detection in depth images. In Proceedings of the 13th European Conference on Computer Vision (ECCV’14). 634--651.
- S. Song and J. Xiao. 2016. Deep sliding shapes for amodal 3D object detection in RGB-D images. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition.
- H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. 2015. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the International Conference on Computer Vision (ICCV’15).
- I. Sutskever. 2012. Training Recurrent Neural Networks. Ph.D. dissertation. University of Toronto.
- I. Sutskever, J. Martens, and G. E. Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 1017--1024.
- C. Szegedy, S. Ioffe, and V. Vanhoucke. 2016. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR abs/1602.07261 (2016).
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2014. Going deeper with convolutions. CoRR abs/1409.4842 (2014).
- H. Tabia, H. Laga, D. Picard, and P.-H. Gosselin. 2014. Covariance descriptors for 3D shape matching and retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 4185--4192.
- J. Tang, S. Miller, A. Singh, and P. Abbeel. 2012. A textured object recognition pipeline for color and depth image data. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’12).
- J. W. H. Tangelder and R. C. Veltkamp. 2007. A survey of content based 3D shape retrieval methods. Multimedia Tools and Applications 39, 3 (2007), 441.
- L. Theis and M. Bethge. 2015. Generative image modeling using spatial LSTMs. In Advances in Neural Information Processing Systems 28.
- F. Tombari and L. Di Stefano. 2012. Hough voting for 3D object recognition under occlusion and clutter. IPSJ Transactions on Computer Vision and Applications 4 (2012), 20--29.
- F. Tombari, S. Salti, and L. Di Stefano. 2010. Unique signatures of histograms for local surface description. In Proceedings of the 11th European Conference on Computer Vision: Part III (ECCV’10). 356--369.
- F. Tombari, S. Salti, and L. Di Stefano. 2013. Performance evaluation of 3D keypoint detectors. International Journal of Computer Vision 102, 1--3 (2013), 198--220.
- J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision (2013).
- J. P. C. Valentin, S. Sengupta, J. Warrell, A. Shahrokni, and P. H. S. Torr. 2013. Mesh based semantic modelling for indoor and outdoor scenes. In IEEE CVPR. 2067--2074.
- V. Vanhoucke, A. Senior, and M. Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
- P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (2010), 3371--3408.
- F. Visin, K. Kastner, K. Cho, M. Matteucci, A. C. Courville, and Y. Bengio. 2015. ReNet: A recurrent neural network based alternative to convolutional networks. CoRR abs/1505.00393 (2015).
- A.-V. Vo, L. Truong-Hong, D. F. Laefer, and M. Bertolotto. 2015. Octree-based region growing for point cloud segmentation. ISPRS Journal of Photogrammetry and Remote Sensing 104 (2015), 88--100.
- L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. 2013. Regularization of neural networks using dropconnect. In Proceedings of the 30th ICML, Vol. 28. 1058--1066.
- F. Wang, L. Kang, and Y. Li. 2015. Sketch-based 3D shape retrieval using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition.
- W. Wang, L. Chen, Z. Liu, K. Kühnlenz, and D. Burschka. 2013. Textured/textureless object recognition and pose estimation using RGB-D image. Journal of Real-Time Image Processing 10, 4 (2013), 667--682.
- Y. Wang, Z. Xie, K. Xu, Y. Dou, and Y. Lei. 2016. An efficient and effective convolutional auto-encoder extreme learning machine network for 3D feature learning. Neurocomputing 174 (2016), 988--998.
- D. Weikersdorfer, D. Gossow, and M. Beetz. 2012. Depth-adaptive superpixels. In 21st International Conference on Pattern Recognition (ICPR’12). 2087--2090.
- P. J. Werbos. 1990. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78, 10 (1990), 1550--1560.
- W. Wohlkinger and M. Vincze. 2011. Ensemble of shape functions for 3D object classification. In IEEE International Conference on Robotics and Biomimetics (ROBIO’11). 2987--2992.
- H. Wu and X. Gu. 2015. Towards dropout training for convolutional neural networks. Neural Networks 71 (2015), 1--10.
- Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition. 1912--1920.
- J. Xie, Y. Fang, F. Zhu, and E. Wong. 2015a. DeepShape: Deep learned shape descriptor for 3D shape matching and retrieval. In Proceedings of the IEEE Conference on CVPR. 1275--1283.
- Z. Xie, K. Xu, W. Shan, L. Liu, Y. Xiong, and H. Huang. 2015b. Projective feature learning for 3D shapes with multi-view depth images. Computer Graphics Forum (Proceedings of Pacific Graphics 2015) 34, 6 (2015).
- B. Xu, N. Wang, T. Chen, and M. Li. 2015b. Empirical evaluation of rectified activations in convolutional network. CoRR abs/1505.00853 (2015).
- Q. Xu, S. Jiang, W. Huang, F. Ye, and S. Xu. 2015a. Feature fusion based image retrieval using deep learning. Journal of Information and Computational Science 12, 6 (2015), 2361--2373.
- Z. Yan, H. Zhang, Y. Jia, Th. Breuel, and Y. Yu. 2016. Combining the best of convolutional layers and recurrent layers: A hybrid network for semantic segmentation. CoRR abs/1603.04871 (2016).
- J. Yue, S. Mao, and M. Li. 2016. A deep learning framework for hyperspectral image classification using spatial pyramid pooling. Remote Sensing Letters 7, 9 (2016), 875--884.
- A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. 2009. Surface feature detection and description with applications to mesh matching. In IEEE Conference on CVPR. 373--380.
- H. F. M. Zaki, F. Shafait, and A. Mian. 2016. Convolutional hypercube pyramid for accurate RGB-D object category and instance recognition. In IEEE ICRA. 1685--1692.
- D. Zarpalas, P. Daras, A. Axenopoulos, D. Tzovaras, and M. G. Strintzis. 2006. 3D model search and retrieval using the spherical trace transform. EURASIP Journal on Advances in Signal Processing 2007 (2006).
- M. D. Zeiler and R. Fergus. 2013. Stochastic pooling for regularization of deep convolutional neural networks. CoRR abs/1301.3557 (2013).
- A. Zelener. 2015. Survey of object classification in 3D range scans. (2015).
- L. Zhang, L. Zhang, and B. Du. 2016a. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine 4, 2 (2016), 22--40.
- X. Zhang, H. Zhang, Y. Zhang, Y. Yang, M. Wang, H. Luan, J. Li, and T. S. Chua. 2016b. Deep fusion of multiple semantic cues for complex event recognition. IEEE TIP 25, 3 (2016), 1033--1046.
- X. Zhang, J. Zou, X. Ming, K. He, and J. Sun. 2014. Efficient and accurate approximations of nonlinear convolutional networks. CoRR abs/1411.4229 (2014).
- W. Zhao and S. Du. 2016. Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS Journal of Photogrammetry and Remote Sensing 113 (2016), 155--165.
- Y. Zhong. 2009. Intrinsic shape signatures: A shape descriptor for 3D object recognition. In 12th IEEE International Conference on Computer Vision Workshops (ICCV Workshops). 689--696.
- Y. Zhou and Y. Wei. 2016. Learning hierarchical spectral-spatial features for hyperspectral image classification. IEEE Transactions on Cybernetics 46, 7 (2016), 1667--1678.
- Z. Zhu, X. Wang, S. Bai, C. Yao, and X. Bai. 2014. Deep learning representation using autoencoder for 3D shape retrieval. CoRR abs/1409.7164 (2014).