
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution

  • Conference paper

Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12373)

Abstract

Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given limited hardware resources, existing 3D perception models cannot recognize small instances (e.g., pedestrians, cyclists) well due to low-resolution voxelization and aggressive downsampling. To this end, we propose Sparse Point-Voxel Convolution (SPVConv), a lightweight 3D module that equips vanilla Sparse Convolution with a high-resolution point-based branch. With negligible overhead, this point-based branch preserves fine details even in large outdoor scenes. To explore the spectrum of efficient 3D models, we first define a flexible architecture design space based on SPVConv, and we then present 3D Neural Architecture Search (3D-NAS) to search for the optimal network architecture over this diverse design space efficiently and effectively. Experimental results validate that the resulting SPVNAS model is fast and accurate: it outperforms the state-of-the-art MinkowskiNet by 3.3%, ranking 1st on the competitive SemanticKITTI leaderboard*. It also achieves 8–23× computation reduction and 3× measured speedup over MinkowskiNet and KPConv with higher accuracy. Finally, we transfer our method to 3D object detection, where it achieves consistent improvements over the one-stage detection baseline on KITTI.

H. Tang and Z. Liu—indicates equal contributions; order determined by a coin toss.

* https://competitions.codalab.org/competitions/20331#results.
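The point-voxel fusion the abstract describes — a coarse voxel branch paired with a high-resolution point-wise branch, combined at negligible cost — can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: average pooling stands in for sparse convolution on the voxel branch, a `tanh` stands in for the point-wise MLP, and all function names are hypothetical.

```python
import numpy as np

def voxelize(coords, feats, voxel_size):
    """Average point features into voxels (the coarse branch)."""
    keys = np.floor(coords / voxel_size).astype(np.int64)
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.ravel()
    voxel_feats = np.zeros((len(uniq), feats.shape[1]))
    counts = np.bincount(inv, minlength=len(uniq)).astype(float)
    np.add.at(voxel_feats, inv, feats)   # scatter-add point feats into voxels
    voxel_feats /= counts[:, None]       # mean-pool per occupied voxel
    return voxel_feats, inv

def spvconv_block(coords, feats, voxel_size=0.5):
    """Fuse a coarse voxel branch with a high-resolution point branch."""
    voxel_feats, inv = voxelize(coords, feats, voxel_size)
    # (a real SPVConv would run sparse 3D convolutions on voxel_feats here)
    devoxelized = voxel_feats[inv]       # gather voxel features back to points
    point_branch = np.tanh(feats)        # stand-in for the point-wise MLP
    return devoxelized + point_branch    # element-wise fusion of both branches

pts = np.random.rand(100, 3)             # 100 points in a unit cube
x = np.random.rand(100, 16)              # 16-dim features per point
out = spvconv_block(pts, x)
assert out.shape == (100, 16)
```

The design point the module makes is that the voxel branch captures neighborhood context cheaply at low resolution, while the point branch keeps per-point detail that coarse voxelization would destroy; adding the two recovers both at roughly the cost of the voxel branch alone.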



Acknowledgement

We thank MIT Quest for Intelligence, MIT-IBM Watson AI Lab, Xilinx, and Samsung for supporting this research. We also thank the AWS Machine Learning Research Awards for providing computational resources.

Author information

Correspondence to Zhijian Liu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2569 KB)


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Tang, H. et al. (2020). Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12373. Springer, Cham. https://doi.org/10.1007/978-3-030-58604-1_41


  • DOI: https://doi.org/10.1007/978-3-030-58604-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58603-4

  • Online ISBN: 978-3-030-58604-1

  • eBook Packages: Computer Science (R0)
