ABSTRACT
Over the last five years, Deep Neural Networks (DNNs) have offered more accurate solutions to many problems in speech recognition and computer vision, and these solutions have surpassed a threshold of acceptability for many applications. As a result, DNNs have supplanted other approaches to solving problems in these areas and have enabled many new applications. While DNN design is still something of an art form, in our work we have found that the basic principles of design space exploration used to develop embedded microprocessor architectures are highly applicable to the design of DNN architectures. In particular, we have used these design principles to create a novel DNN called SqueezeNet that requires only 480KB of storage for its model parameters. We have further distilled these experiences into something of a playbook for creating small DNNs for embedded systems.
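To make the size claim concrete: SqueezeNet's small parameter count comes from composing "Fire" modules, in which a 1x1 "squeeze" convolution cuts the channel count before parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated. Below is a minimal PyTorch sketch of such a module; the structure and the channel sizes in the usage example follow the fire2 layer of the SqueezeNet paper (Iandola et al., 2016), while the class and argument names are our own illustrative choices, not code from the paper.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Sketch of a SqueezeNet 'Fire' module: a 1x1 'squeeze' convolution
    reduces the channel count, then parallel 1x1 and 3x3 'expand'
    convolutions are applied and concatenated along the channel axis."""

    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3,
                                   padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Concatenate the two expand branches channel-wise.
        return torch.cat(
            [self.relu(self.expand1x1(x)), self.relu(self.expand3x3(x))],
            dim=1,
        )

# Usage: channel sizes matching the paper's fire2 layer
# (96 input channels -> 16 squeeze -> 64 + 64 expand = 128 output channels).
fire2 = Fire(96, 16, 64, 64)
out = fire2(torch.randn(1, 96, 55, 55))  # -> torch.Size([1, 128, 55, 55])
```

The savings come from the squeeze step: the 3x3 filters see only 16 input channels rather than 96, so this module uses roughly 12K weights where a plain 3x3 convolution with the same 128-channel output would use about 110K.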