ABSTRACT
Neural Architecture Search (NAS) is an effective way to automatically design neural architectures for various multimedia applications. Weight sharing, one of the most popular NAS strategies, is widely adopted for its search efficiency. However, existing weight-sharing NAS methods overlook the influence of data distribution and treat every data sample equally. In contrast, in this paper we empirically observe that different data samples influence architectures differently, e.g., some samples are easy for certain architectures to fit but hard for others. Consequently, architectures that perform well on early data samples are more likely to be discovered over the whole search process, which leads to suboptimal search results. To tackle this problem, we propose Curriculum-NAS, a curriculum training framework for weight-sharing NAS that dynamically adjusts the weights of the training data during the search. In particular, Curriculum-NAS exploits the multiple subnets contained in weight-sharing NAS to jointly assess data uncertainty, which serves as the difficulty criterion of the curriculum, so that potentially optimal architectures obtain a higher probability of being fully trained and discovered. Extensive experiments on several image and text datasets demonstrate that Curriculum-NAS brings consistent improvements over existing weight-sharing NAS methods. The code is available online at https://github.com/zhouyw16/curriculum-nas.
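To illustrate the idea of using subnet disagreement as a difficulty signal, the following is a minimal sketch, not the paper's actual algorithm: the function name `curriculum_weights`, the use of the standard deviation of per-sample subnet losses as the uncertainty measure, and the annealing schedule are all illustrative assumptions.

```python
import numpy as np

def curriculum_weights(subnet_losses: np.ndarray, progress: float) -> np.ndarray:
    """Sketch of uncertainty-based curriculum reweighting (illustrative).

    subnet_losses: array of shape (num_subnets, num_samples) holding the
        per-sample losses of several subnets sampled from the supernet.
    progress: search progress in [0, 1]; early on, high-uncertainty (hard)
        samples are down-weighted, and the weighting flattens over time.
    """
    # Disagreement across subnets serves as the per-sample uncertainty.
    uncertainty = subnet_losses.std(axis=0)
    # Anneal the sharpness: strong down-weighting early, uniform at the end.
    # The factor 5.0 is an arbitrary illustrative choice.
    sharpness = 5.0 * (1.0 - progress)
    logits = -sharpness * uncertainty
    weights = np.exp(logits - logits.max())
    # Normalize so the weights average to 1, keeping the loss scale stable.
    return weights * len(weights) / weights.sum()
```

During the search, each sample's training loss would be multiplied by its weight, so that early in the search the supernet focuses on samples the subnets agree on, and gradually incorporates harder, more contentious samples.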