ABSTRACT
Neural Architecture Search (NAS) is an effective way to automatically design neural architectures for various multimedia applications. Weight sharing, one of the most popular NAS strategies, is widely adopted for its search efficiency. However, existing weight-sharing NAS methods overlook the influence of data distribution and treat every data sample equally. In contrast, in this paper we empirically observe that different data samples influence architectures differently, e.g., some samples are easy for certain architectures to fit but hard for others. Consequently, architectures that perform well on early data samples are more likely to be discovered over the whole search process, which leads to suboptimal search results. To tackle this problem, we propose Curriculum-NAS, a curriculum training framework for weight-sharing NAS that dynamically adjusts the weights of the training data during the search. In particular, Curriculum-NAS exploits the multiple subnets contained in weight-sharing NAS to jointly assess data uncertainty, which serves as the difficulty criterion of the curriculum, so that potentially optimal architectures obtain a higher probability of being fully trained and discovered. Extensive experiments on several image and text datasets demonstrate that Curriculum-NAS brings consistent improvements over existing weight-sharing NAS methods. The code is available online at https://github.com/zhouyw16/curriculum-nas.
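To illustrate the idea of using subnet disagreement as a difficulty signal, the following is a minimal sketch, not the paper's actual algorithm: the function name `curriculum_weights`, the use of the standard deviation of per-sample subnet losses as the uncertainty measure, and the annealing schedule are all illustrative assumptions.

```python
import numpy as np

def curriculum_weights(subnet_losses: np.ndarray, progress: float) -> np.ndarray:
    """Sketch of uncertainty-based curriculum reweighting (illustrative).

    subnet_losses: array of shape (num_subnets, num_samples) holding the
        per-sample losses of several subnets sampled from the supernet.
    progress: search progress in [0, 1]; early on, high-uncertainty (hard)
        samples are down-weighted, and the weighting flattens over time.
    """
    # Disagreement across subnets serves as the per-sample uncertainty.
    uncertainty = subnet_losses.std(axis=0)
    # Anneal the sharpness: strong down-weighting early, uniform at the end.
    # The factor 5.0 is an arbitrary illustrative choice.
    sharpness = 5.0 * (1.0 - progress)
    logits = -sharpness * uncertainty
    weights = np.exp(logits - logits.max())
    # Normalize so the weights average to 1, keeping the loss scale stable.
    return weights * len(weights) / weights.sum()
```

During the search, each sample's training loss would be multiplied by its weight, so that early in the search the supernet focuses on samples the subnets agree on, and gradually incorporates harder, more contentious samples.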