A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

Published: 24 May 2021

Deep learning has made substantial breakthroughs in many fields due to its powerful automatic representation capabilities. Neural architecture design has been shown to be crucial to the feature representation of data and to final performance. However, the design of a neural architecture relies heavily on researchers’ prior knowledge and experience, and because of the limits of that knowledge, it is difficult for people to step outside their original thinking paradigm and design an optimal model. An intuitive idea is therefore to reduce human intervention as much as possible and let an algorithm design the neural architecture automatically. Neural Architecture Search (NAS) is just such an approach, and the related research is extensive and complex, so a comprehensive and systematic survey of NAS is essential. Previous surveys have classified existing work mainly by the key components of NAS: search space, search strategy, and evaluation strategy. While this classification is intuitive, it makes it difficult for readers to grasp the challenges and the landmark work involved. In this survey, we therefore provide a new perspective: we begin with an overview of the characteristics of the earliest NAS algorithms, summarize the problems in these early algorithms, and then present the solutions proposed by subsequent research. In addition, we conduct a detailed and comprehensive analysis, comparison, and summary of these works. Finally, we provide some possible future research directions.


  1. S. Hochreiter and J. Schmidhuber. 1997. Lonort-term memory. Neural Computation 9, 8 (1997), 1735–1780.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. X. Chen, O. Firat, A. Bapna, M. Johnson, W. Macherey, G. Foster, L. Jones, N. Parmar, M. Schuster, Z. Chen, Y. Wu, and M. Hughes. 2018. The best of both worlds: Combining recent advances in neural machine translation. arXiv:1804.09849. Retrieved from ScholarGoogle Scholar
  3. Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. ScholarGoogle Scholar
  4. K. Simonyan and A. Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. ICLR.Google ScholarGoogle Scholar
  5. M. Suganuma, S. Shirakawa, and T. Nagao. 2017. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference. 497–504.Google ScholarGoogle Scholar
  6. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. ScholarGoogle Scholar
  7. A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.Google ScholarGoogle Scholar
  8. S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. 91–99.Google ScholarGoogle Scholar
  9. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, Cham., 21–37.Google ScholarGoogle Scholar
  10. T. Y. Lin, P. Goyal, R. Girshick, K. He, and Dollár, P. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. 2980–2988.Google ScholarGoogle Scholar
  11. B. Zoph and Q. V. Le. 2017. Neural architecture search with reinforcement learning. ICLR.Google ScholarGoogle Scholar
  12. B. Baker, O. Gupta, N. Naik, and R. Raskar. 2016. Designing neural network architectures using reinforcement learning. arXiv:1611.02167.Google ScholarGoogle Scholar
  13. N. Dalal and B. Triggs. 2005. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05). IEEE, Vol. 1, 886–893.Google ScholarGoogle Scholar
  14. D. G. Lowe. 1999. Object recognition from local scale-invariant features. In Proceedings of the 7th IEEE International Conference on Computer Vision. IEEE, Vol. 2, 1150–1157.Google ScholarGoogle ScholarCross RefCross Ref
  15. E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, ... and A. Kurakin. 2017. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 2902–2911. JMLR. org.Google ScholarGoogle Scholar
  16. L. Xie and A. Yuille. 2017. Genetic CNN. In Proceedings of the IEEE International Conference on Computer Vision. 1379–1388.Google ScholarGoogle Scholar
  17. H. Liu, K. Simonyan, and Y. Yang. 2018. Darts: Differentiable architecture search. arXiv:1806.09055.Google ScholarGoogle Scholar
  18. Y. Shu, W. Wang, and S. Cai. 2019. Understanding architectures learnt by cell-based neural architecture search. arXiv:1909.09569.Google ScholarGoogle Scholar
  19. H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean. 2018. Efficient neural architecture search via parameter sharing. arXiv:1802.03268.Google ScholarGoogle Scholar
  20. B. Baker, O. Gupta, R. Raskar, and N. Naik. 2017. Accelerating neural architecture search using performance prediction. arXiv:1705.10823.Google ScholarGoogle Scholar
  21. C. Li, J. Peng, L. Yuan, G. Wang, X. Liang, L. Lin, and X. Chang. 2020. Block-wisely supervised neural architecture search with knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1989–1998.Google ScholarGoogle Scholar
  22. G. Bender, P. Kindermans, B. Zoph, V. Vasudevan, and Q. Le. 2018. Understanding and simplifying one-shot architecture search. In Proceedings of the 35th International Conference on Machine Learning. PMLR, 80:550–559.Google ScholarGoogle Scholar
  23. A. Brock, T. Lim, J. M. Ritchie, and N. Weston. 2017. Smash: One-shot model architecture search through hypernetworks. arXiv:1708.05344.Google ScholarGoogle Scholar
  24. C. Sciuto, K. Yu, M. Jaggi, C. Musat, and M. Salzmann. 2019. Evaluating the search phase of neural architecture search. arXiv:1902.08142.Google ScholarGoogle Scholar
  25. M. Zhang, H. Li, S. Pan, X. Chang, and S. Su. 2020 Overcoming multi-model forgetting in one-shot NAS with diversity maximization. In Advances in Neural Information Processing Systems.Google ScholarGoogle Scholar
  26. X. Cheng, Y. Zhong, M. Harandi, Y. Dai, X. Chang, H. Li, T. Drummond, and Z. Ge. 2020. Hierarchical neural architecture search for deep stereo matching. In Advances in Neural Information Processing Systems.Google ScholarGoogle Scholar
  27. T. Elsken, J. H. Metzen, and F. Hutter. 2018. Neural architecture search: A survey. arXiv:1808.05377.Google ScholarGoogle Scholar
  28. M. Wistuba, A. Rawat, and T. Pedapati. 2019. A survey on neural architecture search. arXiv:1905.01392.Google ScholarGoogle Scholar
  29. M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le. 2019. MnasNet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2820–2828.Google ScholarGoogle Scholar
  30. K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.Google ScholarGoogle Scholar
  31. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, ... and A. Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9.Google ScholarGoogle Scholar
  32. B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8697–8710.Google ScholarGoogle Scholar
  33. Z. Zhong, J. Yan, W. Wu, J. Shao, and C. L. Liu. 2018. Practical block-wise neural network architecture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2423–2432.Google ScholarGoogle Scholar
  34. H. Liu, K. Simonyan, O. Vinyals, C. Fernando, and K. Kavukcuoglu. 2017. Hierarchical representations for efficient architecture search. arXiv:1711.00436.Google ScholarGoogle Scholar
  35. J. D. Dong, A. C. Cheng, D. C. Juan, W. Wei, and M. Sun. 2018. DPP-Net: Device-aware progressive search for Pareto-optimal neural architectures. In Proceedings of the European Conference on Computer Vision (ECCV’18). 517–531.Google ScholarGoogle Scholar
  36. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4700–4708.Google ScholarGoogle Scholar
  37. C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L. J. Li, ... and K. Murphy. 2018. Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV’18). 19–34.Google ScholarGoogle Scholar
  38. T. Saikia, Y. Marrakchi, A. Zela, F. Hutter, and T. Brox. 2019. AutoDispNet: Improving disparity estimation with AutoML. In Proceedings of the IEEE International Conference on Computer Vision. 1812–1823.Google ScholarGoogle Scholar
  39. J. Cui, P. Chen, R. Li, S. Liu, X. Shen, and J. Jia. 2019. Fast and practical neural architecture search. In Proceedings of the IEEE International Conference on Computer Vision. 6509–6518.Google ScholarGoogle Scholar
  40. Y. Xiong, R. Mehta, and V. Singh. 2019. Resource constrained neural network architecture search: Will a submodularity assumption help? In Proceedings of the IEEE International Conference on Computer Vision. 1901–1910.Google ScholarGoogle Scholar
  41. M. S. Ryoo, A. J. Piergiovanni, M. Tan, and A. Angelova. 2019. AssembleNet: Searching for multi-stream neural connectivity in video architectures. arXiv:1905.13209.Google ScholarGoogle Scholar
  42. J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, ... and Q. V. Le. 2012. Large scale distributed deep networks. In Advances in Neural Information Processing Systems. 1223–1231.Google ScholarGoogle Scholar
  43. E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. 2019. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 4780–4789.Google ScholarGoogle Scholar
  44. X. Chen, L. Xie, J. Wu, and Q. Tian. 2019. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In Proceedings of the IEEE International Conference on Computer Vision. 1294–1303.Google ScholarGoogle Scholar
  45. A. J. Piergiovanni, A. Angelova, A. Toshev, and M. S. Ryoo. 2019. Evolving space-time neural architectures for videos. In Proceedings of the IEEE International Conference on Computer Vision. 1793–1802.Google ScholarGoogle Scholar
  46. S. Xie, H. Zheng, C. Liu, and L. Lin. 2018. SNAS: Stochastic neural architecture search. arXiv:1812.09926.Google ScholarGoogle Scholar
  47. T. Chen, I. Goodfellow, and J. Shlens. 2015. Net2net: Accelerating learning via knowledge transfer. arXiv:1511.05641.Google ScholarGoogle Scholar
  48. M. Schuster and K. K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. P. Bashivan, M. Tensen, and J. J. DiCarlo. 2019. Teacher guided architecture search. In Proceedings of the IEEE International Conference on Computer Vision. 5320–5329.Google ScholarGoogle Scholar
  50. X. Zheng, R. Ji, L. Tang, B. Zhang, J. Liu, and Q. Tian. 2019. Multinomial distribution learning for effective neural architecture search. In Proceedings of the IEEE International Conference on Computer Vision. 1304–1313.Google ScholarGoogle Scholar
  51. H. Cai, T. Chen, W. Zhang, Y. Yu, and J. Wang. 2018. Efficient architecture search by network transformation. In 32nd AAAI Conference on Artificial Intelligence. 2787–2794.Google ScholarGoogle Scholar
  52. A. Ashok, N. Rhinehart, F. Beainy, and K. M. Kitani. 2017. N2N learning: Network to network compression via policy gradient reinforcement learning. arXiv:1709.06030.Google ScholarGoogle Scholar
  53. J. Mei, Y. Li, X. Lian, X. Jin, L. Yang, A. Yuille, and J. Yang. 2019. AtomNAS: Fine-grained end-to-end neural architecture search. arXiv:1912.09640.Google ScholarGoogle Scholar
  54. X. Gong, S. Chang, Y. Jiang, and Z. Wang. 2019. AutoGAN: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 3224–3234.Google ScholarGoogle Scholar
  55. R. Pasunuru and M. Bansal. 2019. Continual and multi-task architecture search. arXiv:1906.05226.Google ScholarGoogle Scholar
  56. G. Hinton, O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531.Google ScholarGoogle Scholar
  57. H. Cai, J. Yang, W. Zhang, S. Han, and Y. Yu. 2018. Path-level network transformation for efficient architecture search. arXiv:1806.02639.Google ScholarGoogle Scholar
  58. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. 2017. Inception-v4, inception-ResNet and the impact of residual connections on learning. In 31st AAAI Conference on Artificial Intelligence.Google ScholarGoogle Scholar
  59. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826.Google ScholarGoogle Scholar
  60. J. Fang, Y. Sun, K. Peng, Q. Zhang, Y. Li, W. Liu, and X. Wang. 2020. Fast neural network adaptation via parameter remapping and architecture search. arXiv:2001.02525.Google ScholarGoogle Scholar
  61. K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, and E. P. Xing. 2018. Neural architecture search with Bayesian optimisation and optimal transport. In Advances in Neural Information Processing Systems. 2016–2025.Google ScholarGoogle Scholar
  62. R. Negrinho and G. Gordon. 2017. DeepArchitect: Automatically designing and training deep architectures. arXiv:1704.08792.Google ScholarGoogle Scholar
  63. C. Liu, L. C. Chen, F. Schroff, H. Adam, W. Hua, A. L. Yuille, and L. Fei-Fei. 2019. Auto-DeepLab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 82–92.Google ScholarGoogle Scholar
  64. S. Ding, T. Chen, X. Gong, W. Zha, and Z. Wang. 2020. AutoSpeech: Neural architecture search for speaker recognition. arXiv:2005.03215.Google ScholarGoogle Scholar
  65. Y. Zhang, Z. Qiu, J. Liu, T. Yao, D. Liu, and T. Mei. 2019. Customizable architecture search for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11641–11650.Google ScholarGoogle Scholar
  66. Y. Chen, T. Yang, X. Zhang, G. Meng, C. Pan, and J. Sun. 2019. DetNAS: Neural architecture search on object detection. arXiv:1903.10979.Google ScholarGoogle Scholar
  67. G. Anandalingam and T. L. Friesz. 1992. Hierarchical optimization: An introduction. Annals of Operations Research 34, 1 (1992), 1–11.Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. B. Colson, P. Marcotte, and G. Savard. 2007. An overview of bilevel optimization. Annals of Operations Research 153, 1 (2007), 235–256.Google ScholarGoogle ScholarCross RefCross Ref
  69. R. Shin, C. Packer, and D. Song. 2018. Differentiable Neural Network Architecture Search. ICLR.Google ScholarGoogle Scholar
  70. K. Ahmed and L. Torresani. 2018. MaskConnect: Connectivity learning by gradient descent. In Proceedings of the European Conference on Computer Vision (ECCV’18). 349–365.Google ScholarGoogle Scholar
  71. S. Saxena and J. Verbeek. 2016. Convolutional neural fabrics. In Advances in Neural Information Processing Systems. 4053–4061.Google ScholarGoogle Scholar
  72. K. Ahmed and L. Torresani. 2017. Connectivity learning in multi-branch networks. arXiv:1709.09582.Google ScholarGoogle Scholar
  73. T. Veniat and L. Denoyer. 2018. Learning time/memory-efficient deep architectures with budgeted super networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3492–3500.Google ScholarGoogle Scholar
  74. R. Luo, F. Tian, T. Qin, E. Chen, and T. Y. Liu. 2018. Neural architecture optimization. In Advances in Neural Information Processing Systems. 7816–7827.Google ScholarGoogle Scholar
  75. J. Chang, Y. Guo, G. Meng, S. Xiang, and C. Pan. 2019. DATA: Differentiable ArchiTecture approximation. In Advances in Neural Information Processing Systems. 874–884.Google ScholarGoogle Scholar
  76. G. Ghiasi, T. Y. Lin, and Q. V. Le. 2019. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7036–7045.Google ScholarGoogle Scholar
  77. D. Tran, J. Ray, Z. Shou, S. F. Chang, and M. Paluri. 2017. ConvNet architecture search for spatiotemporal feature learning. arXiv:1708.05038.Google ScholarGoogle Scholar
  78. L. C. Chen, M. Collins, Y. Zhu, G. Papandreou, B. Zoph, F. Schroff, ... and J. Shlens. 2018. Searching for efficient multi-scale architectures for dense image prediction. In Advances in Neural Information Processing Systems. 8699–8710.Google ScholarGoogle Scholar
  79. C. Ying, A. Klein, E. Real, E. Christiansen, K. Murphy, and F. Hutter. 2019. NAS-bench-101: Towards reproducible neural architecture search. arXiv:1902.09635.Google ScholarGoogle Scholar
  80. Y. Jiang, C. Hu, T. Xiao, C. Zhang, and J. Zhu. 2019. Improved differentiable architecture search for language modeling and named entity recognition. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 3576–3581.Google ScholarGoogle Scholar
  81. J. Lafferty, A. McCallum, and F. C. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML 2001 (2001), 282–289.Google ScholarGoogle Scholar
  82. D. Koller and N. Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.Google ScholarGoogle Scholar
  83. X. Dong and Y. Yang. 2019. Searching for a robust neural architecture in four GPU hours. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1761–1770.Google ScholarGoogle Scholar
  84. Y. Xu, L. Xie, X. Zhang, X. Chen, G. Qi, Q. Tian, and H. Xiong. 2019. PC-DARTS: Partial channel connections for memory-efficient architecture search. arXiv:abs/1907.05737.Google ScholarGoogle Scholar
  85. T. Elsken, J. H. Metzen, and F. Hutter. 2017. Simple and efficient architecture search for convolutional neural networks. arXiv:1711.04528.Google ScholarGoogle Scholar
  86. A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. 2014. CNN features off-the-shelf: An astounding baseline for recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 806–813.Google ScholarGoogle Scholar
  87. B. Zoph, D. Yuret, J. May, and K. Knight. 2016. Transfer learning for low-resource neural machine translation. arXiv:1604.02201.Google ScholarGoogle Scholar
  88. M. T. Luong, Q. V. Le, I. Sutskever, O. Vinyals, and L. Kaiser. 2015. Multi-task sequence to sequence learning. arXiv:1511.06114.Google ScholarGoogle Scholar
  89. D. Ha, A. Dai, and Q. V. Le. 2016. Hypernetworks. arXiv:1609.09106.Google ScholarGoogle Scholar
  90. L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. 2017. Hyperband: Bandit-based configuration evaluation for hyperparameter optimization. In International Conference on Learning Representations 2017 (ICLR’17).Google ScholarGoogle Scholar
  91. C. Zhang, M. Ren, and R. Urtasun. 2018. Graph hypernetworks for neural architecture search. arXiv:1810.05749.Google ScholarGoogle Scholar
  92. X. Dong and Y. Yang. 2019. One-shot neural architecture search via self-evaluated template network. In Proceedings of the IEEE International Conference on Computer Vision. 3681–3690.Google ScholarGoogle Scholar
  93. M. Mirza and S. Osindero. 2014. Conditional generative adversarial nets. arXiv:1411.1784.Google ScholarGoogle Scholar
  94. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. 2016. Improved techniques for training gans. In Advances in Neural Information Processing Systems. 2234–2242.Google ScholarGoogle Scholar
  95. T. Karras, T. Aila, S. Laine, and J. Lehtinen. 2017. Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196.Google ScholarGoogle Scholar
  96. N. T. Tran, T. A. Bui, and N. M. Cheung. 2018. Dist-GAN: An improved GAN using distance constraints. In Proceedings of the European Conference on Computer Vision (ECCV’18). 370–385.Google ScholarGoogle Scholar
  97. W. Wang, Y. Sun, and S. Halgamuge. 2018. Improving MMD-GAN training with repulsive loss function. arXiv:1812.09916.Google ScholarGoogle Scholar
  98. Q. Hoang, T. D. Nguyen, T. Le, and D. Phung. 2018. MGAN: Training generative adversarial nets with multiple generators. In ICLR.Google ScholarGoogle Scholar
  99. G. Ghiasi, T. Y. Lin, and Q. V. Le. 2019. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7036–7045.Google ScholarGoogle Scholar
  100. H. Cai, C. Gan, and S. Han. 2019. Once for all: Train one network and specialize it for efficient deployment. arXiv:1908.09791.Google ScholarGoogle Scholar
  101. X. Chu, B. Zhang, R. Xu, and J. Li. 2019. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. arXiv:1907.01845.Google ScholarGoogle Scholar
  102. X. Li, C. Lin, C. Li, M. Sun, W. Wu, J. Yan, and W. Ouyang. 2019. Improving one-shot NAS by suppressing the posterior fading. arXiv:1910.02543.Google ScholarGoogle Scholar
  103. B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, ... and K. Keutzer. 2019. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 10734–10742.Google ScholarGoogle Scholar
  104. H. Cai, L. Zhu, and S. Han. 2018. ProxylessNAS: Direct neural architecture search on target task and hardware. arXiv:1812.00332.Google ScholarGoogle Scholar
  105. L. Li and A. Talwalkar. 2019. Random search and reproducibility for neural architecture search. arXiv:1902.07638.Google ScholarGoogle Scholar
  106. Z. Guo, X. Zhang, H. Mu, W. Heng, Z. Liu, Y. Wei, and J. Sun. 2019. Single path one-shot neural architecture search with uniform sampling. arXiv:1904.00420.Google ScholarGoogle Scholar
  107. T. Domhan, J. T. Springenberg, and F. Hutter. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In 24th International Joint Conference on Artificial Intelligence.Google ScholarGoogle Scholar
  108. A. Klein, S. Falkner, J. T. Springenberg, and F. Hutter. 2016. Learning curve prediction with Bayesian neural networks. In International Conference on Learning Representation. 184--194.Google ScholarGoogle Scholar
  109. A. Chandrashekaran and I. R. Lane. 2017. Speeding up hyper-parameter optimization by extrapolation of learning curves using previous builds. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 477–492.Google ScholarGoogle Scholar
  110. B. Deng, J. Yan, and D. Lin. 2017. Peephole: Predicting network performance before training. arXiv:1712.03351.Google ScholarGoogle Scholar
  111. A. Zela, T. Elsken, T. Saikia, Y. Marrakchi, T. Brox, and F. Hutter. 2020. Understanding and robustifying differentiable architecture search. In ICLR.Google ScholarGoogle Scholar
  112. J. Peng, M. Sun, Z. X. Zhang, T. Tan, and J. Yan. 2019. Efficient neural architecture transformation search in channel-level for object detection. In Advances in Neural Information Processing Systems. 14290–14299.Google ScholarGoogle Scholar
  113. Y. Zhu et al. [n.d.]. Deep subdomain adaptation network for image classification. In IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2020.2988928.Google ScholarGoogle Scholar
  114. Miao Zhang, Huiqi Li, Shirui Pan, Xiaojun Chang, Chuan Zhou, Zongyuan Ge, and Steven W. Su. One-shot neural architecture search: Maximising diversity to overcome catastrophic forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI:10.1109/TPAMI.2020.3035351Google ScholarGoogle Scholar
  115. N. Nayman, A. Noy, T. Ridnik, I. Friedman, R. Jin, and L. Zelnik. 2019. XNAS: Neural architecture search with expert advice. In Advances in Neural Information Processing Systems. 1975–1985.Google ScholarGoogle Scholar
  116. S. Cao, X. Wang, and K. M. Kitani. 2019. Learnable embedding space for efficient neural architecture compression. arXiv:1902.00383.Google ScholarGoogle Scholar
  117. T. Elsken, J. H. Metzen, and F. Hutter. 2018. Efficient multi-objective neural architecture search via Lamarckian evolution. arXiv:1804.09081.Google ScholarGoogle Scholar
  118. X. Li, Y. Zhou, Z. Pan, and J. Feng. 2019. Partial order pruning: For best speed/accuracy trade-off in neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9145–9153.Google ScholarGoogle Scholar
  119. X. Dai, P. Zhang, B. Wu, H. Yin, F. Sun, Y. Wang, ... and P. Vajda. 2019. ChamNet: Towards efficient network design through platform-aware model adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11398–11407.Google ScholarGoogle Scholar
  120. F. Liang, C. Lin, R. Guo, M. Sun, W. Wu, J. Yan, and W. Ouyang. 2019. Computation reallocation for object detection. arXiv:1912.11234.Google ScholarGoogle Scholar
  121. I. Fedorov, R. P. Adams, M. Mattina, and P. Whatmough. 2019. Sparse: Sparse architecture search for CNNs on resource-constrained microcontrollers. In Advances in Neural Information Processing Systems. 4978–4990.Google ScholarGoogle Scholar
  122. V. Nekrasov, H. Chen, C. Shen, and I. Reid. 2019. Fast neural architecture search of compact semantic segmentation models via auxiliary cells. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 9126–9135.Google ScholarGoogle Scholar
  123. E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. 2019. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 113–123.Google ScholarGoogle Scholar
  124. M. Tan and Q. V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv:1905.11946.Google ScholarGoogle Scholar
  125. X. Zhang, Q. Wang, J. Zhang, and Z. Zhong. 2019. Adversarial AutoAugment. arXiv:1912.11188.Google ScholarGoogle Scholar
  126. X. Dong and Y. Yang. 2019. Network pruning via transformable architecture search. In Advances in Neural Information Processing Systems. 759–770.Google ScholarGoogle Scholar
  127. Z. Lu, I. Whalen, V. Boddeti, Y. D. Dhebar, K. Deb, E. D. Goodman, and W. Banzhaf. 2018. NSGA-NET: A multi-objective genetic algorithm for neural architecture search. Computer Vision and Pattern Recognition.Google ScholarGoogle Scholar
  128. X. Dong, L. Liu, K. Musial, and B. Gabrys. 2020. NATS-Bench: Benchmarking NAS algorithms for architecture topology and size. arXiv:2009.00437.Google ScholarGoogle Scholar
  129. A. Yang, P. M. Esperança, and F. M. Carlucci. 2019. NAS evaluation is frustratingly hard. arXiv:1912.12522.Google ScholarGoogle Scholar
  130. M. Wistuba. 2018. Deep learning architecture search by neuro-cell-based evolution with function-preserving mutations. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 243–258.Google ScholarGoogle Scholar
  131. T. DeVries and G. W. Taylor. 2017. Improved regularization of convolutional neural networks with cutout. arXiv:1708.04552.Google ScholarGoogle Scholar
  132. F. P. Casale, J. Gordon, and N. Fusi. 2019. Probabilistic neural architecture search. arXiv:abs/1902.05116.Google ScholarGoogle Scholar
  133. S. Zagoruyko and N. Komodakis. 2016. Wide residual networks. arXiv:1605.07146.Google ScholarGoogle Scholar
  134. X. Gastaldi. 2017. Shake-shake regularization. arXiv:1705.07485.Google ScholarGoogle Scholar
  135. Y. Yamada, M. Iwamura, and K. Kise. 2016. Deep pyramidal residual networks with separated stochastic depth. arXiv:1612.01230.Google ScholarGoogle Scholar
  136. G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. 2016. Deep networks with stochastic depth. In European Conference on Computer Vision. Springer, Cham, 646–661.Google ScholarGoogle Scholar
  137. G. Larsson, M. Maire, and G. Shakhnarovich. 2016. FractalNet: Ultra-deep neural networks without residuals. arXiv:1605.07648.Google ScholarGoogle Scholar
  138. H. Zhou, M. Yang, J. Wang, and W. Pan. 2019. BayesNAS: A Bayesian approach for neural architecture search. arXiv:1905.04919.Google ScholarGoogle Scholar
  139. X. Zhang, X. Zhou, M. Lin, and J. Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6848–6856.
  140. S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1492–1500.
  141. X. Zhang, Z. Li, C. Change Loy, and D. Lin. 2017. PolyNet: A pursuit of structural diversity in very deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 718–726.
  142. Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng. 2017. Dual path networks. In Advances in Neural Information Processing Systems. 4467–4475.
  143. J. Devlin, M. W. Chang, K. Lee, and K. Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805.
  144. J. Long, E. Shelhamer, and T. Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3431–3440.
  145. W. Sun, Z. Huang, M. Liang, T. Shao, and H. Bi. 2020. Cocoon image segmentation method based on fully convolutional networks. In Proceedings of the 7th Asia International Symposium on Mechatronics. Springer, Singapore, 832–843.
  146. G. Vecchio, S. Palazzo, D. Giordano, F. Rundo, and C. Spampinato. [n.d.]. MASK-RL: Multiagent video object segmentation framework through reinforcement learning. In IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2019.2963282.
  147. Z. Ji, Y. Zhao, Y. Pang, X. Li, and J. Han. [n.d.]. Deep attentive video summarization with distribution consistency learning. In IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2020.2991083.
  148. D. Zhang, J. Han, L. Zhao, and T. Zhao. [n.d.]. From discriminant to complete: Reinforcement searching-agent learning for weakly supervised object detection. In IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2020.2969483.
  149. K. Shih, C. Chiu, J. Lin, and Y. Bu. 2020. Real-time object detection with reduced-region proposal network via multi-feature concatenation. In IEEE Transactions on Neural Networks and Learning Systems 31, 6 (June 2020), 2164–2173. DOI: 10.1109/TNNLS.2019.2929059.
  150. Y. Zhou, G. G. Yen, and Z. Yi. [n.d.]. Evolutionary compression of deep neural networks for biomedical image segmentation. In IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2019.2933879.
  151. B. Zhang, D. Xiong, J. Xie, and J. Su. [n.d.]. Neural machine translation with GRU-gated attention model. In IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2019.2957276.
  152. M. Guo, Y. Yang, R. Xu, and Z. Liu. 2019. When NAS meets robustness: In search of robust architectures against adversarial attacks. arXiv:1911.10695.
  153. G. Li, G. Qian, I. C. Delgadillo, M. Müller, A. Thabet, and B. Ghanem. 2019. SGAS: Sequential greedy architecture search. arXiv:1912.00195.
  154. J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, and X. Wang. 2019. Densely connected search space for more flexible neural architecture search. arXiv:1906.09607.
  155. D. So, C. Liang, and Q. V. Le. 2019. The evolved transformer. arXiv:1901.11117.
  156. A. Zela, J. Siems, and F. Hutter. 2020. NAS-Bench-1Shot1: Benchmarking and dissecting one-shot neural architecture search. arXiv:2001.10422.
  157. X. Dai, A. Wan, P. Zhang, B. Wu, Z. He, Z. Wei, ... and J. E. Gonzalez. 2020. FBNetV3: Joint architecture-recipe search using neural acquisition function. arXiv:2006.02049.
  158. X. Dong, M. Tan, A. W. Yu, D. Peng, B. Gabrys, and Q. V. Le. 2020. AutoHAS: Differentiable hyper-parameter and architecture search. arXiv:2006.03656.
  159. H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. 2017. mixup: Beyond empirical risk minimization. arXiv:1710.09412.
  160. G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. 2016. Deep networks with stochastic depth. In European Conference on Computer Vision. Springer, Cham, 646–661.
  161. D. R. Jones. 2001. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization 21, 4 (2001), 345–383.
  162. F. Hutter, H. H. Hoos, and K. Leyton-Brown. 2011. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization. Springer, Berlin, 507–523.

    • Published in

      ACM Computing Surveys, Volume 54, Issue 4
      May 2022
      782 pages

      Copyright © 2021 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 May 2021
      • Revised: 1 January 2021
      • Accepted: 1 January 2021
      • Received: 1 June 2020


      • research-article
      • Research
      • Refereed
