
A comprehensive survey on optimizing deep learning models by metaheuristics


Abstract

Deep neural networks (DNNs), which extend artificial neural networks, learn higher-level feature hierarchies built on lower-level features by transforming the raw feature space into progressively more abstract feature spaces. Although deep networks succeed in a wide range of problems across many fields, several issues affect their overall performance, such as selecting appropriate values for model parameters, deciding on the optimal architecture and feature representation, and determining optimal weight and bias values. Recently, metaheuristic algorithms have been proposed to automate these tasks. This survey gives a brief overview of common basic DNN architectures, including convolutional neural networks, unsupervised pre-trained models, recurrent neural networks and recursive neural networks. We formulate the optimization problems in DNN design, such as architecture optimization, hyper-parameter optimization, training, and feature-representation-level optimization. We categorize the encoding schemes used in metaheuristics to represent network architectures, summarize the evolutionary and selection operators as well as speed-up methods, and outline the main approaches for validating the results of networks designed by metaheuristics. Moreover, we group the studies on metaheuristics for deep neural networks by the problem type considered and present the datasets most frequently used in these studies. We discuss the pros and cons of utilizing metaheuristics in the deep learning field and give some future directions for connecting metaheuristics and deep learning. To the best of our knowledge, this is the most comprehensive survey on metaheuristics used in the deep learning field.
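To make the kind of hyper-parameter optimization surveyed here concrete, the following minimal sketch shows how a simple genetic algorithm with a direct value encoding could search a CNN hyper-parameter space. It is an illustrative assumption throughout, not a method from the paper: the hyper-parameter names, their domains, and the stand-in `fitness` function are all hypothetical. In a real study, fitness would be obtained by briefly training the candidate network and returning its validation accuracy.

```python
import random

# Hypothetical search space (illustrative values, not from the survey):
# each hyper-parameter is one gene in a direct value encoding.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "num_filters": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}

def random_individual():
    """Sample one candidate configuration."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(ind):
    """Placeholder objective: a real evaluation would train the network
    and return validation accuracy. This stand-in just rewards a
    mid-range learning rate and low dropout."""
    return -abs(ind["learning_rate"] - 1e-3) - ind["dropout"] / 10

def crossover(a, b):
    """Uniform crossover: each gene is inherited from either parent."""
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rate=0.2):
    """With probability `rate`, resample a gene from its domain."""
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def evolve(pop_size=10, generations=5):
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children         # elitism: parents survive
    return max(population, key=fitness)

if __name__ == "__main__":
    print("Best configuration found:", evolve())
```

The same loop structure carries over to the other metaheuristics discussed in the survey; only the encoding, the variation operators, and the (much more expensive) fitness evaluation change.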




Author information


Corresponding author

Correspondence to Bahriye Akay.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Akay, B., Karaboga, D. & Akay, R. A comprehensive survey on optimizing deep learning models by metaheuristics. Artif Intell Rev 55, 829–894 (2022). https://doi.org/10.1007/s10462-021-09992-0

