
Deep neural networks architecture driven by problem-specific information

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Deep learning provides a variety of neural network-based models, known as deep neural networks (DNNs), which are successfully used in several domains to build highly accurate predictors. A key factor that usually makes DNNs outperform traditional machine learning models is the amount of data that is nowadays accessible and available. Nevertheless, other factors linked to DNN topology may also influence the predictive performance of DNN models. In particular, fully connected deep neural networks (fc-DNNs) typically struggle to achieve good performance when applied to small datasets, owing to the large number of parameters that must be learned during training, which makes such models prone to over-fitting. In this paper, the authors propose using problem-specific information to impose constraints on the network architecture, transforming a fc-DNN into a partially connected DNN (pc-DNN) whose topology is driven by prior knowledge. This work compares two baseline models, the elastic net and fc-DNNs, to pc-DNNs on three synthetic datasets with different numbers of samples. The synthetic data were generated to estimate the benefit of using problem-specific information to drive network architectures. Furthermore, a similar analysis is performed on a real-world dataset to show the benefits of pc-DNN models in terms of predictive performance. The results show that pc-DNNs with built-in problem-specific information clearly outperformed the elastic net and fc-DNNs on most of the datasets used, for both the synthetic and real-world problems. The pc-DNN proved especially useful on small- and medium-size datasets, on which it significantly outperformed the baseline models considered in this study. Specifically, the pc-DNNs achieved AUC and MSE improvement rates of (\(8.21\%\), \(19.79\%\)) and (\(6.65\%\), \(20.54\%\)) on small- and medium-size datasets for the two case studies analyzed, the synthetic and real-world problem, respectively.
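To illustrate the core idea, the sketch below shows one plausible way to realize a partially connected layer: an otherwise fully connected Keras layer whose kernel is multiplied element-wise by a fixed 0/1 connectivity mask derived from prior knowledge, so that only the permitted connections carry signal and receive gradients. This is a minimal illustration under that masking assumption, not the authors' released implementation (their code is linked in the Notes below), and the feature grouping used in the example is hypothetical.

    # Minimal sketch of a partially connected (pc) layer via a fixed
    # connectivity mask. Hypothetical example; not the authors' code.
    import numpy as np
    import tensorflow as tf

    class MaskedDense(tf.keras.layers.Layer):
        """Dense layer whose kernel is multiplied element-wise by a fixed
        0/1 mask, so pruned connections stay at zero and get no gradient."""

        def __init__(self, units, mask, activation=None, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            self.mask = tf.constant(mask, dtype=tf.float32)  # (n_inputs, units)
            self.activation = tf.keras.activations.get(activation)

        def build(self, input_shape):
            self.kernel = self.add_weight(
                shape=(input_shape[-1], self.units),
                initializer="glorot_uniform", trainable=True)
            self.bias = self.add_weight(
                shape=(self.units,), initializer="zeros", trainable=True)

        def call(self, inputs):
            # Masked-out weights contribute nothing to the output.
            return self.activation(
                tf.matmul(inputs, self.kernel * self.mask) + self.bias)

    # Hypothetical prior knowledge: 6 input features form 2 groups of 3,
    # and each hidden unit is only connected to the features of its group.
    mask = np.zeros((6, 2), dtype=np.float32)
    mask[0:3, 0] = 1.0
    mask[3:6, 1] = 1.0

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(6,)),
        MaskedDense(2, mask, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["AUC"])

Compared with a fc-DNN over the same inputs, the mask removes 6 of the 12 first-layer weights, shrinking the hypothesis space in line with the over-fitting argument above.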



Notes

  1. Code available at: https://github.com/durda-ubu/pcDNNs/blob/main/code/activations.py.

  2. Data available at: https://github.com/durda-ubu/pcDNNs/tree/main/synthetic%20dataset.

  3. Apply for access at: http://www.juntadeandalucia.es/medioambiente/servtc5/WebClima/menu_consultas.jsp?b=s.


Acknowledgements

The authors acknowledge support through grants RTI2018-098160-B-I00 and TIN2017-88728-C2 from the Spanish Ministerio de Ciencia, Innovación y Universidades; both grants include ERDF funds.

Author information


Corresponding author

Correspondence to Daniel Urda.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Urda, D., Veredas, F.J., González-Enrique, J. et al. Deep neural networks architecture driven by problem-specific information. Neural Comput & Applic 33, 9403–9423 (2021). https://doi.org/10.1007/s00521-021-05702-7

