Abstract
One recurring comment about AI is that we do not really understand what is going on inside the black box of an AI model. What are all the hidden neurons doing when a CNN recognizes a cat? How is an AI model able to generalize to unseen examples? Another persistent comment is that we do not really know what happens during the training process. What is the nature of the optimization landscape? Why doesn't training get stuck in a poor local minimum?
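To make the optimization-landscape question concrete, here is a minimal sketch, not taken from the chapter and with all names, data, and hyperparameters chosen purely for illustration. It trains a tiny two-layer network on toy data with plain gradient descent and then evaluates the loss along the straight line between the initial and the trained parameters, a simple one-dimensional probe of the loss surface.

```python
import numpy as np

# Illustrative sketch: probe the loss landscape of a tiny network by
# evaluating the loss along the segment between the initial and trained
# parameters. All names and settings here are assumptions for the example.

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus noise.
X = rng.uniform(-3, 3, size=(256, 1))
y = np.sin(X) + 0.1 * rng.normal(size=X.shape)

def init_params(hidden=16):
    return {
        "W1": rng.normal(scale=0.5, size=(1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(p, X):
    h = np.tanh(X @ p["W1"] + p["b1"])
    return h @ p["W2"] + p["b2"], h

def loss(p):
    pred, _ = forward(p, X)
    return float(np.mean((pred - y) ** 2))

def grad(p):
    pred, h = forward(p, X)
    n = X.shape[0]
    d_out = 2 * (pred - y) / n                  # dL/dpred for mean squared error
    g = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0)}
    d_h = (d_out @ p["W2"].T) * (1 - h ** 2)    # backprop through tanh
    g["W1"] = X.T @ d_h
    g["b1"] = d_h.sum(axis=0)
    return g

# Plain gradient descent from a random initialization.
theta0 = init_params()
theta = {k: v.copy() for k, v in theta0.items()}
for _ in range(2000):
    g = grad(theta)
    for k in theta:
        theta[k] -= 0.05 * g[k]

# Loss along theta(alpha) = (1 - alpha) * theta0 + alpha * theta_trained.
for alpha in np.linspace(0.0, 1.0, 11):
    p = {k: (1 - alpha) * theta0[k] + alpha * theta[k] for k in theta0}
    print(f"alpha={alpha:.1f}  loss={loss(p):.4f}")
```

On toy problems like this, the interpolated loss typically decreases fairly smoothly toward the trained solution, hinting that the path from initialization to optimum need not cross high barriers; richer probes of the landscape look at many random directions or two-dimensional slices rather than a single segment.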
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Dube, S. (2021). Why AI Works. In: An Intuitive Exploration of Artificial Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-68624-6_4
DOI: https://doi.org/10.1007/978-3-030-68624-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68623-9
Online ISBN: 978-3-030-68624-6
eBook Packages: Computer Science (R0)