Abstract
Issues regarding explainable AI involve four components: users, laws and regulations, explanations, and algorithms. Together these components provide a context in which explanation methods can be evaluated regarding their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed, relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs), a taxonomy for classifying existing explanation methods is introduced, and finally, the various classes of explanation methods are analyzed to verify whether user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, explanation methods and interfaces aimed at lay users are still missing, and we speculate on the criteria such methods and interfaces should satisfy. Finally, it is noted that two important concerns are difficult to address with explanation methods: the concern about bias in datasets leading to biased DNNs, and the suspicion of unfair outcomes.
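To make the phrase "influence of the input on the output" concrete, the sketch below shows one of the simplest visual explanation methods of this kind: a vanilla gradient saliency map. It is an illustrative example only, not code from the chapter; the pretrained PyTorch classifier `model`, the input tensor `x`, and the helper name `gradient_saliency` are assumptions made for the sake of the example.

```python
# Illustrative sketch only: a vanilla gradient saliency map, one of the simplest
# ways to visualize the influence of input pixels on a classifier's output.
# `model` (a pretrained PyTorch image classifier) and `x` (an image tensor of
# shape (1, 3, H, W)) are assumed to be provided by the caller.
import torch

def gradient_saliency(model, x, target_class):
    model.eval()
    x = x.detach().clone().requires_grad_(True)  # track gradients w.r.t. the input
    score = model(x)[0, target_class]            # scalar score of the class to explain
    score.backward()                             # compute d(score) / d(input pixels)
    saliency = x.grad.abs().max(dim=1)[0]        # per-pixel importance map, shape (1, H, W)
    return saliency
```

Rendering the returned `saliency` tensor as a heatmap over the original image yields the kind of visual, input-level explanation an expert user can inspect; the chapter's point is that additional interfaces would be needed to make such output meaningful to lay users.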