Abstract
Issues regarding explainable AI involve four components: users, laws and regulations, explanations, and algorithms. Together these components provide a context in which explanation methods can be evaluated regarding their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed, relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs), a taxonomy for classifying existing explanation methods is introduced, and finally, the various classes of explanation methods are analyzed to verify whether user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, explanation methods and interfaces aimed at lay users are still missing, and we speculate on the criteria such methods and interfaces should satisfy. Finally, it is noted that two important concerns are difficult to address with explanation methods: the concern about bias in datasets leading to biased DNNs, and the suspicion of unfair outcomes.
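To make the phrase "influence of the input on the output" concrete, the sketch below shows one of the simplest visual explanation methods of this kind: a vanilla gradient saliency map. It is an illustrative example only, not code from the chapter; the pretrained PyTorch classifier `model`, the input tensor `x`, and the helper name `gradient_saliency` are assumptions made for the sake of the example.

```python
# Illustrative sketch only: a vanilla gradient saliency map, one of the simplest
# ways to visualize the influence of input pixels on a classifier's output.
# `model` (a pretrained PyTorch image classifier) and `x` (an image tensor of
# shape (1, 3, H, W)) are assumed to be provided by the caller.
import torch

def gradient_saliency(model, x, target_class):
    model.eval()
    x = x.detach().clone().requires_grad_(True)  # track gradients w.r.t. the input
    score = model(x)[0, target_class]            # scalar score of the class to explain
    score.backward()                             # compute d(score) / d(input pixels)
    saliency = x.grad.abs().max(dim=1)[0]        # per-pixel importance map, shape (1, H, W)
    return saliency
```

Rendering the returned `saliency` tensor as a heatmap over the original image yields the kind of visual, input-level explanation an expert user can inspect; the chapter's point is that additional interfaces would be needed to make such output meaningful to lay users.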