
Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges

Chapter in: Explainable and Interpretable Models in Computer Vision and Machine Learning

Abstract

Issues regarding explainable AI involve four components: users, laws and regulations, explanations, and algorithms. Together these components provide a context in which explanation methods can be evaluated for their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed; relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs); a taxonomy for the classification of existing explanation methods is introduced; and, finally, the various classes of explanation methods are analyzed to verify whether user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, explanation methods and interfaces for lay users are still missing, and we speculate about the criteria these methods and interfaces should satisfy. Finally, it is noted that two important concerns are difficult to address with explanation methods: the concern about bias in datasets that leads to biased DNNs, as well as the suspicion of unfair outcomes.
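
Gradient-based saliency is one widely used family of the visual explanation methods surveyed here: it attributes a DNN's output to the input pixels via the gradient of the class score. The sketch below is purely illustrative and is not code from the chapter; the use of PyTorch, the pretrained ResNet-18, and the assumption of an ImageNet-style preprocessed input are my own choices.

```python
# Illustrative sketch only: vanilla gradient saliency for a pretrained classifier.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def gradient_saliency(image: torch.Tensor, target_class: int) -> torch.Tensor:
    """Per-pixel saliency: |d score(target_class) / d input|, max over channels."""
    x = image.clone().unsqueeze(0).requires_grad_(True)   # (1, 3, H, W), track gradients
    score = model(x)[0, target_class]                      # scalar logit for the target class
    score.backward()                                       # gradient of the score w.r.t. the input
    return x.grad.detach().abs().amax(dim=1).squeeze(0)    # (H, W) saliency heatmap

# image: a normalized (3, H, W) tensor without gradient tracking,
# e.g. produced by the standard ImageNet preprocessing transforms.
# saliency = gradient_saliency(image, target_class=243)
```

The absolute input gradient highlights the pixels to which the class score is most sensitive; many of the more elaborate attribution methods refine this basic recipe.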

Notes

  1. https://www.eugdpr.org

  2. https://bijvoorbaatverdacht.nl

Author information

Correspondence to Gabriëlle Ras.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Ras, G., van Gerven, M., Haselager, P. (2018). Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges. In: Escalante, H., et al. Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-98131-4_2

  • DOI: https://doi.org/10.1007/978-3-319-98131-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98130-7

  • Online ISBN: 978-3-319-98131-4

  • eBook Packages: Computer Science (R0)
