DOI: 10.1145/3351095.3375624
ACM FAccT conference proceedings · Research article · Open Access

Explainable machine learning in deployment

Published: 27 January 2020

ABSTRACT

Explainable machine learning offers the potential to provide stakeholders with insights into model behavior by using various methods such as feature importance scores, counterfactual explanations, or influential training data. Yet there is little understanding of how organizations use these methods in practice. This study explores how organizations view and use explainability for stakeholder consumption. We find that, currently, the majority of deployments are not for end users affected by the model but rather for machine learning engineers, who use explainability to debug the model itself. There is thus a gap between explainability in practice and the goal of transparency, since explanations primarily serve internal stakeholders rather than external ones. Our study synthesizes the limitations of current explainability techniques that hamper their use for end users. To facilitate end user interaction, we develop a framework for establishing clear goals for explainability. We end by discussing concerns raised regarding explainability.
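The abstract names feature importance scores as one of the explanation types that, per the study's findings, are consumed mostly by ML engineers for model debugging. As a hedged illustration only (not the paper's own method or data), the sketch below shows one common way such scores are produced, using scikit-learn's permutation importance on a synthetic tabular model; the dataset, model choice, and feature names are assumptions made for the example.

```python
# Minimal sketch: producing feature importance scores for model debugging.
# Assumptions: scikit-learn is available; the synthetic data, model choice,
# and feature naming are illustrative, not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic tabular task standing in for a deployed model's data.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Permutation importance: how much held-out accuracy drops when each
# feature is shuffled; a ranking engineers often inspect when debugging.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```

A global ranking like this is the kind of artifact the study reports being used internally by engineering teams rather than surfaced to the end users affected by the model.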

