DOI: 10.1145/3377325.3377480 · IUI Conference Proceedings
Research Article
Honorable Mention

How do visual explanations foster end users' appropriate trust in machine learning?

Published: 17 March 2020

ABSTRACT

We investigated the effects of example-based explanations for a machine learning classifier on end users' appropriate trust. In an in-person user study with 33 participants, we measured participants' appropriate trust in the classifier, quantified the effects of different spatial layouts and visual representations, and observed how users' trust changed over time. The results show that each explanation improved users' trust in the classifier, and that the combination of explanation, human, and classification algorithm yielded much better decisions than either the human or the classification algorithm alone. Yet the visual explanations led to different levels of trust and could cause inappropriate trust when an explanation was difficult to understand. Visual representation and performance feedback strongly affected users' trust, while spatial layout had only a moderate effect. Our results do not indicate that individual differences (e.g., propensity to trust) affected users' trust in the classifier. This work advances the state of the art in trust-able machine learning and informs the design and appropriate use of automated systems.
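The specific explanation designs studied are described in the full paper; purely as a generic illustration of the concept, the sketch below shows one common form of example-based explanation, in which a prediction is accompanied by the most similar training examples so a user can judge whether it rests on sensible evidence. The use of scikit-learn, the iris data, and the explain_by_example helper are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' system): an example-based explanation that
# pairs a classifier's prediction with the nearest training instances.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

X, y = load_iris(return_X_y=True)

# A plain classifier whose individual predictions we want to explain.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Index the training set so we can retrieve the k most similar examples.
nn = NearestNeighbors(n_neighbors=3).fit(X)

def explain_by_example(x):
    """Return the prediction plus the nearest training examples as evidence."""
    pred = clf.predict([x])[0]
    _, idx = nn.kneighbors([x])
    return pred, [(X[i], y[i]) for i in idx[0]]

prediction, evidence = explain_by_example(X[0])
print("predicted class:", prediction)
for features, label in evidence:
    print("similar training example:", features, "label:", label)
```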
