
Explainable Active Learning (XAL): Toward AI Explanations as Interfaces for Machine Teachers

Published: 05 January 2021
Abstract

The wide adoption of Machine Learning (ML) technologies has created a growing demand for people who can train ML models. Some have advocated the term "machine teacher" for the role of people who inject domain knowledge into ML models. This "teaching" perspective emphasizes supporting the productivity and mental wellbeing of machine teachers through efficient learning algorithms and thoughtful design of human-AI interfaces. One promising learning paradigm is Active Learning (AL), in which the model intelligently selects instances to query a machine teacher for labels, so that the labeling workload can be greatly reduced. However, in current AL settings, the human-AI interface remains minimal and opaque. A dearth of empirical studies further hinders us from developing teacher-friendly interfaces for AL algorithms. In this work, we begin considering AI explanations as a core element of the human-AI interface for teaching machines. When a human student learns, it is common to present one's own reasoning and solicit feedback from the teacher. When an ML model learns and still makes mistakes, the teacher ought to be able to understand the reasoning underlying its mistakes. When the model matures, the teacher should be able to recognize its progress in order to trust and feel confident about their teaching outcome. Toward this vision, we propose a novel paradigm of explainable active learning (XAL), introducing techniques from the surging field of explainable AI (XAI) into an AL setting. We conducted an empirical study comparing the model learning outcomes, feedback content, and user experience of XAL with those of traditional AL and coactive learning (providing the model's prediction without explanation). Our study shows the benefits of AI explanations as interfaces for machine teaching, namely supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks, namely an anchoring effect on the model's judgment and additional cognitive workload. Our study also reveals important individual factors that mediate a machine teacher's receptiveness to AI explanations, including task knowledge, AI experience, and Need for Cognition. Reflecting on the results, we suggest future directions and design implications for XAL and, more broadly, for machine teaching through AI explanations.
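
To make the paradigm concrete, the sketch below illustrates one round of an explainable active learning loop: the model queries its least confident instance and presents its prediction together with a simple local explanation before asking the teacher for a label. This is a minimal illustration rather than the paper's implementation; the synthetic data, logistic regression model, uncertainty-sampling criterion, and coefficient-times-value explanation are all assumptions chosen to keep the example self-contained.

```python
# Minimal sketch of an explainable active learning (XAL) loop.
# Assumptions (not from the paper): synthetic data, logistic regression,
# uncertainty sampling, and a linear local explanation (coef * feature value).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic pool standing in for an unlabeled dataset.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

labeled = list(rng.choice(len(X), size=20, replace=False))  # small seed set
unlabeled = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)

for step in range(5):  # a few teaching rounds
    model.fit(X[labeled], y[labeled])

    # Uncertainty sampling: query the instance whose predicted probability
    # is closest to 0.5 (least confident for a binary classifier).
    proba = model.predict_proba(X[unlabeled])[:, 1]
    query_pos = int(np.argmin(np.abs(proba - 0.5)))
    query_idx = unlabeled[query_pos]

    # Local explanation for a linear model: signed per-feature contributions.
    contributions = model.coef_[0] * X[query_idx]
    top = np.argsort(-np.abs(contributions))[:3]

    print(f"Round {step}: instance {query_idx}, "
          f"P(positive) = {proba[query_pos]:.2f}")
    for j in top:
        direction = "positive" if contributions[j] > 0 else "negative"
        print(f"  {feature_names[j]} pushes {direction} "
              f"({contributions[j]:+.2f})")

    # In XAL the machine teacher inspects the prediction and explanation,
    # then provides a label (and possibly feedback on the explanation).
    # Here the teacher is simulated with the ground-truth label.
    labeled.append(query_idx)
    unlabeled.pop(query_pos)
```

In the study's conditions, the same query step is presented with no model output (traditional AL), with the prediction only (coactive learning), or with the prediction plus an explanation (XAL); the sketch corresponds to the last case.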



Published in

Proceedings of the ACM on Human-Computer Interaction, Volume 4, Issue CSCW3 (CSCW), December 2020, 1825 pages
EISSN: 2573-0142
DOI: 10.1145/3446568

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 5 January 2021 in PACMHCI Volume 4, Issue CSCW3
