The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations

Research Article | Open Access
DOI: 10.1145/3531146.3533179
Published: 20 June 2022

ABSTRACT

Machine learning models in safety-critical settings like healthcare are often "blackboxes": they contain a large number of parameters that are not transparent to users. Post-hoc explainability methods, in which a simple, human-interpretable model imitates the behavior of the blackbox, are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two blackbox model architectures and four popular explainability methods, we find that the approximation quality of explanation models, known as fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, since a single solution may not exist across all settings, we highlight the importance of communicating the details of non-zero fidelity gaps to users. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community.
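To make the audited quantity concrete, below is a minimal, self-contained Python sketch of a per-subgroup fidelity audit. Everything in it is an illustrative assumption rather than the paper's setup: the synthetic data, the random-forest blackbox, the ridge local surrogate, and the use of neighborhood R² as the fidelity measure stand in for the paper's real datasets, model architectures, and exact metric.

```python
# Hedged sketch of a per-subgroup explanation-fidelity audit.
# Assumptions (not from the paper): synthetic data, a random-forest
# "blackbox", ridge local surrogates, and neighborhood R^2 as the
# fidelity proxy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic data: 6 features and a binary protected attribute whose
# value changes how the label depends on the features.
X = rng.normal(size=(2000, 6))
group = rng.integers(0, 2, size=2000)              # protected subgroup label
y = (X[:, 1] + 0.5 * group * X[:, 2] > 0).astype(int)

blackbox = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def local_fidelity(x, n_samples=500, scale=0.3):
    """R^2 of a ridge surrogate fit to blackbox probabilities in a
    Gaussian neighborhood of x -- one simple fidelity proxy."""
    Z = x + scale * rng.normal(size=(n_samples, x.shape[0]))
    p = blackbox.predict_proba(Z)[:, 1]
    return Ridge(alpha=1.0).fit(Z, p).score(Z, p)

# Audit: mean fidelity per subgroup; the gap between subgroups is the
# audited quantity (the paper's exact fidelity definition may differ).
idx = np.arange(200)                                # subsample for speed
fid = np.array([local_fidelity(x) for x in X[idx]])
g = group[idx]
print(f"group 0 fidelity: {fid[g == 0].mean():.3f}")
print(f"group 1 fidelity: {fid[g == 1].mean():.3f}")
print(f"fidelity gap:     {abs(fid[g == 0].mean() - fid[g == 1].mean()):.3f}")
```

In practice the same loop would run over real datasets and the explainers under study (e.g., LIME- or SHAP-style surrogates), and a persistent non-zero gap is the detail the abstract argues should be communicated to users.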


Published in

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022, 2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146
Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher: Association for Computing Machinery, New York, NY, United States