ABSTRACT
Machine learning models in safety-critical settings like healthcare are often "black boxes": they contain a large number of parameters that are not transparent to users. Post-hoc explainability methods, in which a simple, human-interpretable model imitates the behavior of the black-box model, are often proposed to help users trust model predictions. In this work, we audit the quality of such explanations for different protected subgroups using real data from four settings in finance, healthcare, college admissions, and the US justice system. Across two black-box model architectures and four popular explainability methods, we find that the approximation quality of explanation models, known as fidelity, differs significantly between subgroups. We also demonstrate that pairing explainability methods with recent advances in robust machine learning can improve explanation fairness in some settings. However, we highlight the importance of communicating the details of non-zero fidelity gaps to users, since a single solution may not exist across all settings. Finally, we discuss the implications of unfair explanation models as a challenging and understudied problem facing the machine learning community.
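The fidelity audit described above reduces to a simple computation: for each protected subgroup, measure how often the explanation (surrogate) model agrees with the black-box model, then compare those agreement rates across subgroups. The sketch below illustrates this idea in pure Python; the function names and the max-minus-min gap definition are illustrative assumptions, not the paper's exact formulation.

```python
def fidelity_by_group(blackbox_preds, surrogate_preds, groups):
    """Per-group fidelity: the fraction of instances in each subgroup
    where the surrogate (explanation model) agrees with the black box."""
    totals, agrees = {}, {}
    for b, s, g in zip(blackbox_preds, surrogate_preds, groups):
        totals[g] = totals.get(g, 0) + 1
        agrees[g] = agrees.get(g, 0) + (b == s)
    return {g: agrees[g] / totals[g] for g in totals}

def fidelity_gap(fidelities):
    """One simple summary of explanation unfairness: the largest
    difference in fidelity between any two subgroups."""
    vals = list(fidelities.values())
    return max(vals) - min(vals)

# Toy example: the surrogate disagrees with the black box once,
# and only on an instance from group "a".
f = fidelity_by_group(
    blackbox_preds=[1, 0, 1, 1, 0, 1],
    surrogate_preds=[1, 0, 0, 1, 0, 1],
    groups=["a", "a", "a", "b", "b", "b"],
)
# f["a"] = 2/3, f["b"] = 1.0, so the fidelity gap is 1/3.
```

A zero gap would mean the explanation model approximates the black box equally well for every subgroup; the paper's finding is that, in practice, this gap is often significantly non-zero.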
- The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations