Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

  • Original Research
  • Published in AI and Ethics

Abstract

Organizations are increasingly employing complex black-box machine learning models in high-stakes decision-making. A popular approach to addressing the opacity of these models is the use of post-hoc explainability methods, which approximate the logic of the underlying machine learning model so that human examiners can understand its internal workings. In turn, it has been suggested that the insights generated by post-hoc explainability methods can be used to help regulate black-box machine learning. This article examines the validity of these claims. By examining whether the insights derived from post-hoc explainability methods after model deployment can prima facie meet the legal definitions in European Union (EU) non-discrimination law, we argue that machine learning post-hoc explanation methods cannot guarantee the reliability of the insights they generate.

Ultimately, we argue that post-hoc explanatory methods are useful in many cases, but that their limitations preclude reliance on them as the sole mechanism for guaranteeing the fairness of model outcomes in high-stakes decision-making. As an ancillary contribution, we also demonstrate the inadequacy of European non-discrimination law for algorithmic decision-making.
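To make concrete what a post-hoc explainability method does, the following is a minimal illustrative sketch (not taken from the article) of one common technique, a global surrogate: an interpretable model is fitted to the predictions of a black-box model, and its fidelity to the black box quantifies how faithfully the "explanation" tracks the model it purports to explain. The dataset, model choices, and variable names below are illustrative assumptions only.

```python
# Illustrative sketch of a global-surrogate post-hoc explanation (not from the article).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a high-stakes decision dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Black-box" model whose internal logic is opaque to a human examiner.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Post-hoc surrogate: a shallow, interpretable model trained to mimic the black box's outputs.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: how often the surrogate agrees with the black box on unseen data.
# Anything below 1.0 means the "explanation" can diverge from the model it explains,
# which is the kind of limitation the article discusses.
fidelity = accuracy_score(black_box.predict(X_test), surrogate.predict(X_test))
print(f"Surrogate fidelity to black box: {fidelity:.3f}")
```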



Author information

Correspondence to Daniel Vale.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vale, D., El-Sharif, A. & Ali, M. Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law. AI Ethics 2, 815–826 (2022). https://doi.org/10.1007/s43681-022-00142-y

