Auditing fairness under unawareness through counterfactual reasoning

https://doi.org/10.1016/j.ipm.2022.103224

Highlights

  • The fairness under unawareness setting is insufficient to prevent decision bias

  • Sensitive user characteristics can be recovered by a classifier trained only on non-sensitive data

  • A new counterfactual methodology for revealing decision unfairness is proposed

  • The proposed counterfactual metric can better quantify decision discrimination

  • The methodology has been evaluated on two state-of-the-art datasets, Adult and German

Abstract

Artificial intelligence (AI) is rapidly becoming the pivotal solution to support critical judgments in many life-changing decisions. A biased AI tool can therefore be particularly harmful, since these systems can promote or undermine people’s well-being. Consequently, government regulations are introducing specific rules that prohibit the use of sensitive features (e.g., gender, race, religion) in the algorithmic decision-making process to avoid unfair outcomes. Unfortunately, such restrictions may not be sufficient to protect people from unfair decisions, as algorithms can still behave in a discriminatory manner. Indeed, even when sensitive features are omitted (fairness through unawareness), they can still be related to other features, named proxy features. This study shows how to unveil whether a black-box model that complies with the regulations is nevertheless biased. We propose an end-to-end bias detection approach exploiting a counterfactual reasoning module and an external classifier for sensitive features. In detail, the counterfactual analysis finds the minimum-cost variations that grant a positive outcome, while the classifier detects non-linear patterns of non-sensitive features that proxy sensitive characteristics. The experimental evaluation reveals the proposed method’s efficacy in detecting classifiers that learn from proxy features. We also scrutinize the impact of state-of-the-art debiasing algorithms in alleviating the proxy feature problem.

Introduction

The Cambridge Dictionary defines discrimination as the act of “treating a person or particular group of people differently, especially in a worse way from the way in which you treat other people, because of their skin color, sex, sexuality, etc.”. Recently, various regulations have been designed to face the discrimination problem. For instance, Article 21 of the EU Charter of Fundamental Rights defines the non-discrimination requirements: “any discrimination based on any ground such as sex, race, color, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation shall be prohibited”. In 2015, the United Nations General Assembly set up the Sustainable Development Goals (SDGs), or Global Goals, a collection of 17 interlinked global goals designed to be a “blueprint for achieving a better and more sustainable future for all”. Most of the SDGs are related in some way to the discrimination problem, such as no poverty, zero hunger, gender equality, and reduced inequality. The discrimination problem is well known and recognized in the financial domain where, for example, the decision to approve or deny credit is governed by precise and detailed regulatory compliance requirements (i.e., the Equal Credit Opportunity Act, the Federal Fair Lending Act, and the Consumer Credit Directive for the EU Community). However, these laws were designed to prevent discrimination in human decision-making processes, not in automated ones, such as those exploiting Machine Learning (ML) or, more generally, Artificial Intelligence (AI) systems.

The EU Commission, in the wake of the GDPR (i.e., a regulation to safeguard personal data), seeks to regulate the use of AI systems with the “Ethics Guidelines for Trustworthy AI” and, more recently, with the “Proposal for Harmonized Rules on AI”. The regulated characteristics are various (e.g., technical robustness, privacy, data governance, transparency, accountability, societal and environmental well-being), and the European legislature deems the adoption of non-discriminatory AI models crucial. The financial domain is therefore an ideal workbench to test these regulations: financial services are considered high-risk AI applications on the European AI risk scale (whose levels are minimal, limited, high, and unacceptable risk). As a consequence, a financial AI model must demonstrate fairness with respect to sensitive characteristics to protect the social context in which it operates.

Since unfair treatment is strictly related to discriminatory behavior, fairness can be seen as the antonym of discrimination. Unfortunately, finding a strict and formal definition of fairness is challenging, and the subject is still under debate. Mehrabi, Morstatter, Saxena, Lerman, and Galstyan (2021) proposed a definition that could fit the financial domain and its discrimination-derived risks. They defined fairness as “the absence of any prejudice or favoritism towards an individual or a group based on their inherent or acquired characteristics”. Another relevant aspect of fairness is highlighted by Ekstrand, Das, Burke, Diaz, et al. (2022), who refer to unfairness when a system treats people, or groups of people, in a way that is considered “unfair” by some moral, legal, or ethical standard. Notably, in that case, “fairness” relates to the normative aspects of the system and its effects. For this work, counterfactual fairness as defined by Pitoura, Stefanidis, and Koutrika (2022) is particularly relevant. The intuition, in this case, is that an output is fair towards an entity if it is the same in both the actual world and a counterfactual world where the entity belongs to a different group; causal inference is used to formalize this notion of fairness. This definition inspired the design of our model. From a geometrical perspective that considers how a decision model works, Dwork, Hardt, Pitassi, Reingold, and Zemel (2012) state that items that are close in construct space should also be close in decision space, which is widely known as individual fairness: similar individuals should receive similar outcomes. In contrast to individual fairness, Deldjoo, Jannach, Bellogin, Difonzo, and Zanzonelli (2022) define group fairness, which aims to ensure that “similar groups have similar experiences”. Typical groups in such a context are a majority or dominant group and a protected group (e.g., an ethnic minority). This overview surfaces three aspects that are central to this work: the legislation, the counterfactual, and the group. More specifically, the legislation is the primary motivation behind this work, counterfactual generation is the strategy we exploit for detecting unfairness, and the group is the subject of discrimination we want to catch. Although system designers may train a model without any discriminatory intent, several studies have demonstrated that using AI systems without considering ethical aspects can promote discrimination (Bickel et al., 1975, Corbett-Davies et al., 2017, Dressel and Farid, 2018). Moreover, while financial-domain regulations strictly prohibit using sensitive characteristics for decision-making, some researchers defend their use, arguing that the model should actively avoid unfair treatment (i.e., active bias detection) (Elliott et al., 2008, Ruf and Detyniecki, 2020). Nevertheless, merely avoiding sensitive features when training AI models does not guarantee the absence of bias in the outcome (Agarwal & Mishra, 2021). Indeed, there could be features in the dataset that implicitly represent a sensitive feature. In this study, we name these independent features proxy features for the sensitive one. For instance, education, smoking and drinking habits, pet ownership, and diet can be proxy variables for the feature age. The relationship between proxy and sensitive features generally depends on multicollinearity, namely a strong linear relationship among two or more variables. Unfortunately, non-linear relationships are more challenging to detect.
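
As a concrete illustration of why linear checks alone are insufficient, the sketch below contrasts a correlation-based probe with a non-linear one that tries to recover the sensitive attribute from the remaining features. This is a minimal example of the general idea rather than the paper’s implementation: the function names, the choice of GradientBoostingClassifier, and the assumption of a numeric pandas/scikit-learn data layout are ours.

    # Probe for proxy features that encode a sensitive attribute.
    # Assumes a numeric pandas DataFrame X of non-sensitive features and a
    # binary Series s holding the held-out sensitive attribute (e.g., gender).
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    def linear_proxy_check(X: pd.DataFrame, s: pd.Series) -> pd.Series:
        """Absolute Pearson correlation of each feature with the sensitive
        attribute: this only catches (multi)linear proxies."""
        return X.apply(lambda col: np.corrcoef(col, s)[0, 1]).abs().sort_values(ascending=False)

    def nonlinear_proxy_check(X: pd.DataFrame, s: pd.Series) -> float:
        """Cross-validated AUC of a non-linear model that tries to recover the
        sensitive attribute from the non-sensitive features alone."""
        probe = GradientBoostingClassifier()
        return cross_val_score(probe, X, s, cv=5, scoring="roc_auc").mean()

An AUC close to 0.5 from the non-linear probe suggests little proxy information, while a markedly higher AUC indicates that the non-sensitive features jointly encode the sensitive attribute, even when no single feature is linearly correlated with it.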

This investigation relies on the “Fairness Under Unawareness”, or “blindness” (Pitoura et al., 2022), definition (i.e., “an algorithm is fair as long as any protected attributes are not explicitly used in the decision-making process” (Chen, Kallus, Mao, Svacha, & Udell, 2019)). The choice of this definition is a logical consequence of current regulations. Indeed, as for other high-risk applications, the law dictates that AI applications in the financial domain cannot use sensitive information.

This work investigates a strategy to detect decision biases in a realistic scenario where sensitive features are absent and there could be an unknown number of proxy features. We propose to tackle this challenging task by designing a system composed of three main modules. The first module encapsulates the classifier to analyze, named the outcome classifier. This predictor, as regulations require, is trained without any sensitive features. The second module trains a separate classifier, named the sensitive-feature classifier, on the same features to predict the sensitive characteristics. The third module computes minimal counterfactual samples, i.e., variants of the original sample obtained by modifying the values of non-sensitive features until the outcome classifier returns a different outcome. Finally, the sensitive-feature classifier labels the generated samples to check whether they still belong to the original sensitive class. If they do not, the outcome predictor is biased, and its unfairness can be quantified.
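
The end-to-end flow of the three modules can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors’ implementation: the counterfactual generator is treated as a black-box callable (any off-the-shelf method can be plugged in), and the class, method, and attribute names are ours.

    # Hypothetical three-module audit: an outcome classifier trained without
    # sensitive features, a sensitive-feature classifier trained on the same
    # features, and a pluggable counterfactual generator.
    from dataclasses import dataclass
    from typing import Any, Callable
    import numpy as np

    @dataclass
    class FairnessAuditor:
        outcome_clf: Any       # task classifier (e.g., loan approval), no sensitive features
        sensitive_clf: Any     # predicts the sensitive class from the same features
        generate_counterfactual: Callable[[np.ndarray], np.ndarray]
        # ^ any method returning a minimally edited sample that flips the outcome

        def audit_sample(self, x: np.ndarray) -> dict:
            """Audit one rejected sample: generate its counterfactual and check
            whether its predicted sensitive class changes as well."""
            x = np.asarray(x).reshape(1, -1)
            x_cf = np.asarray(self.generate_counterfactual(x)).reshape(1, -1)
            return {
                "original_outcome": self.outcome_clf.predict(x)[0],
                "counterfactual_outcome": self.outcome_clf.predict(x_cf)[0],
                "original_sensitive": self.sensitive_clf.predict(x)[0],
                "counterfactual_sensitive": self.sensitive_clf.predict(x_cf)[0],
            }

        def flip_rate(self, X_rejected: np.ndarray) -> float:
            """Share of rejected samples whose counterfactual is assigned a
            different sensitive class: an aggregate proxy-bias signal."""
            reports = [self.audit_sample(x) for x in X_rejected]
            return float(np.mean([r["original_sensitive"] != r["counterfactual_sensitive"]
                                  for r in reports]))

The flip rate over all rejected samples is the kind of aggregate signal a counterfactual fairness metric is meant to capture: the larger the share of counterfactuals that cross to a different sensitive class, the stronger the evidence that the outcome classifier leans on proxy features.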

To better explain the idea behind our approach, let us introduce a simple example regarding the loan-granting process. Suppose our goal is to assess whether our loan classifier discriminates against women. In this example, the protected class is women, and the sensitive feature is gender. The outcome classifier is any state-of-the-art classification model trained without the gender feature. The sensitive-feature classifier will then distinguish men from women by exploiting the other non-sensitive features in the dataset (e.g., car type, job, education). An event triggers the system’s operation: a woman uses the outcome classifier to obtain a loan, and her request is rejected. The counterfactual module therefore modifies the values of her non-sensitive attributes until the loan is approved (e.g., increasing income, reducing the loan duration). The sensitive-feature classifier then classifies the newly approved counterfactual sample. Is she still classified as a woman? And what should we conclude if the very feature changes that got the loan approved are the same ones that now lead the classifier to label her as a man? In that case the decision model may still be biased and thus unfair; since it does not use the sensitive feature directly, such bias can only stem from proxy features.
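
Continuing the hypothetical sketch above, the loan scenario could be wired up roughly as follows; X_train, y_loan, y_gender, x_rejected_applicant, and my_cf_method are placeholders for the reader’s own data and counterfactual routine, not artifacts of this paper.

    # Hypothetical usage of the FairnessAuditor sketch on the loan example.
    from sklearn.ensemble import RandomForestClassifier

    outcome_clf = RandomForestClassifier().fit(X_train, y_loan)      # gender is NOT a column of X_train
    sensitive_clf = RandomForestClassifier().fit(X_train, y_gender)  # same features, gender as target

    auditor = FairnessAuditor(outcome_clf, sensitive_clf, generate_counterfactual=my_cf_method)
    report = auditor.audit_sample(x_rejected_applicant)

    if (report["counterfactual_outcome"] != report["original_outcome"]
            and report["counterfactual_sensitive"] != report["original_sensitive"]):
        print("The loan flips to approved only when the applicant is no longer "
              "recognized as a woman: a possible proxy-feature bias.")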

Overall, this study proposes an approach for detecting bias in machine learning models through counterfactual reasoning, even when those models are trained without sensitive features, i.e., in the Fairness Under Unawareness setting. This setting can be summarized as outlined by Mehrabi et al. (2021): “An algorithm is fair as long as any protected attributes are not explicitly used in the decision-making process”. This research aims to investigate the presence of bias in an algorithm using counterfactual reasoning as an effective bias-detection strategy, and to evaluate whether different counterfactual strategies differ in their efficacy at detecting biases. In detail, with this study we intend to answer the following research questions:

  • RQ1: Is there a principled way to identify if proxy features exist in a dataset?

  • RQ2: Does the Fairness Under Unawareness setting ensure that decision biases are avoided?

  • RQ3: Is counterfactual reasoning suitable for discovering decision biases?

  • RQ4: Is our methodology effective for discovering discrimination and biases? Are there limitations in its application?

To answer the previous RQs, we performed an extensive experimental evaluation on three state-of-the-art datasets that are broadly recognized as containing social bias. The remainder of the paper is organized as follows: Section 2 provides an overview of the most relevant research in the fields of fairness and counterfactual reasoning, Section 3 provides the preliminaries of the work, and Section 4 describes the methodology. Section 5 introduces the experiments, and the results are discussed in Section 6. Section 7 presents limitations and future work, and conclusions are drawn in Section 8.

Section snippets

Related work

This study presents a strategy for detecting bias in machine learning models using counterfactual reasoning. This section aims to provide the reader with adequate background by introducing the most relevant works in the fairness and counterfactual reasoning research fields.

Preliminaries

This section introduces the notation used extensively in the rest of the paper. To ease reading, the definition of protected groups shares some commonalities with Chen et al. (2019), while other aspects necessarily diverge from it due to the different nature of this study. The notation is summarized in Table 1, and Table 2 lists the acronyms used in the work.

In the following, we will refer to a set D, with |D|=m,

Methodology

The fairness under unawareness setting (see Section 2.1) poses several challenges to the identification of discriminatory behaviors performed by intelligent systems. On the one hand, the prohibition of exploiting sensitive features makes it extremely difficult to guarantee fair treatment for the various categories of users. On the other hand, proxy features can be non-linearly correlated with sensitive ones, thus making the commonly used statistical approaches useless. This section aims to

Experimental evaluation

This section details our experimental settings, designed to answer the research questions defined in Section 1. Two different models are trained: on the one hand, we train a model for making decisions for a specific task (i.e., income prediction or loan prediction), and on the other hand, we train the sensitive-feature classifiers to predict the sensitive group the samples belong to.

Specifically, we focus on the samples predicted as negative by the main task classifier. Next, we exploit
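
As a rough sketch of this setup, under our own assumptions about data handling and model choice (scikit-learn models, a 70/30 split, binary labels; the paper’s actual configuration may differ):

    # Train the task classifier and the sensitive-feature classifier on the same
    # non-sensitive features, then keep the test samples the task classifier
    # rejects: these become the inputs to the counterfactual module.
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def build_audit_inputs(X, y_task, y_sensitive):
        X_tr, X_te, yt_tr, yt_te, ys_tr, ys_te = train_test_split(
            X, y_task, y_sensitive, test_size=0.3, random_state=42)
        outcome_clf = LogisticRegression(max_iter=1000).fit(X_tr, yt_tr)
        sensitive_clf = LogisticRegression(max_iter=1000).fit(X_tr, ys_tr)
        rejected = X_te[outcome_clf.predict(X_te) == 0]  # negatively predicted samples
        return outcome_clf, sensitive_clf, rejected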

Discussion of the results

This section presents, describes, and discusses the experimental results. The rationale of the discussion is to provide the reader with an in-depth understanding of the classifiers under scrutiny and to unveil how the proposed method highlights potential biases. For clarity, the discussion follows the research questions introduced in Section 1:

  • RQ1: Is there a principled way to identify if proxy features exist in a dataset?

  • RQ2: Does the Fairness Under Unawareness setting ensure that decision biases are

Limitations and future work

Our work proposes a new methodology for exploring and investigating bias by exploiting advances in counterfactual reasoning. Even though the outcomes presented are a notable achievement in bias identification, our work is not exempt from limitations. For instance, Section 6.3 explores the distances between counterfactuals classified as privileged and underprivileged. However, an overall distance does not highlight features that are the most important in the decision-making process and are, at

Conclusion

This study introduces a novel methodology for detecting and assessing biases in decision-making models, even when they operate in the context of “fairness under unawareness” and thus do not use sensitive features. Adopting counterfactual reasoning in the proposed approach is crucial since it allows unveiling the characteristics of original samples that could reverse the decision-maker’s prediction. When the counterfactual

CRediT authorship contribution statement

Giandomenico Cornacchia: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Software. Vito Walter Anelli: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing. Giovanni Maria Biancofiore: Conceptualization, Methodology, Writing – original draft. Fedelucio Narducci: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing. Claudio Pomo: Conceptualization, Methodology,

Acknowledgments

This research was partially supported by the following projects: VHRWPD7 – CUP B97I19000980007 – COR 1462424 ERP 4.0, Grant Agreement Number 101016956 H2020 PASSEPARTOUT, Secure Safe Apulia, Codice Pratica 3PDW2R7 SERVIZI LOCALI 2.0, MISE CUP: I14E20000020001 CTEMT - Casa delle Tecnologie Emergenti Comune di Matera, PON ARS01_00876 BIO-D, CT_FINCONS_II.

References (55)

  • Bottou, L., et al. (2013). Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research.

  • Bureau, C. F. P. (2014). Using publicly available information to proxy for unidentified race and ethnicity: A methodology and assessment.

  • Calders, T., et al. Efficient AUC optimization for classification.

  • Chen, J. (2018). Fair lending needs explainable models for responsible recommendation. In FATREC’18 proceedings of the...

  • Chen, J., et al. (2019). Fairness under unawareness: Assessing disparity when protected class is unobserved.

  • Corbett-Davies, S., et al. (2017). Algorithmic decision making and the cost of fairness.

  • Cornacchia, G., et al. A general model for fair and explainable recommendation in the loan domain (short paper).

  • Cornacchia, G., et al. Improving the user experience and the trustworthiness of financial services.

  • Das, S., et al. (2021). Fairness measures for machine learning in finance. The Journal of Financial Data Science.

  • Deldjoo, Y., et al. (2022). A survey of research on fair recommender systems.

  • DeMartino, G. F. (2020). The confounding problem of the counterfactual in economic explanation. Review of Social Economy.

  • Denton, E., et al. (2019). Image counterfactual sensitivity analysis for detecting unintended bias.

  • Donini, M., Oneto, L., Ben-David, S., Shawe-Taylor, J., & Pontil, M. (2018). Empirical Risk Minimization Under Fairness...

  • Dressel, J., et al. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances.

  • Dudík, M., et al. Doubly robust policy evaluation and learning.

  • Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. In Proceedings of the...

  • Ekstrand, M. D., et al. (2022). Fairness in information access systems. Foundations and Trends® in Information Retrieval.