Auditing fairness under unawareness through counterfactual reasoning
Introduction
The Cambridge Dictionary defines discrimination as the act of “treating a person or particular group of people differently, especially in a worse way from the way in which you treat other people, because of their skin color, sex, sexuality, etc.”. Recently, various regulations have been designed to address the discrimination problem. For instance, Article 21 of the EU Charter of Fundamental Rights defines the non-discrimination requirements: “any discrimination based on any ground such as sex, race, color, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation shall be prohibited”. In 2015, the United Nations General Assembly set up the Sustainable Development Goals (SDGs), or Global Goals, a collection of 17 interlinked global goals designed to be a “blueprint for achieving a better and more sustainable future for all”. Most of the SDGs are related in some way to the discrimination problem, such as no poverty, zero hunger, gender equality, and reduced inequality. The discrimination problem is well-known and recognized in the financial domain where, for example, the decision to approve or deny credit has been regulated with precise and detailed regulatory compliance requirements (i.e., the Equal Credit Opportunity Act, the Federal Fair Lending Act, and the Consumer Credit Directive for the EU Community). However, these laws were set to prevent discrimination in human decision-making processes and not in automated ones, such as those exploiting Machine Learning (ML) or, more generally, Artificial Intelligence (AI) systems. The EU Commission, in the wake of the GDPR (i.e., a regulation to safeguard personal data), seeks to regulate the use of AI systems with the “Ethics Guidelines for Trustworthy AI” and, more recently, with “The Proposal for Harmonized Rules on AI”.
The regulated characteristics are various (e.g., technical robustness, privacy, data governance, transparency, accountability, societal and environmental well-being), and the European legislature deems adopting non-discriminatory AI models crucial. Therefore, the financial domain is the perfect workbench to test these regulations. Indeed, financial services are considered high-risk AI applications on the European AI risk scale (the levels are: minimal, limited, high, and unacceptable risk). As a consequence, a financial AI model must demonstrate fairness concerning sensitive characteristics to protect the social context in which it operates.
Since unfair treatment is strictly related to discriminatory behavior, fairness can be seen as the antonym of discrimination. Unfortunately, finding a strict and formal definition of fairness is challenging, and the subject is still under debate. Mehrabi, Morstatter, Saxena, Lerman, and Galstyan (2021) proposed a definition that could fit the financial domain and its discrimination-derived risks. They defined fairness as “the absence of any prejudice or favoritism towards an individual or a group based on their inherent or acquired characteristics”. Another relevant aspect of fairness is highlighted by Ekstrand, Das, Burke, Diaz, et al. (2022), who refer to unfairness when a system treats people, or groups of people, in a way that is considered “unfair” by some moral, legal, or ethical standard. The interesting aspect is that, in this case, “fairness” is related to the normative aspects of the system and its effects. For this work, counterfactual fairness as defined by Pitoura, Stefanidis, and Koutrika (2022) is particularly relevant. The intuition, in this case, is that an output is fair towards an entity if it is the same in both the actual world and a counterfactual world where the entity belongs to a different group. Causal inference is used to formalize this notion of fairness, and this definition inspired the design of our model. From a geometrical perspective that considers how a decision model works, Dwork, Hardt, Pitassi, Reingold, and Zemel (2012) state that items that are close in construct space shall also be close in decision space, which is widely known as individual fairness: similar individuals should receive similar outcomes. In contrast to individual fairness, Deldjoo, Jannach, Bellogin, Difonzo, and Zanzonelli (2022) define group fairness, which aims to ensure that “similar groups have similar experiences”. Typical groups in such a context are a majority or dominant group and a protected group (e.g., an ethnic minority).
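To make the group-fairness notion concrete, the following is a minimal sketch (with purely hypothetical decisions and group labels, not data from this study) of the demographic-parity gap, i.e., the difference in positive-outcome rates between the dominant and the protected group:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-outcome rates between two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Hypothetical decisions: 1 = loan approved; group 1 is the protected group.
decisions = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
groups    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
gap = demographic_parity_gap(decisions, groups)  # 0.8 vs 0.2 -> gap of 0.6
```

Individual fairness would instead compare pairs of similar applicants rather than aggregate group rates, which is why the two notions can disagree on the same set of decisions.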
Following this overview, three critical aspects of this work emerge: the legislation, the counterfactual, and the group. More specifically, the legislation is the primary motivation behind this work, counterfactual generation is the strategy we exploit for detecting unfairness, and the group is the subject of the discrimination we want to catch. Although system designers train models without any discriminatory purpose, several studies have demonstrated that using AI systems without considering ethical aspects can promote discrimination (Bickel et al., 1975, Corbett-Davies et al., 2017, Dressel and Farid, 2018). Moreover, while financial domain regulations strictly prohibit using sensitive characteristics for decision-making, some researchers defend their usage and believe the model should actively avoid unfair treatment (i.e., active bias detection) (Elliott et al., 2008, Ruf and Detyniecki, 2020). Nevertheless, merely avoiding sensitive features when training AI models does not guarantee the absence of biases in the outcome (Agarwal & Mishra, 2021). Indeed, there could be features in the dataset that implicitly represent a sensitive feature. In this study, we name these independent features proxy features for the sensitive one. For instance, education, smoking and drinking habits, pet ownership, and diet can be proxy variables for the feature age. The relationship between proxy and sensitive features generally depends on multicollinearity, namely a strong linear relationship among two or more variables. Unfortunately, non-linear relationships are more challenging to detect.
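To illustrate why linear diagnostics can miss proxies, consider this small sketch on synthetic data (the feature setup and the histogram-based mutual-information estimator are our assumptions for illustration, not part of the paper's method): a linearly related proxy is visible to the Pearson correlation, while a proxy related to the sensitive feature only through its variance is invisible to correlation yet clearly detected by mutual information.

```python
import numpy as np

def mutual_information(x, s, bins=20):
    """Histogram estimate of I(x; s) in nats, for continuous x and binary s."""
    joint, _, _ = np.histogram2d(x, s, bins=[bins, 2])
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of x
    ps = p.sum(axis=0, keepdims=True)   # marginal of s
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ ps)[nz])).sum())

rng = np.random.default_rng(0)
n = 4000
s = rng.integers(0, 2, size=n)                      # sensitive feature, e.g. age group
linear_proxy = 2.0 * s + rng.normal(0, 1, size=n)   # linear dependence on s
nonlin_proxy = rng.normal(0, 1 + 3 * s, size=n)     # same mean, different variance

corr_lin = abs(np.corrcoef(linear_proxy, s)[0, 1])  # clearly non-zero
corr_non = abs(np.corrcoef(nonlin_proxy, s)[0, 1])  # close to zero
mi_non = mutual_information(nonlin_proxy, s)        # clearly positive
```

The non-linear proxy would pass a multicollinearity check (near-zero correlation) while still leaking the sensitive attribute, which is exactly the failure mode discussed above.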
This investigation relies on the “Fairness Under Unawareness” (or “blindness”, Pitoura et al. (2022)) definition, i.e., “an algorithm is fair as long as any protected attributes are not explicitly used in the decision-making process” (Chen, Kallus, Mao, Svacha, & Udell, 2019). The choice of this definition is a logical consequence of current regulations. Indeed, as for other high-risk applications, the law dictates that AI applications in the financial domain cannot use sensitive information.
This work investigates a strategy to detect decision biases in a realistic scenario where sensitive features are absent and there could be an unknown number of proxy features. We propose to tackle this challenging task by designing a system composed of three main modules. The first module encapsulates the classifier to analyze, named the outcome classifier. This predictor, as regulations suggest, is trained without any sensitive features. The second module trains a separate classifier, named the sensitive feature classifier, on the same features to predict the sensitive characteristics. The third module computes minimal counterfactual samples, i.e., variants of the original sample obtained by modifying the values of non-sensitive features until the outcome classifier produces a different outcome. Finally, the sensitive feature classifier classifies the generated samples to check whether they still belong to the original sensitive class. If they do not, the outcome classifier is biased, and its unfairness can be quantified.
To better explain the idea behind our approach, let us introduce a simple example regarding the loan granting process. Suppose our goal is to assess whether our loan classifier discriminates against women. In this example, the protected class is women, and the sensitive feature is gender. The outcome classifier is any state-of-the-art classification model trained without gender. The sensitive feature classifier will then distinguish men from women by exploiting the other non-sensitive features in the dataset (e.g., car type, job, education). An event triggers the system’s operation: a woman uses the outcome classifier to obtain a loan, and her request is rejected. The counterfactual module then modifies the values of her non-sensitive attributes until the loan is approved (e.g., increasing income, reducing the loan duration). The sensitive feature classifier then classifies the newly approved counterfactual sample. Is she still classified as a woman by the system? And what could we conclude if the very changes that got the loan approved are the ones that now classify her as a man? The decision model may still be biased and thus unfair, and since it does not use sensitive features, this bias is due to proxy features.
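The loan example can be sketched end to end. The code below is a toy illustration under our own assumptions (synthetic data, logistic models, and a naive gradient-walk counterfactual search), not the paper's actual pipeline: an outcome classifier trained without gender, a sensitive feature classifier trained on the same features, and a check of whether approved counterfactuals for rejected women get re-classified as men.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 3000
# Hypothetical data: gender (1 = woman) is never given to either classifier;
# 'seniority' acts as a proxy because it correlates with gender.
gender = rng.integers(0, 2, size=n)
income = rng.normal(50, 10, size=n)                  # e.g. k$/year
seniority = rng.normal(6 - 3 * gender, 1, size=n)    # proxy feature
X = np.column_stack([income, seniority])
# Ground-truth approvals leak gender through the proxy.
approved = (income / 10 + seniority + rng.normal(0, 1, size=n) > 10).astype(int)

outcome_clf = LogisticRegression(max_iter=1000).fit(X, approved)    # module 1
sensitive_clf = LogisticRegression(max_iter=1000).fit(X, gender)    # module 2

def counterfactual(x, clf, step=0.1, max_iter=1000):
    """Module 3 (naive variant): walk along the model's gradient until approval."""
    direction = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
    cf = x.copy()
    for _ in range(max_iter):
        if clf.predict(cf.reshape(1, -1))[0] == 1:
            break
        cf = cf + step * direction
    return cf

# Audit: for rejected women, does flipping the outcome also flip gender?
rejected_women = X[(outcome_clf.predict(X) == 0) & (gender == 1)]
cfs = np.array([counterfactual(x, outcome_clf) for x in rejected_women])
all_approved = bool((outcome_clf.predict(cfs) == 1).all())
flip_rate = float((sensitive_clf.predict(cfs) == 0).mean())  # now classified as men
```

A non-negligible flip_rate signals that the feature changes needed for approval are exactly those that make the sensitive feature classifier switch class, exposing the proxy even though neither model ever saw gender.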
Overall, this study proposes an approach for detecting bias in machine learning models using counterfactual reasoning, even when those models are trained without sensitive features, i.e., in the case of Fairness Under Unawareness. This setting could be summarized as outlined by Mehrabi et al. (2021): “An algorithm is fair as long as any protected attributes are not explicitly used in the decision-making process”. This research aims to investigate the presence of bias in an algorithm using counterfactual reasoning as a strategy for bias detection, and to evaluate whether different counterfactual strategies differ in their efficacy at detecting biases. In detail, with this study, we intend to answer the following research questions:
- RQ1: Is there a principled way to identify if proxy features exist in a dataset?
- RQ2: Does the Fairness Under Unawareness setting ensure that decision biases are avoided?
- RQ3: Is counterfactual reasoning suitable for discovering decision biases?
- RQ4: Is our methodology effective for discovering discrimination and biases? Are there limitations in its application?
To answer the previous RQs, we performed an extensive experimental evaluation on three state-of-the-art datasets, widely recognized as containing social bias. The remainder of the paper is organized as follows: Section 2 provides an overview of the most relevant research in the fields of fairness and counterfactual reasoning, Section 3 provides the preliminaries of the work, and Section 4 describes the methodology. Section 5 introduces the experiments, while results are discussed in Section 6. Conclusions and future work are drawn in Section 8.
Section snippets
Related work
This study presents a strategy for detecting bias in machine learning models using Counterfactual Reasoning. This section aims to provide the reader with an adequate background, introducing the most relevant works in Fairness and Counterfactual Reasoning research fields.
Preliminaries
This section introduces notation that is used extensively in the rest of the paper. To ease reading and for a rapid understanding, the definition of protected groups shares some commonalities with Chen et al. (2019), while other aspects necessarily diverge from it due to the different nature of this study. The notation is further condensed in Table 1, while Table 2 lists the acronyms used in the work.
In the following, we will refer to a set , with ,
Methodology
The fairness under unawareness setting (see Section 2.1) poses several challenges to the identification of discriminatory behaviors performed by intelligent systems. On the one hand, the prohibition of exploiting sensitive features makes it extremely difficult to guarantee fair treatment for the various categories of users. On the other hand, proxy features can be non-linearly correlated with sensitive ones, thus making the commonly used statistical approaches useless. This section aims to
Experimental evaluation
This section details our experimental settings, designed to answer the research questions defined in Section 1. Two different models are trained: on the one hand, we train a model for making decisions for a specific task (i.e., income prediction or loan prediction), and on the other hand, we train the sensitive-feature classifiers to predict the sensitive group the samples belong to.
Specifically, we focus on the samples predicted as negative by the main task classifier. Next, we exploit
Discussion of the results
This section presents, describes, and discusses the experimental results. The rationale of the discussion is to provide the reader with an in-depth understanding of the critical classifiers and to unveil how the proposed method highlights potential biases. For clarity, the discussion follows the research questions introduced in Section 1:
- RQ1: Is there a principled way to identify if proxy features exist in a dataset?
- RQ2: Does the Fairness Under Unawareness setting ensure that decision biases are
Limitations and future work
Our work proposes a new methodology for exploring and investigating bias by exploiting advances in counterfactual reasoning. Even though the outcomes presented are a notable achievement in bias identification, our work is not exempt from limitations. For instance, Section 6.3 explores the distances between counterfactuals classified as privileged and underprivileged. However, an overall distance does not highlight features that are the most important in the decision-making process and are, at
Conclusion
This study introduces a novel methodology for detecting and assessing biases in decision-making models, even if they operate in the context of “fairness under unawareness” and thus do not use sensitive features. Adopting counterfactual reasoning in the proposed approach is crucial since it allows unveiling the characteristics of original samples that could reverse the decision-maker's prediction. When the counterfactual
CRediT authorship contribution statement
Giandomenico Cornacchia: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Software. Vito Walter Anelli: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing. Giovanni Maria Biancofiore: Conceptualization, Methodology, Writing – original draft. Fedelucio Narducci: Conceptualization, Methodology, Formal analysis, Writing – original draft, Writing – review & editing. Claudio Pomo: Conceptualization, Methodology,
Acknowledgments
This research was partially supported by the following projects: VHRWPD7 – CUP B97I19000980007 – COR 1462424 ERP 4.0, Grant Agreement Number 101016956 H2020 PASSEPARTOUT, Secure Safe Apulia, Codice Pratica 3PDW2R7 SERVIZI LOCALI 2.0, MISE CUP: I14E20000020001 CTEMT - Casa delle Tecnologie Emergenti Comune di Matera, PON ARS01_00876 BIO-D, CT_FINCONS_II.
References (55)

- Fairness metrics and bias mitigation strategies for rating predictions. Information Processing & Management (2021)
- Counterfactuals. Artificial Intelligence (1986)
- Provider fairness across continents in collaborative recommender systems. Information Processing & Management (2022)
- Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence (2019)
- FairLens: Auditing black-box clinical decision support systems. Information Processing & Management (2021)
- Responsible AI (2021)
- Fair normalizing flows
- Sex bias in graduate admissions: Data from Berkeley. Science (1975)
- Ensuring fairness under prior probability shifts
- A training algorithm for optimal margin classifiers
- Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research
- Using publicly available information to proxy for unidentified race and ethnicity: A methodology and assessment
- Efficient AUC optimization for classification
- Fairness under unawareness: Assessing disparity when protected class is unobserved
- Algorithmic decision making and the cost of fairness
- A general model for fair and explainable recommendation in the loan domain (short paper)
- Improving the user experience and the trustworthiness of financial services
- Fairness measures for machine learning in finance. The Journal of Financial Data Science
- A survey of research on fair recommender systems
- The confounding problem of the counterfactual in economic explanation. Review of Social Economy
- Image counterfactual sensitivity analysis for detecting unintended bias
- The accuracy, fairness, and limits of predicting recidivism. Science Advances
- Doubly robust policy evaluation and learning
- Fairness in information access systems. Foundations and Trends® in Information Retrieval