A game-theoretic approach to adversarial linear Gaussian classification

https://doi.org/10.1016/j.ifacsc.2021.100163

Abstract

We employ a game-theoretic model to analyse the interaction between an adversary and a classifier. There are two classes (positive and negative) to which data points can belong. The adversary wants to maximize the probability of missed detection for the positive class (i.e., the false negative probability), but does not want to modify the data point significantly, so that it still maintains the favourable traits of the original class. The classifier, on the other hand, wants to maximize the probability of correct detection for the positive class (i.e., the true positive probability) subject to a lower bound on the probability of correct detection for the negative class (i.e., the true negative probability). For conditionally Gaussian data points (conditioned on the class) and linear support vector machine classifiers, we rewrite the optimization problems of the adversary and the classifier as convex problems and use best response dynamics to learn an equilibrium of the game. This yields a linear support vector machine classifier that is robust against adversarial input manipulations.

Introduction

Machine learning algorithms have been observed to be vulnerable to adversarial manipulations of their inputs after training and deployment, known as evasion attacks (Dalvi et al., 2004, Goodfellow et al., 2015, Yuan et al., 2019). In fact, some machine learning models have been shown to be adversely influenced by very small perturbations of their inputs (Biggio et al., 2013, Goodfellow et al., 2015, Papernot et al., 2016). These observations severely restrict their application in practice.

In this paper, we propose a game-theoretic approach to model and analyse the interaction between an adversary and a decision maker (i.e., a classifier). As a starting point, we focus on a binary classification problem using linear support vector machines with Gaussian-distributed data in each class. This way, we can compute an optimal adversarial linear support vector machine. Note that detecting and mitigating evasion attacks on support vector machines is still an open problem (Frederickson et al., 2018, Han and Rubinstein, 2018).

We model the interaction between the adversary and the classifier as a constant-sum game. There are two classes (i.e., positive and negative) to which the data can belong. The adversary is interested in maximizing the probability of missed detection for the positive class, i.e., the probability that an input from the positive class is classified as belonging to the negative class, also known as the false negative probability. However, the adversary does not want to modify the data significantly, so that it still maintains the favourable traits of the original class. An example of such a classification problem is simplified spam filtering, in which the nature of the email determines its class (with the positive class denoting spam emails). The adversary's objective is to modify the spam emails so that they pass the spam filter; manipulating an email by a large amount might negate its adversarial nature. Also, note that the adversary cannot access all the emails and thus can only manipulate the spam emails. The classifier is interested in maximizing the probability of correct detection for the positive class, i.e., the probability that an input from the positive class is classified as belonging to the positive class, also known as the true positive probability. In the spam filtering example, the classifier aims to determine whether an email is spam based on possibly modified spam emails and unaltered genuine emails. Evidently, if the objective of the classifier were solely to catch all data points belonging to the positive class, its optimal behaviour would be to ignore the received data point and mark it as belonging to the positive class. This would correctly identify all data points belonging to the positive class; however, it would also misclassify all data points from the negative class, which is impractical. In the spam filtering example, such a policy would result in marking all emails as spam, which is undesirable. Therefore, the classifier enforces a lower bound on the probability of correct detection for the negative class, i.e., the probability that an input from the negative class is classified as belonging to the negative class, also known as the true negative probability. We rewrite the optimization problems of the adversary and the classifier as two convex optimization problems and use best response dynamics to learn an equilibrium of the game.
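
To make the two coupled objectives concrete, the following is a schematic statement of the players' problems. The parameterization, a linear decision rule with weights $w$ and bias $b$ for the classifier and an additive manipulation $\delta$ applied only to positive-class data for the adversary, together with the constants $\epsilon$ (the floor on the true negative probability) and $\lambda$ (the manipulation penalty), is an illustrative assumption rather than the paper's exact notation.

```latex
% Classifier: maximize the true positive probability on (possibly manipulated)
% positive-class data, subject to a floor \epsilon on the true negative probability.
\begin{align}
  \max_{w,\, b}\quad & \mathbb{P}\{\, w^\top (x + \delta) + b \ge 0 \mid \theta = +1 \,\} \\
  \text{s.t.}\quad   & \mathbb{P}\{\, w^\top x + b < 0 \mid \theta = -1 \,\} \ge \epsilon .
\end{align}
% Adversary: maximize the false negative probability while penalizing the size
% of the manipulation \delta (only positive-class data can be manipulated).
\begin{align}
  \max_{\delta}\quad & \mathbb{P}\{\, w^\top (x + \delta) + b < 0 \mid \theta = +1 \,\}
    \;-\; \lambda\, \|\delta\|^2 .
\end{align}
```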

Most common methods for securing machine learning algorithms against adversarial inputs are ad hoc in nature or based on heuristics; see, e.g., Goodfellow et al., 2015, Kurakin et al., 2017, Papernot et al., 2016. For instance, it has been shown that injecting adversarial examples into the training set, often referred to as adversarial training, can increase robustness to adversarial manipulations (Goodfellow et al., 2015). However, this approach depends on the method used for generating adversarial examples, and the number of required adversarial examples is often not known a priori. Other approaches for generating robust machine learning models include regularization (Russu, Demontis, Biggio, Fumera, & Roli, 2016) and distributionally-robust optimization (Sinha, Namkoong, & Duchi, 2017). The tight relationship between distributionally-robust optimization and regularization is explored in Farokhi (2020). Another method is to develop computationally-friendly robustness certificates (Raghunathan, Steinhardt, & Liang, 2018). Finally, game-theoretic models have also been used in adversarial machine learning, such as adversarial support vector machines (Brückner et al., 2012, Brückner and Scheffer, 2009, Globerson and Roweis, 2006, Weerasinghe et al., 2019, Zhou et al., 2012). In contrast with this paper, those works do not optimize the probability of true positive for the classifier and false positive for the adversary; instead, they consider hinge losses in their modelling (motivated by the formulation of support vector machines). Furthermore, in Brückner et al., 2012, Brückner and Scheffer, 2009, gradient descent is used for characterizing the equilibrium rather than semi-definite programming as in this paper. Probabilities of true positive and false positive for adversarial detection have been considered in Dritsoula, Loiseau, and Musacchio (2017), which is hence more aligned with our problem formulation. However, they only consider discrete decision spaces to obtain tractable results, while, in this paper, we consider continuous decision spaces and use linearity for tractability.

The problem formulation of this paper is in essence close to cheap-talk and Bayesian persuasion games (Crawford and Sobel, 1982, Dughmi and Xu, 2016, Farokhi et al., 2016, Farrell and Rabin, 1996, Kamenica and Gentzkow, 2011, Nadendla et al., 2018, Sarıtaş et al., 2019, Sarıtaş et al., 2016, Sarıtaş et al., 2020), in which a better-informed sender wants to communicate with a receiver in a strategic manner to sway its decision. However, there is a stark difference between those studies and the setup of this paper: most importantly, in this paper, the classifier (i.e., the receiver) is restricted to follow a machine learning model (specifically, a linear support vector machine).

Section snippets

Problem formulation

The adversary has access to a random variable $x \in \mathbb{R}^n$, which can belong to two classes: positive and negative. The class to which $x$ belongs is denoted by $\theta \in \{-1, +1\}$, which is a binary random variable itself with $\mathbb{P}\{\theta = +1\} = 1 - \mathbb{P}\{\theta = -1\} = \alpha > 0$. The random variable $x$ is assumed to be Gaussian with mean $\mu_+ \in \mathbb{R}^n$ and covariance matrix $\Sigma_+ \succ 0$ if $\theta = +1$, and Gaussian with mean $\mu_- \in \mathbb{R}^n$ and covariance matrix $\Sigma_- \succ 0$ if $\theta = -1$. The notation $A \succ 0$ means that $A$ is a symmetric positive definite matrix, while $A \succeq 0$ means that $A$ is symmetric positive semi-definite.
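
Because the class-conditional distributions are Gaussian, the true positive and true negative probabilities of any linear rule $z = \mathrm{sign}(w^\top x + b)$ have closed forms through the standard normal CDF, a standard fact that underlies convex reformulations of such chance constraints. The helper below is a minimal sketch of that computation; the function and variable names are ours, and the linear-rule parameterization is an assumption, not the paper's notation.

```python
import numpy as np
from scipy.stats import norm

def class_probabilities(w, b, mu_pos, cov_pos, mu_neg, cov_neg):
    """Closed-form true positive / true negative probabilities of the linear rule
    z = sign(w^T x + b) when x | theta=+1 ~ N(mu_pos, cov_pos) and
    x | theta=-1 ~ N(mu_neg, cov_neg). (Illustrative helper, not from the paper.)"""
    # w^T x + b is Gaussian with mean w^T mu + b and variance w^T Sigma w.
    m_pos = w @ mu_pos + b
    s_pos = np.sqrt(w @ cov_pos @ w)
    m_neg = w @ mu_neg + b
    s_neg = np.sqrt(w @ cov_neg @ w)
    true_positive = 1.0 - norm.cdf(-m_pos / s_pos)  # P{w^T x + b >= 0 | theta = +1}
    true_negative = norm.cdf(-m_neg / s_neg)        # P{w^T x + b <  0 | theta = -1}
    return true_positive, true_negative
```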

Main results

We can prove an important result regarding the adversarial classification game that illustrates the direct conflict of interest between the adversary and the classifier, as expected.

Proposition 1 (Constant-Sum Game)

The adversarial classification game is a constant-sum game.

Proof

Note that $U_{\mathrm{c}}(\gamma, \eta) + U_{\mathrm{a}}(\gamma, \eta) = \mathbb{P}\{z = +1 \mid \theta = +1\} + \mathbb{P}\{z = -1 \mid \theta = +1\} = 1$ for any $\gamma$ and $\eta$. □

In the remainder of this section, we provide a method for computing equilibria of an adversarial classification game. We first show that the best responses of the players can be computed by solving convex optimization problems.
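
The best response dynamics itself can be sketched at a high level: alternate the two best responses until neither player's strategy changes appreciably. In the sketch below, `classifier_br` and `adversary_br` are hypothetical placeholders standing in for the players' convex best-response problems; they are not functions defined in the paper.

```python
import numpy as np

def best_response_dynamics(classifier_br, adversary_br, eta0, max_iter=100, tol=1e-6):
    """Alternate best responses until neither player's strategy changes appreciably.
    `classifier_br` and `adversary_br` are placeholders for the convex best-response
    problems; only the overall iteration is sketched here."""
    eta = np.asarray(eta0)          # adversary's current strategy
    gamma = classifier_br(eta)      # classifier's best response to the adversary
    for _ in range(max_iter):
        eta_next = adversary_br(gamma)        # adversary best-responds to the classifier
        gamma_next = classifier_br(eta_next)  # classifier best-responds in turn
        if (np.linalg.norm(gamma_next - gamma) < tol
                and np.linalg.norm(eta_next - eta) < tol):
            break
        gamma, eta = gamma_next, eta_next
    return gamma, eta
```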

Numerical example

In this section, we illustrate the applicability of the developed game-theoretic framework on two numerical problems: an illustrative example using Gaussian data and a practical example using real data on heart disease classification.
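
As a minimal stand-in for the synthetic Gaussian part of the experiments, the snippet below samples class-conditional Gaussian data and estimates the empirical true positive and true negative rates of a candidate linear rule by Monte Carlo. The means, covariances, and the rule $(w, b)$ are made-up illustrative values, not the parameters used in the paper, and the heart disease data set is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) two-dimensional Gaussian classes; not the paper's parameters.
mu_pos, cov_pos = np.array([1.0, 1.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu_neg, cov_neg = np.array([-1.0, -1.0]), np.eye(2)

x_pos = rng.multivariate_normal(mu_pos, cov_pos, size=5000)
x_neg = rng.multivariate_normal(mu_neg, cov_neg, size=5000)

# Any candidate linear rule sign(w^T x + b); here a hand-picked example.
w, b = np.array([1.0, 1.0]), 0.0

tpr = np.mean(x_pos @ w + b >= 0)   # empirical true positive rate
tnr = np.mean(x_neg @ w + b < 0)    # empirical true negative rate
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}")
```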

Conclusions and future work

We used a constant-sum game to model the interaction between an adversary and a classifier. For Gaussian data and linear support vector machine classifiers, we transformed the optimization problems of the adversary and the classifier to convex optimization problems. We then utilized best response dynamics to learn an equilibrium of the game in order to extract linear support vector machine classifiers that are robust to adversarial input manipulations.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (31)

  • Sarıtaş, S., et al. (2020). Dynamic signaling games with quadratic criteria under Nash and Stackelberg equilibria. Automatica.
  • Weerasinghe, S., et al. (2019). Support vector machines resilient against training data integrity attacks. Pattern Recognition.
  • Arrow, K. J., et al. (1954). Existence of an equilibrium for a competitive economy. Econometrica.
  • Barron, E. N., et al. (2010). Best response dynamics for continuous games. Proceedings of the American Mathematical Society.
  • Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., & Laskov, P., et al. (2013). Evasion attacks against...
  • Brückner, M., et al. (2012). Static prediction games for adversarial learning problems. Journal of Machine Learning Research.
  • Brückner, M., et al. Nash equilibria of static prediction games.
  • Crawford, V. P., et al. (1982). Strategic information transmission. Econometrica.
  • Dalvi, N., et al. Adversarial classification.
  • Dritsoula, L., et al. (2017). A game-theoretic analysis of adversarial classification. IEEE Transactions on Information Forensics and Security.
  • Dughmi, S., et al. Algorithmic Bayesian persuasion.
  • Farokhi, F. (2020). Regularization helps with mitigating poisoning attacks: Distributionally-robust machine learning using the Wasserstein distance.
  • Farokhi, F., et al. Quadratic Gaussian privacy games.
  • Farokhi, F., et al. (2016). Estimation with strategic sensors. IEEE Transactions on Automatic Control.
  • Farrell, J., et al. (1996). Cheap talk. Journal of Economic Perspectives.