1 Introduction

In this paper we consider a discrete-time dynamic game of mean field type. In this game, a representative player takes actions over time so as to minimize a cost functional which depends on her type, her action, and the distribution of actions of the whole population of players. Crucially, players’ types may encode different characteristics or preferences, and may change progressively over time. The players’ actions on a given date are only allowed to depend on their types up to that date, introducing an adaptability, or non-anticipativity, constraint into the game. The solutions to this game are dubbed dynamic Cournot-Nash equilibria following Acciaio et al. [2]. As in mean field games, searching for equilibria in dynamic Cournot-Nash games boils down to solving a fixed point problem, and an equilibrium of these games allows one to build approximate equilibria in related large-population games. See [2, Section 2] for a detailed discussion on the connection between finite and infinite population versions of dynamic Cournot-Nash games.

Building on the work [16] by Blanchet and Carlier, it was shown in [2] that the emerging field of causal optimal transport provides the right framework to describe dynamic Cournot-Nash games. However, when it comes to establishing existence or uniqueness of equilibria, the aforementioned paper makes the crucial assumption that the game is of potential type. In a nutshell, this amounts to a structural assumption under which equilibria correspond to minimizers of an auxiliary variational problem. However, the potential-type assumption is not ideal for multiple reasons. First, there are commonly used games/models of non-potential structure. Second, the link between causal optimal transport and dynamic Cournot-Nash games is blurred when one superimposes such a structural assumption. Finally, the method proposed in [2] was not only restricted to the potential case, but also made a further cost-separability assumption, namely that the type of a player does not interact with the distribution of actions within the cost function. The goal of the present paper is to remedy these shortcomings, following the blueprint set forth in [15], by Blanchet and Carlier, for the static case.

We now summarize our contributions in some detail.

In Sect. 2 we define the problem, recall the connection with causal optimal transport and the elements of that theory, and study the question of existence of (mixed) Nash equilibria. As customary, this is done by considering the best-response correspondence, which in our case assigns to any prior distribution \({\nu }\) of actions for the population of players the set \(\Phi ({\nu })\) of optimal responses by a single player. Using causal transport, we establish the closedness and convexity of the set \(\Phi ({\nu })\). Applying Kakutani’s fixed point theorem, we obtain the existence of equilibria in our games under suitable assumptions. Finally, a uniqueness result is derived from a Lasry-Lions monotonicity condition.

In Sect. 3 we assume a specific structure of the cost functional of the game, which allows us to find the equilibrium using the contraction mapping theorem. To do so, we use the structure of the game in order to get a handle on the best-response correspondence. To this end we use the fact that, conditionally on the past evolution of types, the optimal response can be constructed backwards (i.e. recursively) in time. Under appropriate Lipschitz and convexity assumptions, we prove that the best response is a contraction.

In Sect. 4, we introduce and study a simple optimal liquidation problem in a price impact model. We first describe this model, and then establish the applicability of the results of Sect. 3. We prove that the game is not of potential type, and hence cannot be covered by the existing literature. Furthermore, we provide an example which illustrates how to compute the optimal response map and equilibrium.

We close this introduction by giving a broader overview of the related literature.

1.1 Related literature

The games we are concerned with are closely related to mean field games (MFG) in a discrete-time setting (see e.g. Gomes et al. [20]). In this parallel, the different types of agents considered in our setup correspond to different subpopulations of players in the MFG. The theory of mean field games aims at studying dynamic games as the number of agents tends to infinity. It was established independently by Lasry and Lions [24, 25] and by Huang, Malhamé and Caines [21, 22], and has since seen a burst of activity, as e.g. documented in the monograph by Carmona and Delarue [18]. See Cardaliaguet’s notes [17], based on P.L. Lions’ lectures at the Collège de France, for seminal results on mean field games, and also Bayraktar et al. [9,10,11] or Cecchin and Fischer [19] for the study of finite state mean field games. The key assumption is that players are symmetric and weakly interacting through their empirical distributions, and the idea is to approximate large N-player systems by studying the behaviour as \(N \rightarrow \infty \).

On the other hand, the notion of Cournot-Nash games has been pioneered by Blanchet and Carlier [14, 16] who, building on the seminal contribution of Mas-Colell [27], developed a connection between static Cournot-Nash equilibria and optimal transport. From a probabilistic perspective, large static anonymous games have been studied by Lacker and Ramanan in [23], with an emphasis on large deviations and the asymptotic behaviour of the Price of Anarchy. We also refer to this paper for a thorough review of the (vast) game-theoretic literature. Building from this body of work, Acciaio et al. introduced in [2] the concept of dynamic Cournot-Nash games/equilibria. Working in the so-called potential case, that article studied questions of existence, convergence from finite to infinite populations, and computational aspects. Crucially, the article observed that instead of optimal transport, it is the theory of causal optimal transport, which we discuss in the next paragraph, that plays the main role in the mathematical analysis of these games. Another article that took a similar, variational point of view is [12], wherein competitive games with a mean field effect were studied. The advantage of the potential/variational setting is that, instead of studying an equilibrium problem, an auxiliary optimization problem is solved, which is in many ways better suited for analysis and computational resolution. To the best of our knowledge, the only article where non-potential (with non-separable costs) static Cournot-Nash games have been studied is Blanchet and Carlier’s [15]. That article serves as our inspiration as we carry out the analysis of the dynamic case in a similar non-potential setting.

As already mentioned, to deal with our dynamic setting, it is the tools from causal optimal transport (COT) rather than classical optimal transport that play a role. In a nutshell, COT is a relative of the optimal transport problem where an extra constraint, which takes into account the arrow of time (filtrations), is added. This in turn is crucial to ensure, in our application, the adaptedness of players’ actions to their types in a dynamic framework. The theory of COT, used to reformulate our asymptotic equilibrium problem, has been developed in the works [7, 26]. This theory has been successfully employed in various applications, e.g. in mathematical finance and stochastic analysis [1, 3, 6, 8], in operations research [28,29,30], and in machine learning [4].

We close this part by clarifying the similarities and differences between Cournot-Nash and mean field games (see also [2, Remark 3.8] for a related explanation). To simplify the matter we only discuss a static situation. In an N-player symmetric game, if players adopt the decisions \(\{y^N_j\}_j\) then the cost faced by player i is \(F(y^N_i,\frac{1}{N}\sum _{j\ne i}\delta _{y^N_j})\). In a (pure) Nash equilibrium we have \(F(y^N_i,\frac{1}{N}\sum _{j\ne i}\delta _{y^N_j})\le F(z_i,\frac{1}{N}\sum _{j\ne i}\delta _{y^N_j}) \) for each \(i\le N\) and any \(z_i\). Taking averages in these inequalities and using a compactness argument, which yields a subsequence along which \(\frac{1}{N}\sum _{j\le N}\delta _{y^N_j}\rightarrow {\hat{\nu }}\), heuristically provides a static mean field equilibrium: \(\int F(y,{\hat{\nu }}){\hat{\nu }}(dy)\le \int F(y,{\hat{\nu }})\nu (dy)\) for all probability measures \(\nu \) over decisions. For the static Cournot-Nash case the N-player game story is quite similar, but now player i has a type \(x^N_i\) and faces the type-dependent cost \(F(x_i^N, y^N_i,\frac{1}{N}\sum _{j\ne i}\delta _{y^N_j})\). In a Nash equilibrium we thus have \(F(x_i^N, y^N_i,\frac{1}{N}\sum _{j\ne i}\delta _{y^N_j})\le F(x_i^N, z_i,\frac{1}{N}\sum _{j\ne i}\delta _{y^N_j}) \) for each \(i\le N\) and any \(z_i\). If we take averages, assume that the types \(x_i^N\) are e.g. i.i.d. samples distributed according to \(\eta \), and apply a compactness argument, which yields a subsequence along which \(\frac{1}{N}\sum _{j\le N}\delta _{(x_j^N, y^N_j)}\rightarrow {\hat{\pi }}\), we heuristically obtain a static Cournot-Nash equilibrium: \(\int F(x,y,{\hat{\nu }}){\hat{\pi }}(dx,dy)\le \int F(x,y,{\hat{\nu }})\pi (dx,dy)\) for all probability measures \(\pi \) over types and decisions with x-marginal \(\eta \), where \({\hat{\nu }}\) is the y-marginal of \({\hat{\pi }}\). If we consider players of the same type as belonging to the same sub-population, then Cournot-Nash games are very close to multi-population mean field games (cf. [5]), with the caveat that it is the aggregate distribution of decisions that is included in the cost criterion, i.e. we do not disaggregate the decisions of the population along the various sub-populations. Mathematically this corresponds to \({\hat{\nu }}\) (the second marginal of \({\hat{\pi }}\)) being the last argument of F, instead of \(x\mapsto {\hat{\pi }}_x\) (the family of conditional probabilities given the first coordinate).

Notation. Let \(\mathcal {X}_1, \cdots , \mathcal {X}_N, \mathcal {Y}_1, \cdots , \mathcal {Y}_N\) be Polish spaces, and take \(\mathcal {X}:=\mathcal {X}_1 \times \cdots \times \mathcal {X}_N, \mathcal {Y}:=\mathcal {Y}_1 \times \cdots \times \mathcal {Y}_N\). Define \(\mathcal {X}_{s:t}= \mathcal {X}_s \times \cdots \times \mathcal {X}_t\) and \(\mathcal {Y}_{s:t}= \mathcal {Y}_s \times \cdots \times \mathcal {Y}_t\) for \(1 \le s \le t \le N\). For \(x \in \mathcal {X}\), we denote \(x_{s:t}=(x_s, \cdots , x_t)\) for \(1 \le s \le t \le N\), and similarly define \(y_{s:t}\) for \(y \in \mathcal {Y}\). Denote the canonical filtration on \(\mathcal {X}\) and \(\mathcal {Y}\) by \((\mathcal {F}^{\mathcal {X}}_t)_{t=1}^N\) and \((\mathcal {F}^{\mathcal {Y}}_t)_{t=1}^N\) respectively. For any Polish space \(\mathcal {Z}\), we denote by \(\mathcal {P}(\mathcal {Z})\) the space of Borel probability measures on \(\mathcal {Z}\). Given \(\eta \in \mathcal {P}(\mathcal {X})\), and \(\nu \in \mathcal {P}(\mathcal {Y})\), we denote the set of all couplings between \(\eta \) and \(\nu \) by

$$\begin{aligned} \Pi (\eta , \nu ):=\{\pi \in \mathcal {P}(\mathcal {X}\times \mathcal {Y}): \, \pi (A \times \mathcal {Y})=\eta (A), \, \pi (\mathcal {X}\times B)=\nu (B), \, \forall \, A \in \mathcal {F}^{\mathcal {X}}_N, \, B \in \mathcal {F}^{\mathcal {Y}}_N \}. \end{aligned}$$

The letter \({\mathcal {L}}\) stands for Law and if \(T:\mathcal {X}\rightarrow \mathcal {Y}\) is measurable we denote by \(T(\eta ):=\eta \circ T^{-1}\in \mathcal {P}(\mathcal {Y})\) the push-forward of \(\eta \) by T.

2 Existence by set-valued fixed point theorem

In this section, we formulate the Cournot-Nash equilibrium as a fixed point problem, and solve it by applying Kakutani’s fixed point theorem. First we recall the notion of causal coupling.

Definition 2.1

Suppose \(\eta \in \mathcal {P}(\mathcal {X}), \, \nu \in \mathcal {P}(\mathcal {Y})\). A coupling \(\pi \in \Pi (\eta , \nu )\) is said to be causal if under \(\pi \) it holds that

$$\begin{aligned} \mathcal {F}^{\mathcal {Y}}_{t} \underset{\mathcal {F}^{\mathcal {X}}_{t}}{\perp \!\!\! \perp } \mathcal {F}^{\mathcal {X}}_{N}, \quad t=1,\cdots , N. \end{aligned}$$

Denote by \(\Pi _c(\eta , \nu )\) the collection of all causal couplings from \(\eta \) to \(\nu \).

Remark 2.1

In words, the above means that \(\mathcal {F}^{\mathcal {Y}}_{t}\) and \(\mathcal {F}^{\mathcal {X}}_{N}\) are conditionally independent under \(\pi \) given the information in \(\mathcal {F}^{\mathcal {X}}_{t}\), and this for each t. See [7, 26] for equivalent formulations of this condition, or our proof of Lemma 2.2 below. The set \(\Pi _c(\eta , \nu )\) is never empty, as the product of \(\eta \) and \(\nu \) is always an element thereof. It is instructive to consider the case when \(\pi \) is supported on the graph of a function T from \(\mathcal {X}\) to \(\mathcal {Y}\): in this case causality essentially boils down to this function being adapted, i.e. \(T(x)=(T_1(x_1),T_2(x_{1:2}),\dots , T_N(x_{1:N}))\).
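To make the conditional-independence condition concrete, here is a minimal numerical sketch (our own toy illustration, with all data and names assumed) for \(N=2\) and finite state spaces: in this case the only non-trivial requirement is that, under \(\pi \), the conditional law of \(y_1\) given \((x_1,x_2)\) does not depend on \(x_2\).

import itertools
import numpy as np

# pi is encoded as a dictionary mapping (x1, x2, y1, y2) to a probability weight
X1, X2, Y1, Y2 = [0, 1], [0, 1], [0, 1], [0, 1]

def is_causal(pi, tol=1e-12):
    for x1 in X1:
        conds = []
        for x2 in X2:
            mass = sum(pi.get((x1, x2, y1, y2), 0.0) for y1 in Y1 for y2 in Y2)
            if mass > tol:  # conditional law of y1 given (x1, x2)
                conds.append(np.array([sum(pi.get((x1, x2, y1, y2), 0.0) for y2 in Y2)
                                       for y1 in Y1]) / mass)
        if conds and any(np.abs(c - conds[0]).max() > 1e-9 for c in conds):
            return False
    return True

# the product coupling eta x nu is always causal
eta = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
nu = {(0, 0): 0.5, (1, 1): 0.5}
prod = {(x1, x2, y1, y2): eta[(x1, x2)] * nu[(y1, y2)]
        for (x1, x2), (y1, y2) in itertools.product(eta, nu)}
print(is_causal(prod))   # True

# an anticipative coupling, where y1 copies the future type x2, is not causal
antic = {(x1, x2, x2, 0): 0.25 for x1 in X1 for x2 in X2}
print(is_causal(antic))  # False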

In the rest of this paper, N stands for a fixed time horizon. At each time \(t \in \{1, \cdots , N\}\), a representative player is characterized by her type at that time, denoted by \(x_t\in \mathcal {X}_t\), and her control/action undertaken at that time, denoted by \(y_t\in \mathcal {Y}_t\). Hence \(x\in \mathcal {X}\) and \(y\in \mathcal {Y}\) denote the type-path and action-path of a player. We fix once and for all \(\eta \in \mathcal {P}(\mathcal {X})\). The measure \(\eta \) is the distribution of the types in the population of players, and is known in advance by the players.

We denote

$$\begin{aligned} \Pi _c(\eta , \cdot )= \mathop {\cup }_{\nu \in \mathcal {P}(\mathcal {Y}) }\Pi _c(\eta ,\nu ). \end{aligned}$$

We now recall the notion of dynamic Cournot-Nash equilibrium (see [2]), which we will simply call equilibrium in the rest of the work.

Definition 2.2

An equilibrium is a coupling \(\hat{\pi } \in \Pi _c(\eta ,\cdot )\) such that

$$\begin{aligned} \mathrm{(i)}&\;\hat{\pi } \in {\mathop {\textrm{argmin}}\limits _{\pi \in \Pi _c(\eta , \cdot )}} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,\hat{\nu }) \, \pi (dx,dy) \text { for some } \hat{\nu } \in \mathcal {P}(\mathcal {Y}), \nonumber \\ \mathrm{(ii)}&\text { The }\mathcal {Y}-\text { marginal of }\hat{\pi } \text { is }\hat{\nu }. \end{aligned}$$
(2.1)

Above \(F: \mathcal {X}\times \mathcal {Y}\times \mathcal {P}(\mathcal {Y}) \rightarrow \mathbb {R}\) is a given cost function, assumed lower-bounded for the time being. Here \( \hat{\nu }\) represents the distribution of controls/actions by the population of players, which is only determined at equilibrium, and \(\hat{\pi }\) characterizes the optimal response of each type of player given the cost function \((x,y) \mapsto F(x,y, \hat{\nu })\) that they face.

Remark 2.2

The above should be interpreted as a randomized, or mixed-strategy, equilibrium. A pure equilibrium would be an adapted map \({\hat{T}}:\mathcal {X}\rightarrow \mathcal {Y}\) satisfying

$$\begin{aligned} \mathrm{(i')}&\int _{\mathcal {X}} F(x,{\hat{T}}(x),\hat{\nu }) \, \eta (dx) = \inf _{T\,\text {adapted}} \int _{\mathcal {X}} F(x,T(x),\hat{\nu }) \, \eta (dx) \text { for some } \hat{\nu } \in \mathcal {P}(\mathcal {Y}), \\ \mathrm{(ii')}&\,\, {\hat{T}}(\eta )=\hat{\nu }, \text { i.e. the image of }\eta \text { by }{\hat{T}} \text { is }\hat{\nu }. \end{aligned}$$

As usual in game theory we introduce the best-response set-valued map, or correspondence, defined by

$$\begin{aligned} \Phi (\nu )&:=\left\{ \pi \in \Pi _c(\eta , \cdot ): \, \int F(x,y,\nu ) \, \pi (dx ,dy) \right. \nonumber \\&\left. \le \int F(x,y,\nu ) \, \pi '(dx ,dy), \forall \pi ' \in \Pi _c(\eta , \cdot ) \right\} , \end{aligned}$$
(2.2)

and also the projection from \(\Pi _c(\eta , \cdot )\) to \(\mathcal {P}(\mathcal {Y})\)

$$\begin{aligned} Pj: \pi \mapsto \mathcal {Y}-\text { marginal of }\pi . \end{aligned}$$

Finally we introduce

$$\begin{aligned} R(\hat{\nu }):=Pj \circ \Phi (\hat{\nu }), \end{aligned}$$

the \(\mathcal {Y}\)-marginals of the best responses to \(\hat{\nu }\), i.e. the possible distributions of actions in response to \(\hat{\nu }\).

It can be readily seen that \(\hat{\nu }\) is the \(\mathcal {Y}\)-marginal of an equilibrium as in (2.1) if and only if \(\hat{\nu } \in R(\hat{\nu })\). We will show the existence of fixed points of R by applying Kakutani’s fixed point theorem, which we recall in the following lemma.

Lemma 2.1

Let \(R: \mathcal {Z}\rightarrow 2^{\mathcal {Z}}\) be a set-valued map. Then R has a fixed point, i.e. \(\exists z\) s.t. \(z \in R(z)\), if

  1. (i)

    \(\mathcal {Z}\) is a nonempty compact, convex set in a locally convex space.

  2. (ii)

    R is upper semi-continuous, and the set \(R(z)\) is nonempty, closed, and convex for all \(z \in \mathcal {Z}\).

Proof

See [31, Theorem 9.B]. \(\square \)

The following lemma will be used to show that \(R(\nu )\) is closed and convex for any \(\nu \in \mathcal {P}(\mathcal {Y})\). See [7, 26] for similar statements: We present it here, separately, for the sake of clarity.

Lemma 2.2

Causality is preserved under weak convergence, i.e., \(\pi \in \Pi _c(\eta , \cdot )\) if \(\pi =\lim \limits _{n \rightarrow \infty } \pi _n\) for a sequence \((\pi _n)_{n \ge 0} \subset \Pi _c(\eta , \cdot )\), and so \(\Pi _c(\eta , \cdot )\) is closed. Also \(\Pi _c(\eta , \cdot )\) is convex, i.e., \(a\pi _1+(1-a)\pi _2 \in \Pi _c(\eta , \cdot )\) for any \( \pi _1, \pi _2 \in \Pi _c(\eta , \cdot )\) and \(a \in [0,1]\).

Proof

Clearly the \(\mathcal {X}\)-marginal of \(\pi \) is \(\eta \). Let us prove that \(\mathcal {F}^{\mathcal {Y}}_{t} \underset{\mathcal {F}^{\mathcal {X}}_{t}}{\perp \!\!\! \perp } \mathcal {F}^{\mathcal {X}}_{N}\) under \(\pi \) for any \(t \in \{1, \cdots , N\}\). This is equivalent to proving that, for any bounded continuous function \(g: \mathcal {Y}_{1:t} \rightarrow \mathbb {R}\), it holds

$$\begin{aligned} \mathbb {E}^{\pi } \left[ g(Y_{1:t}) \, | \, \mathcal {F}^{\mathcal {X}}_t\right] = \mathbb {E}^{\pi } \left[ g(Y_{1:t}) \, | \, \mathcal {F}^{\mathcal {X}}_N\right] , \end{aligned}$$

where \(Y_{1:t}: \mathcal {Y}\rightarrow \mathcal {Y}_{1:t}\) is the projection map on the first t coordinates. Denote by \(\eta _{x_{1:t}}(dx_{t+1:N})\) the disintegration of \(\eta \) on the first t components \(x_{1:t}\). Then it suffices to prove that

$$\begin{aligned}&\int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) f(x)\, \pi (dx, dy) =\int _{\mathcal {X}_{1:t} \times \mathcal {Y}_{1:t}} g(y_{1:t}) \nonumber \\&\left( \int _{\mathcal {X}_{t+1:N}} f(x_{1:t}, x_{t+1:N}) \, \eta _{x_{1:t}}(dx_{t+1:N}) \right) \, \pi (dx_{1:t} dy_{1:t}), \end{aligned}$$
(2.3)

for any bounded continuous function \(f: \mathcal {X}\rightarrow \mathbb {R}\). Since the function

$$\begin{aligned} \bar{f}(x_{1:t}):= \int _{\mathcal {X}_{t+1:N}} f(x_{1:t}, x_{t+1:N}) \, \eta _{x_{1:t}}(dx_{t+1:N}) \end{aligned}$$

is measurable, by Lusin’s Theorem, for any \(\delta >0\) there exists a closed \(\mathcal {V}\subset \mathcal {X}_{1:t}\) such that \(\eta (\mathcal {V}\times \mathcal {X}_{t+1:N}) > 1-\delta \) and \(\bar{f}\) is continuous when restricted to \(\mathcal {V}\). Then by Tietze’s Theorem, we extend \(\bar{f}|_{\mathcal {V}}\) to a bounded continuous function \(\bar{f}'\) on \(\mathcal {X}_{1:t}\), and it is clear that \(\bar{f}|_{\mathcal {V}}=\bar{f}'|_{\mathcal {V}}\) and \(\Vert \bar{f}-\bar{f}' \Vert _{\infty } \le 2\Vert f \Vert _{\infty }\).

The equality (2.3) holds for each causal coupling \(\pi _n \). It can be readily seen that

$$\begin{aligned}&\lim \limits _{n \rightarrow \infty } \int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) f(x)\, \pi _n(dx, dy) = \int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) f(x)\, \pi (dx, dy) , \\&\lim \limits _{n \rightarrow \infty } \int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) \bar{f}'(x_{1:t}) \, \pi _n(dx_{1:t}, dy_{1:t})= \int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) \bar{f}'(x_{1:t}) \, \pi (dx_{1:t}, dy_{1:t}), \end{aligned}$$

and

$$\begin{aligned} \left| \int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) \left( \bar{f}'(x_{1:t})-\bar{f}(x_{1:t}) \right) \, {\tilde{\pi }}(dx_{1:t}, dy_{1:t})\right| \le 2\delta \Vert f \Vert _{\infty } \Vert g \Vert _{\infty }, \, \forall \, {\tilde{\pi }} \text { with }\mathcal {X}-\text { marginal } \eta . \end{aligned}$$

Therefore we conclude that

$$\begin{aligned}&\left| \int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) \left( \int _{\mathcal {X}_{t+1:N}} f(x_{1:t}, x_{t+1:N}) \, \eta _{x_{1:t}}(dx_{t+1:N}) \right) \, \pi (dx_{1:t}, dy_{1:t}) \right. \\&\quad \quad \left. -\int _{\mathcal {X}\times \mathcal {Y}} g(y_{1:t}) f(x)\, \pi (dx, dy) \right| \le 4 \delta \Vert f \Vert _{\infty } \Vert g \Vert _{\infty }. \end{aligned}$$

Letting \(\delta \rightarrow 0\), we finish proving (2.3).

Convexity of \(\Pi _c(\eta , \cdot )\) is a direct consequence of (2.3). \(\square \)

Now we are ready to show our main result of this section. The precise assumption on the cost function F is:

Assumption 2.1

  1. (i)

    \(F: \mathcal {X}\times \mathcal {Y}\times \mathcal {P}(\mathcal {Y}) \rightarrow \mathbb {R}\) is non-negative, \(F(\cdot ,\cdot ,\nu )\) is continuous for each \(\nu \), and \(\nu \mapsto F(\cdot ,\cdot ,\nu )\) is continuous in supremum norm.

  2. (ii)

    \(\left\{ y: \inf _{(x,{\nu }) \in \mathcal {X}\times \mathcal {P}(\mathcal {Y})} F(x,y,{\nu }) \le r \right\} \) is compact for any \(r>0\).

  3. (iii)

    There exists a \(y_0 \in \mathcal {Y}\) and \(C<+\infty \) such that

    $$\begin{aligned} \sup _{\nu \in \mathcal {P}(\mathcal {Y})} \int F (x,y_0,\nu ) \, \eta (dx)\le C. \end{aligned}$$

Here are two simple examples that satisfy Assumption 2.1.

Example 2.1

  1. (i).

    Suppose \(\mathcal {X}\) and \(\mathcal {Y}\) are compact. Then any non-negative continuous function \(F: \mathcal {X}\times \mathcal {Y}\times \mathcal {P}(\mathcal {Y}) \rightarrow \mathbb {R}\) satisfies Assumption 2.1.

  2. (ii).

    Suppose \(\mathcal {X}=\mathcal {Y}=\mathbb {R}^N\) and \(\eta \) has finite second moment. Let \(\alpha , \beta , \gamma \) be three positive constants and \(g:\mathbb {R}^N \times \mathbb {R}^N \rightarrow \mathbb {R}\) be a non-negative, bounded and uniformly continuous function. Then it can be easily verified that

    $$\begin{aligned} F(x,y,\nu )=\alpha \Vert x-y\Vert ^2 + \beta \Vert y \Vert ^2+ \gamma \int g(y, {\bar{y}}) \, \nu (d {\bar{y}}) \end{aligned}$$

    satisfies Assumption 2.1.
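As a quick illustration of how the cost of Example 2.1 (ii) is evaluated in practice, here is a toy snippet of ours (the constants and the kernel g are arbitrary assumptions), with \(\nu \) replaced by an empirical measure.

import numpy as np

alpha, beta, gamma, N = 1.0, 0.5, 0.2, 3   # arbitrary constants (an assumption)

def g(y, ybar):
    # bounded and uniformly continuous interaction kernel
    return np.exp(-np.sum((y - ybar) ** 2))

def F(x, y, nu_samples):
    # nu is the empirical measure of the rows of nu_samples
    interaction = np.mean([g(y, ybar) for ybar in nu_samples])
    return alpha * np.sum((x - y) ** 2) + beta * np.sum(y ** 2) + gamma * interaction

rng = np.random.default_rng(0)
nu_samples = rng.normal(size=(100, N))     # 100 sampled action paths
print(F(x=np.zeros(N), y=np.ones(N), nu_samples=nu_samples))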

Theorem 2.1

Under Assumption 2.1, a solution to the fixed point problem (2.1) exists.

Proof

We show that the composition \(R= Pj \circ \Phi \) has a fixed point. In Step 1, we prove that \(R(\nu )\) is relatively compact for any \(\nu \in \mathcal {P}(\mathcal {Y})\), and hence we can restrict R to a compact domain. In Step 2, invoking Lemma 2.2, we show that \(R(\nu )\) is closed and convex. In Step 3 we prove that R is upper semicontinuous, so that the existence of a fixed point of R follows from Lemma 2.1.

Step 1:

Take \(y_0 \in \mathcal {Y}\) and \(C<+\infty \) as in Assumption 2.1 (iii). It is clear that \(\eta (dx)\delta _{y_0}(dy)\in \Pi _c(\eta , \cdot ) \). Then for any putative \(\pi \in \Phi ({\nu })\) we would have

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }) \, \pi (dx ,dy) \le \int _{\mathcal {X}} F(x,y_0,{\nu }) \, \eta (dx) \le C. \end{aligned}$$

From Assumption 2.1 (ii), we know that for any \( r >0\), a compact subset \(\mathcal {V}_{r} \subset \mathcal {Y}\) exists such that

$$\begin{aligned} F(x,y,\nu ) \ge r\, (\text { all }x,\nu ) \quad \text { whenever } \quad y \not \in \mathcal {V}_{r} . \end{aligned}$$

Therefore we obtain the inequality

$$\begin{aligned} Pj({\pi })[y \not \in \mathcal {V}_r] \le \pi [(x,y):F(x,y,\nu ) \ge r]\le \frac{\int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }) \, \pi (dx ,dy)}{r} \le \frac{C}{r}. \end{aligned}$$

Define a subset \(\mathcal {E}\subset \mathcal {P}(\mathcal {Y})\) as

$$\begin{aligned} \mathcal {E}:=\left\{ \nu \in \mathcal {P}(\mathcal {Y}): \, \nu [y \not \in \mathcal {V}_r] \le C/r, \, \forall r >0 \right\} . \end{aligned}$$

It is clear that \(\mathcal {E}\) is relatively compact, by Prokhorov theorem, as it is tight. By Portmanteau theorem, \(\mathcal {E}\) is also closed, since each set \(\mathcal {Y}\backslash \mathcal {V}_r\) is open. Hence \(\mathcal {E}\) is compact, and clearly convex too. By design we have \(R({\nu }) \subset \mathcal {E}\) for any \({\nu } \in \mathcal {P}(\mathcal {Y})\). We restrict the domain of R to \({\mathcal {E}}\), which is a compact and convex subset of the space of finite signed measures equipped with the weak topology.

Step 2 :

We define \(\Pi _c(\eta , \mathcal {E})\) as the subset of \(\Pi _c(\eta , \cdot )\) consisting of measures with a \(\mathcal {Y}\)-marginal lying in \(\mathcal {E}\). Note that \(\Phi ({\nu }) \subset \Pi _c(\eta , \mathcal {E})\), by Step 1. The compactness of \(\mathcal {E}\), Lemma 2.2, and Prokhorov theorem, yield that \(\Pi _c(\eta , \mathcal {E})\) is compact and so \(\Phi ({\nu })\) is relatively compact. We notice that

$$\begin{aligned} \Phi (\nu )&=\left\{ \pi \in \Pi _c(\eta , \mathcal {E}): \, \int F(x,y,\nu ) \, \pi (dx ,dy) \right. \\&\left. \le \int F(x,y,\nu ) \, \pi '(dx ,dy), \forall \pi ' \in \Pi _c(\eta , \mathcal {E}) \right\} , \end{aligned}$$

and by the compactness of \(\Pi _c(\eta , \mathcal {E})\) and Assumption 2.1 (i) we obtain that \(\Phi (\nu )\) is non-empty. By the same token, \(\Phi (\nu )\) is closed and hence compact, and clearly \(\Phi (\nu )\) is convex too. On the other hand, the map Pj is continuous and linear. Hence \(R({\nu })=Pj(\Phi ({\nu }))\) is also nonempty, convex and compact.

Step 3:

We prove that \(R:\mathcal {E}\rightarrow \mathcal {E}\) is an upper-semicontinuous set-valued map. Thus there exists a fixed point in \({\mathcal {E}}\), as a result of Lemma 2.1. Since \({\mathcal {E}}\) is compact, it is equivalent to show that the graph of R is closed in \({\mathcal {E}} \times {\mathcal {E}}\). Take any sequence \(({\nu }_n, \nu _n')_{n\ge 0} \subset \mathcal {E}\times \mathcal {E}\) such that

$$\begin{aligned} \nu _n' \in R({\nu }_n),\quad {\nu }_n \rightarrow \hat{\nu },\quad \nu _n' \rightarrow \hat{\nu }'. \end{aligned}$$

Let us prove that \(\hat{\nu }' \in R(\hat{\nu })\). Note that for each n, there exists a \(\pi _n \in \Phi ({\nu }_n)\) such that \(Pj (\pi _n)=\nu _n'\). Since \((\pi _n)_{n \ge 0} \subset \Pi _c(\eta , \mathcal {E})\), there exists a subsequence \((\pi _{n_k})_{k \ge 0}\) converging to \(\hat{\pi }\). According to Lemma 2.2, we know that \(\hat{\pi } \in \Pi _c(\eta , \cdot )\) as well. It is clear then that \(Pj(\hat{\pi })=\hat{\nu }'\). Let us verify that

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,\hat{\nu }) \, \hat{\pi }(dx ,dy) \le \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,\hat{\nu }) \, \pi '(dx ,dy), \quad \forall \pi ' \in \Pi _c(\eta , \cdot ). \end{aligned}$$
(2.4)

According to the definition of \(\pi _{n_k} \in \Phi ({\nu }_{n_k})\), we know that

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }_{n_k}) \, \pi _{n_k}(dx ,dy) \le \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }_{n_k}) \, \pi '(dx ,dy), \quad \forall \pi ' \in \Pi _c(\eta , \cdot ). \end{aligned}$$

Now using the uniform continuity of F in Assumption 2.1 (i), and letting \(k \rightarrow \infty \) in the above inequality, we conclude (2.4).

\(\square \)

Remark 2.3

Inspection of the previous proof shows that Assumption 2.1 (i) could be weakened to

  1. (i’)

    The function \(\nu \mapsto F(\cdot ,\cdot ,\nu )\) is continuous in sup-norm and for each \(\nu \) the function \(F(\cdot ,\cdot ,\nu )\) is jointly lower semicontinuous and continuous in its second argument.

As this seems to be a technicality, we do not develop this further.

To guarantee uniqueness of the fixed point, we impose the following monotonicity condition on F.

Assumption 2.2

For any \(\pi \in \Pi _c ( \eta , {\nu }), \pi ' \in \Pi _c(\eta , {\nu }')\), if \(\pi \ne \pi '\) then

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} \left( F(x,y,{\nu })-F(x,y,{\nu }')\right) (\pi -\pi ' ) (dx, dy) >0. \end{aligned}$$

Corollary 2.1

There exists at most one equilibrium under Assumption 2.2.

Proof

Suppose there are two distinct equilibria \(\pi \in \Pi _c ( \eta , \hat{\nu }) \) and \(\pi '\in \Pi _c ( \eta , \hat{\nu }')\), so \(\pi \in \Phi (\hat{\nu })\) and \( \pi ' \in \Phi (\hat{\nu }')\). Then by definition

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y, \hat{\nu }) \, \pi (dx, dy)&\le \int _{\mathcal {X}\times \mathcal {Y}} F(x,y, \hat{\nu }) \, \pi '(dx, dy), \\ \int _{\mathcal {X}\times \mathcal {Y}} F(x,y, \hat{\nu }') \, \pi '(dx, dy)&\le \int _{\mathcal {X}\times \mathcal {Y}} F(x,y, \hat{\nu }')\, \pi (dx, dy). \end{aligned}$$

Adding the above inequalities, we obtain that

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} \left( F(x,y, \hat{\nu })- F(x,y,\hat{\nu }')\right) (\pi -\pi ') (dx, dy)\le 0, \end{aligned}$$

which contradicts Assumption 2.2. \(\square \)

Here is a simple example of F that satisfies Assumption 2.2.

Example 2.2

\(F(x,y,{\nu })=c(x,y)+V[{\nu }](y)\), where V is strictly Lasry-Lions monotone:

$$\begin{aligned} \int _{\mathcal {Y}} \left( V[\nu ](y) - V[\nu '](y) \right) (\nu -\nu ') (dy) > 0 \quad \text { for any }\nu \not = \nu '. \end{aligned}$$
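As a sanity check (our own toy computation, with a Gaussian kernel chosen as an example of a strictly positive definite interaction), the strict Lasry-Lions monotonicity of \(V[\nu ](y)=\int g(y,\bar{y})\,\nu (d\bar{y})\) can be verified numerically: the quantity below equals the double integral of g against \(\nu -\nu '\) in both variables, which is strictly positive for \(\nu \ne \nu '\).

import numpy as np

rng = np.random.default_rng(1)

def g(y, ybar):
    # Gaussian kernel: strictly positive definite
    return np.exp(-np.sum((y - ybar) ** 2))

def ll_gap(supp, w, w_prime):
    # nu and nu' are discrete measures on the common support 'supp' with weights w, w_prime;
    # the returned value is the double integral of g against (nu - nu') in both variables
    d = w - w_prime
    n = len(supp)
    return sum(d[i] * d[j] * g(supp[i], supp[j]) for i in range(n) for j in range(n))

supp = rng.normal(size=(10, 2))                        # ten support points in R^2
w, w_prime = rng.dirichlet(np.ones(10)), rng.dirichlet(np.ones(10))
print(ll_gap(supp, w, w_prime) > 0)                    # True whenever the two measures differ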

3 Fixed point iterations in the quadratic case

In this section, we apply fixed point iterations/the contraction mapping theorem in order to find the fixed point of (2.1). Unlike the abstract existence result based on Lemma 2.1, this provides an algorithmic recipe. Let us assume that \(\mathcal {X}_t=\mathcal {Y}_t=\mathbb {R}\), \(t=1, \cdots , N\), and

$$\begin{aligned} F(x,y,{\nu })= \frac{1}{2} \sum \limits _{t=1}^N |x_t-y_t|^2 + V[{\nu }](y), \end{aligned}$$

where \(y \mapsto V[{\nu }](y)\) is lower semicontinuous and bounded from below for any \({\nu } \in \mathcal {P}(\mathcal {Y})\). Due to the explicit structure of F, for any \(\nu \in \mathcal {P}(\mathcal {Y})\) we can actually solve the minimization problem

$$\begin{aligned} \min _{\pi \in \Pi _c(\eta ,\cdot )} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }) \, \pi (dx,dy) \end{aligned}$$
(3.1)

recursively. We first present the construction of minimizers of (3.1), and hence obtain a map \(\Psi : \mathcal {P}(\mathcal {Y}) \rightarrow \mathcal {P}(\mathcal {Y})\). Then we prove that \(\Psi \) is actually a contraction under further assumptions.

3.1 Minimizer of (3.1)

We first sketch the idea. For any \(\eta \in \mathcal {P}(\mathcal {X})\), define its disintegration

$$\begin{aligned} \eta _1(A)&:= \eta (A \times \mathbb {R}^{N-1}), \quad A \subset \mathbb {R}, \\ \eta ^{x_{1:t}}&:= \mathcal {L}^{\eta }( x_{t+1} \, | \, \mathcal {F}^{\mathcal {X}}_t), \quad \quad \ \ t=1, \cdots , N-1. \end{aligned}$$

Then we have that \(\eta =\eta _1 \otimes \eta ^{x_1} \otimes \cdots \otimes \eta ^{x_{1:N-1}}\). Denote \(V[{\nu }]_N(x,y):= V[{\nu }](y)\). For \(t=N,\cdots , 1\), we define recursively

$$\begin{aligned} \text {Opt}^{(x,y)_{1:t-1}}(x_t)&:= \inf _{\bar{y} \in \mathcal {Y}_t} \left\{ \frac{1}{2}|x_t-\bar{y}|^2+ V[{\nu }]_t(x_{1:t},y_{1:t-1},\bar{y})\right\} \end{aligned}$$
(3.2)
$$\begin{aligned} T[{\nu }]_{t}^{(x,y)_{1:t-1}}(x_t)&\in \mathcal {P}\left( {\mathop {\textrm{argmin}}\limits _{\bar{y} \in \mathcal {Y}_t}} \left\{ \frac{1}{2}|x_t-\bar{y}|^2+ V[{\nu }]_t(x_{1:t},y_{1:t-1},\bar{y})\right\} \right) , \end{aligned}$$
(3.3)

and also

$$\begin{aligned} V[{\nu }]_{t-1}(x_{1:t-1}, y_{1:t-1})&:=\int _{x_t \in \mathcal {X}_t} \text {Opt}^{(x,y)_{1:t-1}}(x_t) \, \eta ^{x_{1:t-1}}(dx_t), \end{aligned}$$
(3.4)

with the understanding that, when \(t=1\), we interpret \(1:0=\emptyset \) and hence \( \eta ^{x_{1:t-1}}:=\eta _1\) and so forth, in the above equation. We assume implicitly, for the time being, that the optimal value (3.2) depends measurably on the various parameters, and likewise that at least one optimizing kernel (3.3) exists. With each measurable choice of optimizing kernels in (3.3) it is possible to paste together a coupling as follows: by induction one defines first \(\pi [\nu ]_1\in \mathcal {P}(\mathcal {X}_1\times \mathcal {Y}_1)\) as \(\eta _1(dx_1)T[\nu ]_1^{\emptyset }(x_1)(dy_1)\) and then \(\pi [\nu ]^{(x,y)_{1:t-1}}(dx_t,dy_t):=\eta ^{x_{1:t-1}}(dx_t)T[{\nu }]_{t}^{(x,y)_{1:t-1}}(x_t)(dy_t)\). Setting

$$\begin{aligned} \pi [\nu ]:= \pi [\nu ]_1 \otimes \pi [\nu ]^{(x,y)_1} \otimes \cdots \otimes \pi [\nu ]^{(x,y)_{1:N-1}}, \end{aligned}$$
(3.5)

we construct a causal coupling with \(\mathcal {X}\)-marginal \(\eta \). It can be proven that, given \(\nu \), the set of all such couplings \(\pi [\nu ]\) is equal to \(\Phi (\nu )\), i.e. the best responses to \(\nu \). In particular \(R(\nu )\), the set of \(\mathcal {Y}\)-marginals of best responses, is equal to the set of \(\mathcal {Y}\)-marginals of all such \(\pi [\nu ]\).

In the particular case that the selection (3.3) is a Dirac measure (we still denote by \(T[{\nu }]_{t}^{(x,y)_{1:t-1}}(x_t)\) the point supporting this Dirac measure), the above recipe allows us to build an adapted map \(\mathcal {T}[\nu ](x)=(\mathcal {T}[\nu ]_1(x_1),\mathcal {T}[\nu ]_2(x_{1:2}),\dots ,\mathcal {T}[\nu ]_N(x_{1:N}) )\) inductively as follows: \(\mathcal {T}[\nu ]_1(x_1):= T[\nu ]_1(x_1)\) and \(\mathcal {T}[\nu ]_t(x_{1:t}):= T[{\nu }]_t^{\left( x_{1:t-1}, \mathcal {T}[{\nu }]_{1:t-1}(x_{1:t-1})\right) }(x_t)\), \(t=2,\dots ,N\). Hence this defines a causal coupling with \(\mathcal {X}\)-marginal \(\eta \), supported on the graph of an adapted map, via \(\pi [\nu ]:=(id,\mathcal {T}[\nu ])(\eta )\).
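The following is a minimal numerical sketch (our own illustration, not part of the paper's results) of the recursion (3.2)-(3.4) for \(N=2\): the types form a small finite tree, the action space is discretized on a grid, and \(V[\nu ]\) is a toy convex function which, as in the example of Sect. 4, depends on \(\nu \) only through its marginal means.

import numpy as np

Y_GRID = np.linspace(-2.0, 2.0, 81)             # discretized action space (an assumption)
X1_VALS = [0.0, 1.0]                            # eta_1: uniform on {0, 1}
def x2_vals(x1): return [x1 - 1.0, x1 + 1.0]    # eta^{x1}: uniform on two children

def V(nu_means, y1, y2):
    # toy convex V[nu](y), depending on nu only through its marginal means (an assumption)
    return 0.1 * (y1 + y2) ** 2 + 0.2 * (y1 * nu_means[0] + y2 * nu_means[1])

def best_response_map(nu_means):
    """Return the adapted map T[nu] as a dictionary keyed by type histories."""
    T = {}
    for x1 in X1_VALS:
        def V1(y1):
            # V[nu]_1(x1, y1) = E[ Opt^{(x1,y1)}(x2) | x1 ], i.e. (3.4) at t = 2
            opts = [np.min(0.5 * (x2 - Y_GRID) ** 2 + V(nu_means, y1, Y_GRID))
                    for x2 in x2_vals(x1)]
            return np.mean(opts)
        # step t = 1 of (3.2)-(3.3): minimize 1/2 |x1 - y1|^2 + V[nu]_1(x1, y1)
        obj1 = np.array([0.5 * (x1 - y1) ** 2 + V1(y1) for y1 in Y_GRID])
        y1_star = Y_GRID[int(np.argmin(obj1))]
        T[(x1,)] = y1_star
        # step t = 2: given the chosen y1, minimize 1/2 |x2 - y2|^2 + V[nu](y1, y2)
        for x2 in x2_vals(x1):
            obj2 = 0.5 * (x2 - Y_GRID) ** 2 + V(nu_means, y1_star, Y_GRID)
            T[(x1, x2)] = Y_GRID[int(np.argmin(obj2))]
    return T

print(best_response_map(nu_means=(0.0, 0.0)))

The grid discretization is only for illustration; under the assumptions of Sect. 3.2 the minimizers in (3.3) are unique and could equally well be found by a one-dimensional convex solver.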

Proposition 3.1

If (3.2) admits a minimizer (for any \( t=1, \cdots , N\), \(x_{1:t} \in \mathcal {X}_{1:t}\) and \(y_{1:t-1} \in \mathcal {Y}_{1:t-1}\)), then \(\pi [\nu ]\) defined in (3.5) minimizes (3.1). If (3.2) admits a unique minimizer (for any \( t=1, \cdots , N\), \(x_{1:t} \in \mathcal {X}_{1:t}\) and \(y_{1:t-1} \in \mathcal {Y}_{1:t-1}\)), then so does (3.1) and its unique minimizer is supported on the graph of an adapted map.

Proof

First of all we stress that the proposed construction of \(\pi [\nu ]\) is well-founded. This is proved by backwards induction from \(t=N-1\) to \(t=0\), and standard measurable selection arguments: Details aside, one applies [13, Proposition 7.50] so that (3.2) is analytically measurable in its parameters, and (3.3) admits analytically measurable selectors. By the same token (3.4) is well-defined and analytically measurable. Then one iterates these arguments. The same arguments, applied to the case when (3.2) admits a unique minimizer (for any \( t=1, \cdots , N\), \(x_{1:t} \in \mathcal {X}_{1:t}\) and \(y_{1:t-1} \in \mathcal {Y}_{1:t-1}\)), show the well-foundedness of the mentioned coupling supported on the graph of an adapted map. Hence, it remains to discuss optimality.

Let \(\gamma \in \Pi _c ( \eta , \cdot )\). Denote its disintegration by \(\gamma _1 \otimes \gamma ^{(x,y)_1} \otimes \cdots \otimes \gamma ^{(x,y)_{1:N-1}}\). Since \(\gamma \) is causal, the \(\mathcal {X}_t\)-marginal of \(\gamma ^{(x,y)_{1:t-1}}\) is just \(\eta ^{x_{1:t-1}}\), and hence we have the disintegration \(\gamma ^{(x,y)_{1:t-1}}(dx_t,dy_t)=\eta ^{x_{1:t-1}}(dx_t) \otimes \gamma ^{(x,y)_{1:t-1}}(x_t,d y_t )\).

For any fixed \((x,y)_{1:N-1}\), according to our construction of \(\pi \), it is clear that

$$\begin{aligned}&\int _{\mathcal {X}_N \times \mathcal {Y}_N} F(x,y,{\nu }) \,\gamma ^{(x,y)_{1:N-1}}(dx_N, dy_N) \\&= \frac{1}{2} \sum \limits _{t=1}^{N-1} |x_t-y_t|^2 + \int _{\mathcal {X}_N \times \mathcal {Y}_N} \left( \frac{1}{2}|x_N-y_N|^2+ V[{\nu }]_N(x,y) \right) \,\gamma ^{(x,y)_{1:N-1}}(dx_N, dy_N) \\&= \frac{1}{2} \sum \limits _{t=1}^{N-1} |x_t-y_t|^2 \\&\ \ \ + \int _{\mathcal {X}_N } \eta ^{x_{1:N-1}}(dx_N) \int _{\mathcal {Y}_N} \left( \frac{1}{2}|x_N-y_N|^2+ V[{\nu }]_N(x,y) \right) \,\gamma ^{(x,y)_{1:N-1}}(x_N, dy_N) \\&\ge \frac{1}{2} \sum \limits _{t=1}^{N-1} |x_t-y_t|^2+V[{\nu }]_{N-1}(x_{1:N-1}, y_{1:N-1})\\&= \int _{\mathcal {X}_N \times \mathcal {Y}_N} F(x,y,{\nu }) \,\pi [\nu ]^{(x,y)_{1:N-1}}(dx_N, dy_N), \end{aligned}$$

since by definition \(\pi [\nu ]^{(x,y)_{1:N-1}}(x_N,dy_N)\) is concentrated on the set of minimizers of (3.2). Similarly, for any fixed \((x,y)_{1:N-2}\), it can be readily seen that

$$\begin{aligned}&\int _{\mathcal {X}_{N-1:N} \times \mathcal {Y}_{N-1:N}} F(x,y,{\nu }) \, \gamma ^{(x,y)_{1:N-2}}(dx_{N-1}, dy_{N-1}) \otimes \gamma ^{(x,y)_{1:N-1}}(dx_{N}, dy_{N}) \\&\ge \frac{1}{2} \sum \limits _{t=1}^{N-2} |x_t-y_t|^2 \\&\ \ \ + \int _{\mathcal {X}_{N-1} \times \mathcal {Y}_{N-1}} \left( \frac{1}{2} |x_{N-1}-y_{N-1}|^2+V[{\nu }]_{N-1}(x_{1:N-1}, y_{1:N-1}) \right) \gamma ^{(x,y)_{1:N-2}}(x_{N-1}, dy_{N-1}) \\&\ge \frac{1}{2} \sum \limits _{t=1}^{N-2} |x_t-y_t|^2+V[{\nu }]_{N-2}(x_{1:N-2}, y_{1:N-2}) \\&=\int _{\mathcal {X}_{N-1:N} \times \mathcal {Y}_{N-1:N}} F(x,y,{\nu }) \, \pi [\nu ]^{(x,y)_{1:N-2}}(dx_{N-1}, dy_{N-1}) \otimes \pi [\nu ]^{(x,y)_{1:N-1}}(dx_{N}, dy_{N}). \end{aligned}$$

Repeating the above argument iteratively for \(t= N-2, \cdots , 1\), one can show that

$$\begin{aligned} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }) \, (\gamma -\pi [\nu ]) (dx,dy) \ge 0. \end{aligned}$$

\(\square \)

3.2 \(\mathcal {W}_1\) contraction

As a first step, the convexity of \(y_{1:t} \mapsto V[\nu ]_t(x_{1:t},y_{1:t})\) will be analyzed quantitatively in Proposition 3.2 under a convexity assumption on \(V[\nu ]\). Then we will prove the contractivity of the best response map in Propositions 3.3 and 3.4 using the convexity of \(y_{1:t} \mapsto V[\nu ]_t(x_{1:t},y_{1:t})\) together with a Lipschitz property of \(\nu \mapsto \nabla V[\nu ](y)\). In addition, to exchange derivatives and integrals in Proposition 3.2, we need to assume that \(\eta \) has finite first moment. We now give the precise assumptions we need. These will be of quantitative flavor. The reason is that, as we will be arguing by backwards induction, we will need to make sure that neither convexity nor the Lipschitz property is destroyed.

Assumption 3.1

  1. (i)

    For any \({\nu } \in \mathcal {P}(\mathcal {Y})\), \(y \mapsto V[{\nu }](y)\) is twice continuously differentiable, and there exist two constants \(\kappa \ge \lambda \ge 0\) such that \(\kappa I_N \ge \nabla ^2 V[{\nu }] \ge \lambda I_N\), and

    $$\begin{aligned} \kappa +\lambda \ge 3 \times 5 \times \cdots \times (2N-1) \times (\kappa -\lambda ). \end{aligned}$$
  2. (ii)

    There exists a constant \(L>0\) such that \(\nu \mapsto \nabla V [{\nu }](y)\) is L-Lipschitz for any \(y \in \mathcal {Y}\).

  3. (iii)

    \(\eta \) has finite first moment.

Remark 3.1

In Point (ii) of Assumption 3.1, the Lipschitz property is meant to hold under the 1-Wasserstein distance, defined by:

$$\begin{aligned} \mathcal {W}_1(\mu ,\nu ):=\sup _{\begin{array}{c} f:{\mathbb {R}}^N\rightarrow {\mathbb {R}} \\ 1\text {-Lipschitz} \end{array} }\int f \, d(\mu -\nu ). \end{aligned}$$
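For completeness, here is a small sketch (our own helper, not from the paper) computing \(\mathcal {W}_1\) between two empirical measures on \(\mathbb {R}^N\) through the primal transport linear program; by Kantorovich-Rubinstein duality this coincides with the supremum above.

import numpy as np
from scipy.optimize import linprog

def W1(a, wa, b, wb):
    # a: (m, d) support points of mu with weights wa; b: (n, d) support of nu with weights wb
    m, n = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).ravel()  # c_{ij} = |a_i - b_j|
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):                 # sum_j gamma_{ij} = wa_i
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):                 # sum_i gamma_{ij} = wb_j
        A_eq[m + j, j::n] = 1.0
    res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([wa, wb]), bounds=(0, None))
    return res.fun

rng = np.random.default_rng(2)
a, b = rng.normal(size=(5, 3)), rng.normal(size=(6, 3))
print(W1(a, np.full(5, 1 / 5), b, np.full(6, 1 / 6)))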

For the convexity of \(y_{1:t} \mapsto V[{\nu }]_{t}(x_{1:t}, y_{1:t})\), we need the following lemma whose proof is trivial and so it is omitted.

Lemma 3.1

Suppose M is a symmetric \(N \times N\) matrix such that \(\kappa \, I_N \ge M \ge \lambda \, I_N\). Then

$$\begin{aligned}&M_{ii} \in [\lambda , \kappa ], \quad \quad i=1, \cdots , N; \\&|M_{i,j}| \le \sqrt{(M_{ii}-\lambda )(M_{jj}-\lambda ) } \le \kappa -\lambda , \quad 1 \le i \not = j \le N. \end{aligned}$$

In the rest of the paper, let us denote by \(\nabla _{y_{1:k}}V[\nu ]_k\) and \(\nabla ^2_{y_{1:k}}V[\nu ]_k\) the gradient and Hessian of \( y_{1:k} \mapsto V[\nu ]_k(x_{1:k},y_{1:k})\) respectively.

Proposition 3.2

Under Points (i) and (iii) of Assumption 3.1, the function \(y_{1:k} \mapsto V[{\nu }]_k(x_{1:k},y_{1:k})\) is twice continuously differentiable, and \(\kappa _k I_k \ge \nabla ^2_{y_{1:k}} V[{\nu }]_k \ge \lambda _k I_k\), where

$$\begin{aligned} \lambda _k:= \frac{\kappa +\lambda -(2k+1)\cdots (2N-1)(\kappa -\lambda )}{2}, \nonumber \\ \kappa _k:= \frac{\kappa +\lambda +(2k+1)\cdots (2N-1)(\kappa -\lambda )}{2}. \end{aligned}$$
(3.6)

Proof

We first treat the case \(k=N-1\), which corresponds to the step \(t=N\) of the recursion. The minimization problem (3.2) is strictly convex for each value of x and \(y_{1:N-1}\). Hence the first order conditions of (3.3) completely characterize the unique minimizer \(T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)\), and we obtain that

$$\begin{aligned} T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)+\partial _{y_N} V [{\nu }]_N\left( x,y_{1:N-1}, T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N) \right) =x_N. \end{aligned}$$
(3.7)

Let us show that \(T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)\) is Lipschitz in \(x_N\), which is necessary for us to exchange integral and derivative later in this argument. Denote \(y_N= T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)\), \(y_N' = T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N')\). Due to the first order condition, we have that

$$\begin{aligned}&(y_N-y_N')^2 + (y_N-y_N') \left( \partial _{y_N}V[\nu ]_N(x,y_{1:N-1}, y_N)-\partial _{y_N}V[\nu ]_N(x,y_{1:N-1}, y_N') \right) \\&= (y_N-y_N')(x_N-x_N'). \end{aligned}$$

According to Assumption 3.1 (i), the left hand side is bounded from below by \((1+\lambda ) (y_N-y_N')^2\), and hence we obtain that

$$\begin{aligned} \left| T [{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)-T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N') \right| =|y_N-y_N'| \le \frac{|x_N-x_N'|}{1+\lambda }. \end{aligned}$$
(3.8)

As abbreviations, we take \(T_N:=T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)\), \(V_N=V[{\nu }]_N(x,y_{1:N-1}, T_N)\), and

$$\begin{aligned} V_{N-1}&=V[{\nu }]_{N-1}(x_{1:N-1}, y_{1:N-1}) \\&= \int _{x_N \in \mathcal {X}_N} \frac{1}{2} |x_N- T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)|^2\\ {}&\quad + V[{\nu }]_N(x,y_{1:N-1},T[{\nu }]_{N}^{(x,y)_{1:N-1}}(x_N)) \, \eta ^{x_{1:N-1}}(dx_N). \end{aligned}$$

According to the implicit function theorem, which is applicable thanks to Assumption 3.1(i), \(T_N\) is continuously differentiable in y. By the envelope theorem, \(V_{N-1}\) is continuously differentiable (as \(V_N\) is) in y, and we have

$$\begin{aligned} \partial _{y_t} V_{N-1} =&\int _{x_N \in \mathcal {X}_N} \left( (T_N -x_N) \partial _{y_t}T_N+\partial _{y_t}V_N + \partial _{y_{N}}V_N \partial _{y_t}T_N\right) \, \eta ^{x_{1:N-1}}(dx_N) \nonumber \\ =&\int _{x_N \in \mathcal {X}_N} \partial _{y_{t}}V_N \, \eta ^{x_{1:N-1}}(dx_N). \end{aligned}$$
(3.9)

We can deduce from (3.8) and Lemma 3.1 that \(\partial _{y_t} V[\nu ]_N(x,y_{1:N-1}, T_N)\) is Lipschitz in \(x_N\) and y, which justifies together with Assumption 3.1 (iii) the exchange of derivative and integral in (3.9). By the same token, we deduce that \(V_{N-1}\) is in effect twice continuously differentiable in y and we have

$$\begin{aligned} \partial ^2_{y_k y_t} V_{N-1}=\int _{x_N \in \mathcal {X}_N} \left( \partial ^2_{y_k y_t } V_N + \partial ^2_{y_t y_N} V_N \partial _{y_k}T_N \right) \, \eta ^{x_{1:N-1}}(dx_N). \end{aligned}$$

Taking the derivative of (3.7) with respect to \(y_k\), it can be seen that

$$\begin{aligned} \partial _{y_k} T_N (1+\partial ^2_{y_N}V_N)+\partial ^2_{y_{k} y_N} V_N =0, \end{aligned}$$

and hence

$$\begin{aligned} \partial _{y_{k}}T_N=-\frac{\partial ^2_{y_{k} y_N} V_N}{(1+\partial ^2_{y_N}V_N)}. \end{aligned}$$

Therefore we obtain that

$$\begin{aligned} \partial ^2_{y_k y_t} V_{N-1} =&\int _{x_N \in \mathcal {X}_N} \left( \partial ^2_{y_t y_k} V_N -\frac{(\partial ^2_{y_ty_N} V_N)(\partial ^2_{y_{k} y_N} V_N)}{(1+\partial ^2_{y_N}V_N)} \right) \, \eta ^{x_{1:N-1}}(dx_N). \end{aligned}$$
(3.10)

Take any vector \(\xi =(\xi _1, \cdots , \xi _{N-1})\). Using (3.10), Cauchy-Schwarz inequality, and Lemma 3.1, it can be easily seen that

$$\begin{aligned} \xi ^\top \nabla _{y_{1:N-1}}^2 V_{N-1} \xi&\ge \lambda \Vert \xi \Vert ^2 -\frac{(\sum _{j=1}^{N-1}\xi _j \partial ^2_{y_jy_N} V_N)^2}{1+\partial ^2_{y_N} V_N} \\&\ge \left( \lambda -\sum _{j=1}^{N-1} \frac{(\partial ^2_{y_j} V_N-\lambda )(\partial ^2_{y_N} V_N-\lambda )}{1+\partial ^2_{y_N} V_N} \right) \Vert \xi \Vert ^2 \\&\ge \left( \lambda -(N-1) (\kappa -\lambda ) \right) \Vert \xi \Vert ^2, \end{aligned}$$

and similarly

$$\begin{aligned} \xi ^\top \nabla _{y_{1:N-1}}^2 V_{N-1} \xi \le \left( \kappa +(N-1) (\kappa -\lambda ) \right) \Vert \xi \Vert ^2. \end{aligned}$$

Therefore, we obtain that

$$\begin{aligned} (\kappa +(N-1)(\kappa -\lambda ) )I_{N-1} \ge \nabla _{y_{1:N-1}}^2 V_{N-1} \ge (\lambda -(N-1)(\kappa -\lambda ))I_{N-1}, \end{aligned}$$

or equivalently, that

$$\begin{aligned} \frac{(\kappa +\lambda +(2N-1)(\kappa -\lambda ) )}{2}I_{N-1} \ge \nabla _{y_{1:N-1}}^2 V_{N-1} \ge \frac{(\kappa +\lambda -(2N-1)(\kappa -\lambda ) )}{2}I_{N-1}. \end{aligned}$$

By induction, following the exact same arguments as above, we can get that for each \(1\le k \le N-1\) the function \(V_k\) is twice continuously differentiable in y and

$$\begin{aligned} \lambda _k I_k \le \nabla _{y_{1:k}}^2 V_{k} \le \kappa _k I_k, \end{aligned}$$

where \(\lambda _k, \kappa _k\) are defined as in (3.6). \(\square \)

By Proposition 3.2, we know that \(V[{\nu }]_t\) is convex in \(y_t\) for any \(t =1,\cdots , N\) under Assumption 3.1 (i). It follows that the problems (3.2) admit a unique minimizer. Then, by Proposition 3.1, it follows that Problem (3.1) admits a unique minimizer \(\pi [\nu ]\). This minimizer is furthermore supported on the graph of an adapted map \(\mathcal {T}[\nu ]\). To simplify notation, we write

$$\begin{aligned} \Psi : \mathcal {P}(\mathcal {Y})&\rightarrow \mathcal {P}(\mathcal {Y}) \nonumber \\ {\nu }&\mapsto \mathcal {T}[\nu ](\eta )= Pj\circ \Phi (\nu ), \end{aligned}$$
(3.11)

which is now an actual function, rather than a set-valued one. Observe that any minimizer of the problem

$$\begin{aligned}&\min _{\pi \in \Pi _c(\eta , \Psi ({\nu }))} \int _{\mathcal {X}\times \mathcal {Y}} F(x,y,{\nu }) \, \pi (dx,dy) \nonumber \\&\quad = \min _{\pi \in \Pi _c(\eta , \Psi ({\nu }))} \int _{\mathcal {X}\times \mathcal {Y}}\frac{\Vert x-y\Vert ^2}{2} \pi (dx,dy) + \int V[\nu ](y)\Psi (\nu )(dy). \end{aligned}$$
(3.12)

is also the minimizer of (3.1). Hence we conclude that \(\pi [\nu ]\) is also the unique minimizer of (3.12).

Now we analyze the Lipschitz property of the function \(({\nu },y) \mapsto T[{\nu }]_k^{(x,y)_{1:k-1}}(x_k)\), and after that we will show that \(\Psi \) is a contraction under Assumption 3.1. Here the contraction property is meant to hold under the 1-Wasserstein distance.

Proposition 3.3

Under Assumption 3.1, it holds that

$$\begin{aligned} \left| T[{\nu }]_k^{(x,y)_{1:k-1}}(x_k)-T[{\nu }']_k^{(x,y')_{1:k-1}}(x_k) \right| \le \frac{L_k}{1+ \lambda _k} \mathcal {W}_1({\nu }, {\nu }') +\frac{(\kappa _k -\lambda _k)\sum _{t=1}^{k-1} |y_t-y'_t|}{1+\lambda _k}, \end{aligned}$$

where \(L_N:=L\) and

$$\begin{aligned} L_k:=\frac{1+\kappa _{k+1}}{1+\lambda _{k+1}} L_{k+1}, \quad k=N-1, \cdots , 1. \end{aligned}$$
(3.13)

Proof

   

Step 1:

First we prove that

$$\begin{aligned}&\left| T[{\nu }]_{k}^{(x,y)_{1:k-1}}(x_k)- T[{\nu }']_k^{(x,y)_{1:k-1}}(x_k) \right| \le \frac{L_k}{1+ \lambda _k} \mathcal {W}_1({\nu }, {\nu }') . \end{aligned}$$
(3.14)

Denote \(\overline{y}_N=T[{\nu }]^{(x,y)_{1:N-1}}_N(x_N)\), \(\overline{y}_N'=T[{\nu }']^{(x,y)_{1:N-1}}_N(x_N)\). It can be easily seen, by the first order optimality conditions as in (3.7), that

$$\begin{aligned} \overline{y}_N-\overline{y}_N'+\partial _{y_N}V[{\nu }]_N(x,y_{1:N-1}, \overline{y}_N) - \partial _{y_N}V[{\nu }']_N(x,y_{1:N-1},\overline{y}_N')=0, \end{aligned}$$

and hence

$$\begin{aligned}&(\overline{y}_N-\overline{y}_N')^2+(\overline{y}_N-\overline{y}_N')\left( \partial _{y_N}V[{\nu }]_N(x,y_{1:N-1}, \overline{y}_N) - \partial _{y_N}V[{\nu }]_N(x,y_{1:N-1},\overline{y}_N')\right) \nonumber \\&=(\overline{y}_N-\overline{y}_N')\left( \partial _{y_N}V[{\nu }']_N(x,y_{1:N-1}, \overline{y}_N') - \partial _{y_N}V[{\nu }]_N(x,y_{1:N-1},\overline{y}_N')\right) . \end{aligned}$$
(3.15)

Using the convexity of \(V[{\nu }]_N\) in \(y_N\), the left hand side of (3.15) is greater than

$$\begin{aligned} (1+\lambda ) (\overline{y}_N-\overline{y}_N')^2, \end{aligned}$$

while the right hand side is smaller than \(L|\overline{y}_N-\overline{y}_N'|\mathcal {W}_1({\nu }, {\nu }')\). Therefore we obtain that

$$\begin{aligned} |\overline{y}_N-\overline{y}_N'| \le \frac{L\mathcal {W}_1({\nu }, {\nu }')}{1+\lambda }. \end{aligned}$$
(3.16)

According to (3.9), we know that

$$\begin{aligned}&\left| \nabla _{y_{1:N-1}} V[{\nu }]_{N-1} - \nabla _{y_{1:N-1}} V[{\nu }']_{N-1}\right| \\&=\left| \int _{x_N \in \mathcal {X}_N} \left( \nabla _{y_{1:N-1}}V_N[\nu ](x,y_{1:N-1}, \overline{y}_N)-\nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, \overline{y}_N') \right) \, \eta ^{x_{1:N-1}}(dx_N) \right| \\&\le \left| \int _{x_N \in \mathcal {X}_N} \left( \nabla _{y_{1:N-1}}V_N[\nu ](x,y_{1:N-1}, \overline{y}_N)-\nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, \overline{y}_N) \right) \, \eta ^{x_{1:N-1}}(dx_N) \right| \\&\ \ \ + \left| \int _{x_N \in \mathcal {X}_N} \left( \nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, \overline{y}_N)-\nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, \overline{y}_N') \right) \, \eta ^{x_{1:N-1}}(dx_N) \right| . \end{aligned}$$

The first term on the right hand side is bounded above by \(L\mathcal {W}_1({\nu }, {\nu }')\) due to point (ii) of Assumption 3.1. By Lemma 3.1, we obtain

$$\begin{aligned} \left| \partial _{y_N} \nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, y_N) \right| \le (\kappa -\lambda ), \end{aligned}$$

and thus (3.16) implies

$$\begin{aligned} \left| \nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, \overline{y}_N)-\nabla _{y_{1:N-1}}V_N[\nu '](x,y_{1:N-1}, \overline{y}_N') \right| \le \frac{ (\kappa -\lambda )L\mathcal {W}_1({\nu }, {\nu }')}{1+\lambda }. \end{aligned}$$

Combining these estimates, we get that

$$\begin{aligned}&\left| \nabla _{y_{1:N-1}} V[{\nu }]_{N-1} - \nabla _{y_{1:N-1}} V[{\nu }']_{N-1}\right| \\&\le L\mathcal {W}_1({\nu }, {\nu }') + \frac{ (\kappa -\lambda )L\mathcal {W}_1({\nu }, {\nu }')}{1+\lambda }. \end{aligned}$$

Recursively, we get that for \(k=N-1, \cdots , 1\),

$$\begin{aligned} \left| \nabla V[{\nu }]_k- \nabla V[{\nu }']_k \right| \le L_k \mathcal {W}_1({\nu }, {\nu }'), \end{aligned}$$

and hence also (3.14).

Step 2 :

Let us now estimate \(|T[{\nu }]_k^{(x,y)_{1:k-1}}(x_k)-T[{\nu }]_k^{(x,y')_{1:k-1}}(x_k)|\). By the first order condition, we have that

$$\begin{aligned}&T[{\nu }]_k^{(x,y)_{1:k-1}}(x_k)-T[{\nu }]_k^{(x,y')_{1:k-1}}(x_k) \\&\quad + \partial _{y_k} V[{\nu }]_k\left( x_{1:k}, y_{1:k-1}, T[{\nu }]_k^{(x,y)_{1:k-1}}(x_k)\right) \\&\quad - \partial _{y_k} V[{\nu }]_k\left( x_{1:k}, y'_{1:k-1}, T[{\nu }]_k^{(x,y')_{1:k-1}}(x_k)\right) =0. \end{aligned}$$

Similar to the derivation of (3.16), using Proposition 3.2 and Lemma 3.1 we get that

$$\begin{aligned} \left| T[{\nu }]_k^{(x,y)_{1:k-1}}(x_k)-T[{\nu }]_k^{(x,y')_{1:k-1}}(x_k) \right| \le \frac{(\kappa _k -\lambda _k)\sum _{t=1}^{k-1} |y_t-y'_t|}{1+\lambda _k}. \end{aligned}$$
(3.17)
Step 3 :

We combine the first two steps using the triangle inequality.

\(\square \)

Proposition 3.4

Under Assumption 3.1 the function \(\Psi \) defined in (3.11) is a contraction in \(\mathcal {W}_1\) metric if

$$\begin{aligned} \frac{L_1\left( \frac{\kappa _1-\lambda _1}{1+\lambda _1}\right) ^N-L_1}{\kappa _1-2\lambda _1-1}<1. \end{aligned}$$
(3.18)

Proof

Let us recall the construction from Sect. 3.1: Using \(T[{\nu }]_1, \cdots , T[{\nu }]_N\), we can define \(\mathcal {T}[{\nu }]=(\mathcal {T}[{\nu }]_1, \cdots , \mathcal {T}[{\nu }]_N): \mathcal {X}\rightarrow \mathcal {Y}\) inductively via

$$\begin{aligned}&\mathcal {T}[{\nu }]_1(x_1) =T[{\nu }]_1(x_1), \\&\mathcal {T}[{\nu }]_k(x_{1:k}) =T[{\nu }]_k^{\left( x_{1:k-1}, \mathcal {T}[{\nu }]_{1:k-1}(x_{1:k-1})\right) }(x_k), \quad k=2, \cdots , N. \end{aligned}$$

It is clear that \(\Psi ({\nu })= (\mathcal {T}[{\nu }]) ({\eta })\), and therefore

$$\begin{aligned} \mathcal {W}_1(\Psi ({\nu }),\Psi ({\nu }')) \le \int _{x \in \mathcal {X}} \left| \mathcal {T}[{\nu }](x)- \mathcal {T}[{\nu }'](x) \right| \eta (dx). \end{aligned}$$

Now according to Proposition 3.3, we have that

$$\begin{aligned} |\mathcal {T}[{\nu }]_1(x_1)- \mathcal {T}[{\nu }']_1(x_1)|\le \frac{L_1}{1+\lambda _1} \mathcal {W}_1({\nu }, {\nu }'), \end{aligned}$$

and

$$\begin{aligned}&|\mathcal {T}[{\nu }]_2(x_{1:2})- \mathcal {T}[{\nu }']_2(x_{1:2})| = \left| T[{\nu }]_2^{\left( x_1, \mathcal {T}[{\nu }]_1(x_1)\right) }(x_2) -T[{\nu }']_2^{\left( x_1, \mathcal {T}[{\nu }']_1(x_1)\right) }(x_2)\right| \\&\le \frac{L_2}{1+\lambda _2}\mathcal {W}_1({\nu }, {\nu }') +\frac{\kappa _2-\lambda _2}{1+\lambda _2} |\mathcal {T}[{\nu }]_1(x_{1})- \mathcal {T}[{\nu }']_1(x_{1})| \\&\le \frac{L_1}{1+\lambda _1} \left( 1+\frac{\kappa _1-\lambda _1}{1+\lambda _1} \right) \mathcal {W}_1({\nu }, {\nu }'). \end{aligned}$$

By induction, one can prove that

$$\begin{aligned}&|\mathcal {T}[{\nu }]_k(x_{1:k})- \mathcal {T}[{\nu }']_k(x_{1:k})| \\&\le \frac{L_1}{1+\lambda _1} \left( 1+ \cdots + \left( \frac{\kappa _1-\lambda _1}{1+\lambda _1} \right) ^{k-1} \right) \mathcal {W}_1({\nu }, {\nu }'), \end{aligned}$$

and hence

$$\begin{aligned} |\mathcal {T}[{\nu }](x)- \mathcal {T}[{\nu }'](x)| \le \frac{L_1}{1+\lambda _1} \left( 1+\cdots +\left( \frac{\kappa _1-\lambda _1}{1+\lambda _1} \right) ^{N-1} \right) \mathcal {W}_1({\nu }, {\nu }'). \end{aligned}$$

Therefore \(\Psi \) is a contraction if (3.18) is satisfied. \(\square \)

In the contracting case, it is well known that there exists a unique fixed point, which is furthermore obtained by repeatedly iterating the map (fixed-point iterations). This tells us how to completely solve our equilibrium problem:

Corollary 3.1

Under Assumption 3.1 and Condition (3.18), we have

  1. (1)

    The Cournot-Nash problem (2.1) has a unique equilibrium \(\pi \);

  2. (2)

    The second marginal of \(\pi \) is the unique fixed point of \(\Psi \), and it can be determined by the usual fixed-point iterations “\(\nu _{m+1}=\Psi (\nu _m)\)”.

  3. (3)

    Conversely, after determining the unique fixed point \(\nu \) of \(\Psi \), the unique Cournot-Nash equilibrium \(\pi \) is determined by minimizing (3.12), or equivalently by taking \(\pi =(id,\mathcal {T}[\nu ])(\eta )\) with \(\mathcal {T}[\nu ]\) adapted and uniquely (\(\eta \)-a.s.) determined via the recursions (3.3).
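In practice, checking Assumption 3.1 (i) and Condition (3.18) only requires the constants \(\kappa , \lambda , L\) and the horizon N. The following small helper (our own sketch) computes \(\lambda _k, \kappa _k\) from (3.6), \(L_k\) from (3.13), and the contraction constant in the geometric-sum form appearing in the proof of Proposition 3.4, which is equivalent to (3.18).

import math

def contraction_check(kappa, lam, L, N):
    # kappa_k, lambda_k from (3.6); the empty product (k = N) gives kappa_N = kappa, lambda_N = lam
    def prod(k):                       # (2k+1)(2k+3)...(2N-1)
        return math.prod(range(2 * k + 1, 2 * N, 2))
    lam_k = {k: (kappa + lam - prod(k) * (kappa - lam)) / 2 for k in range(1, N + 1)}
    kap_k = {k: (kappa + lam + prod(k) * (kappa - lam)) / 2 for k in range(1, N + 1)}
    L_k = {N: L}                       # L_k from (3.13), backwards from L_N = L
    for k in range(N - 1, 0, -1):
        L_k[k] = (1 + kap_k[k + 1]) / (1 + lam_k[k + 1]) * L_k[k + 1]
    q = (kap_k[1] - lam_k[1]) / (1 + lam_k[1])
    const = L_k[1] / (1 + lam_k[1]) * sum(q ** j for j in range(N))   # geometric-sum form of (3.18)
    # Assumption 3.1 (i) requires lambda_1 >= 0; the contraction property requires const < 1
    return lam_k[1] >= 0, const

# e.g. the liquidation model of Sect. 4 with N = 2, A = 0.1, K = 5 (illustrative values):
# kappa = 2K - 1 + 2AN, lam = 2K - 1, L = N
print(contraction_check(kappa=2 * 5 - 1 + 2 * 0.1 * 2, lam=2 * 5 - 1, L=2, N=2))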

4 Application to optimal liquidation in a price impact model

We give a description of the price impact model in discrete time. At time 0, an agent holds a number \(Q_0>0\) of shares of a stock. At time 1, based on the available information, she aims to sell \(y_1\) shares at their current price \(S_1\), after which she is left with \(Q_1=Q_0-y_1\) shares. This is iterated until time N, where she chooses to sell \(y_{N}\) shares based on her current information, at the current price \(S_{N}\), leaving her with \(Q_{N}=Q_{N-1}-y_{N}\) shares. The total earnings from this strategy are then

$$\begin{aligned} E_N:= \sum _{i=1}^{N}y_iS_i. \end{aligned}$$

As for the behaviour of the share prices \(S_i\), we suppose that \(S_0\in \mathbb {R}\) is known and that otherwise

$$\begin{aligned} S_{i}-S_{i-1}=x_i-x_{i-1}-m_i[\nu ], \end{aligned}$$

where \(x\sim \eta \) is noise (wlog. we assume \(x_0=0\)) and \(m_i[\nu ]\) stands for the mean of the i-th marginal of a measure \(\nu \). The idea is that the i-th marginal of \(\nu \) is (in equilibrium) the distribution of the number of shares sold at time i, and so the term \(m_i[\nu ]\) in the dynamics of S indicates a permanent market impact caused by a population of identical, independent and negligible agents who at time i decide to sell a number of shares.

We define

$$\begin{aligned} F(x,y,\nu ):= AQ_N^2 + K\sum _{i=1}^{N} y_i^2-E_N, \end{aligned}$$

where the first term accounts for a final cost of inventory and the second term models the accumulated transaction costs. Given a distribution \(\nu \) of decisions taken by a population of agents, a negligible agent will aim to minimize the \(\eta \)-expectation of F over the strategies adapted to the information of the share prices, or equivalently, the strategies adapted to x. More precisely, a pure equilibrium for this game would be an adapted map \(\hat{T}\) and a measure \(\hat{\nu }\) such that

(i):

\(\hat{T}\in {\mathop {\textrm{argmin}}\limits _{T \, \text {adapted}}}\int F(x, T(x),\hat{\nu }) \, \eta (dx);\)

(ii):

\(\hat{T}(\eta )=\hat{\nu }\).

For this model we easily check that, up to a term depending only on x (which is irrelevant for the optimization over adapted strategies), \(F(x,y,\nu )=\frac{1}{2}\Vert x-y\Vert ^2+V[\nu ](y)\), where

$$\begin{aligned} V[\nu ](y):= \left( K-\frac{1}{2}\right) \sum _i y_i^2-S_0\sum y_i+A\left( Q_0- \sum _i y_i\right) ^2+\sum _iy_i\sum _{k\le i} m_k[\nu ]. \end{aligned}$$
(4.1)

Let us denote by \(\mathbbm {1}_N\) the N-dimensional column vector with 1 in each coordinate, by \(\mathbbm {1}_{N \times N}=\mathbbm {1}_N \mathbbm {1}_N^\top \) the \(N \times N\) matrix of ones, and by \(I_N\) the identity matrix. Then \(\nabla V[\nu ](y)=(2K-1)y+\{2A(\sum y_i-Q_0)-S_0\}\mathbbm {1}_{N}+(\sum _{k\le i}m_k[\nu ] )_{i=1}^N\), and so \(\nu \mapsto \nabla V[\nu ](y)\) is N-Lipschitz with respect to the 1-Wasserstein distance, uniformly in y. Moreover, \(\nabla ^2V[\nu ](y)=2A\mathbbm {1}_{N\times N}+(2K-1)I_N\), and so we have that \(\kappa I_N \ge \nabla ^2V[\nu ](y) \ge \lambda I_N\), where \(\kappa =2K-1+2AN\) and \(\lambda =2K-1\).
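As a quick sanity check of these formulas, the following sketch (with illustrative parameter values) compares the stated gradient with a finite-difference approximation of (4.1) and verifies that the spectrum of the Hessian lies in \([2K-1,\,2K-1+2AN]\).

```python
import numpy as np

N, A, K, S0, Q0 = 4, 0.3, 2.0, 10.0, 5.0
m = np.array([0.2, 0.1, 0.4, 0.3])      # stand-in for (m_1[nu], ..., m_N[nu])

def V(y):
    """V[nu](y) as in (4.1), for fixed marginal means m."""
    return ((K - 0.5) * np.sum(y**2) - S0 * np.sum(y)
            + A * (Q0 - np.sum(y))**2 + np.sum(y * np.cumsum(m)))

def grad_V(y):
    """Stated gradient: (2K-1)y + {2A(sum y_i - Q0) - S0} 1_N + (sum_{k<=i} m_k)_i."""
    return (2*K - 1)*y + (2*A*(np.sum(y) - Q0) - S0)*np.ones(N) + np.cumsum(m)

# Hessian 2A*1_{NxN} + (2K-1)*I_N and its eigenvalue bounds
H = 2*A*np.ones((N, N)) + (2*K - 1)*np.eye(N)
eig = np.linalg.eigvalsh(H)
assert eig.min() >= 2*K - 1 - 1e-10 and eig.max() <= 2*K - 1 + 2*A*N + 1e-10

# finite-difference check of the gradient at a random point
y = np.random.default_rng(0).normal(size=N)
num_grad = np.array([(V(y + 1e-6*np.eye(N)[i]) - V(y - 1e-6*np.eye(N)[i])) / 2e-6
                     for i in range(N)])
assert np.allclose(num_grad, grad_V(y), atol=1e-5)
```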

Corollary 4.1

Take \(L_N=N\), \(\kappa =2K-1+2AN\), \(\lambda =2K-1\), and define \(L_t, \kappa _t, \lambda _t\), \(t=N-1,\dots ,1\), recursively as in (3.6) and (3.13). Then there exists a unique equilibrium if Assumption 3.1 (i) and (3.18) are satisfied.

In our model, it can be readily seen that the assumptions of Corollary 4.1 are satisfied if \( N +A \ll K \). We now show that the game is not of potential type, and therefore cannot be covered by [2]. We only prove this for the simplest case \(N=2\).

Lemma 4.1

There exists no Fréchet differentiable \(\mathcal {E}: \mathcal {P}(\mathbb {R}^2) \rightarrow \mathbb {R}\) such that

$$\begin{aligned} \lim \limits _{\epsilon \rightarrow 0} \frac{\mathcal {E}(\nu +\epsilon \mu )-\mathcal {E}(\nu ) }{\epsilon }=\int _{ \mathbb {R}^2} V[\nu ](y) \, \mu (dy) \end{aligned}$$
(4.2)

for any \(\mu , \nu \in \mathcal {P}(\mathbb {R}^2)\).

Proof

Let us define

$$\begin{aligned} \hat{V}[\nu ](y):= V[\nu ](y)- m_1[\nu ]y_2, \end{aligned}$$

and

$$\begin{aligned} \hat{\mathcal {E}}(\nu ):=&\int _{ \mathbb {R}^2} \left( K-\frac{1}{2}\right) \sum _i y_i^2-S_0\sum y_i+A\left( Q_0- \sum _i y_i\right) ^2 \, \nu (dy) \\&+\frac{1}{2} (m_1[\nu ])^2+ \frac{1}{2} (m_2[\nu ])^2. \end{aligned}$$

It can be easily verified that

$$\begin{aligned} \lim \limits _{\epsilon \rightarrow 0} \frac{\hat{\mathcal {E}}(\nu +\epsilon \mu )-\hat{\mathcal {E}}(\nu ) }{\epsilon }=\int _{ \mathbb {R}^2} \hat{V}[\nu ](y) \, \mu (dy) \end{aligned}$$

for any \(\mu , \nu \in \mathcal {P}(\mathbb {R}^2)\). Therefore it suffices to show that \(m_1[\nu ]y_2\) does not admit a potential. Suppose, to the contrary, that there exists some \(\mathcal {E}\) such that (4.2) holds with \(V[\nu ](y)=m_1[\nu ]y_2\).

Then it can be readily seen that

$$\begin{aligned}&\mathcal {E}(\delta _T \times \delta _1) - \mathcal {E}(\delta _T \times \delta _0)\\&= \int _0^1 \, dt \int \left( m_1[\delta _T \times \delta _0+t(\delta _T \times \delta _1-\delta _T \times \delta _0)]y_2\right) \, (\delta _T \times \delta _1-\delta _T \times \delta _0)(dy)=T, \\&\mathcal {E}(\delta _T \times \delta _1) - \mathcal {E}(\delta _0 \times \delta _1)\\&= \int _0^1 \, dt \int \left( m_1[\delta _0 \times \delta _1+t(\delta _T \times \delta _1-\delta _0 \times \delta _1)]y_2\right) \, (\delta _T \times \delta _1-\delta _0 \times \delta _1)(dy)=0, \\&\mathcal {E}(\delta _T \times \delta _0) - \mathcal {E}(\delta _0 \times \delta _0)\\&= \int _0^1 \, dt \int \left( m_1[\delta _0 \times \delta _0+t(\delta _T \times \delta _0-\delta _0 \times \delta _0)]y_2\right) \, (\delta _T \times \delta _0-\delta _0 \times \delta _0)(dy)=0, \\&\mathcal {E}(\delta _0 \times \delta _1) - \mathcal {E}(\delta _0 \times \delta _0)\\&= \int _0^1 \, dt \int \left( m_1[\delta _0 \times \delta _0+t(\delta _0 \times \delta _1-\delta _0 \times \delta _0)]y_2\right) \, (\delta _0 \times \delta _1-\delta _0 \times \delta _0)(dy)=0. \end{aligned}$$

Therefore we obtain that

$$\begin{aligned} \mathcal {E}(\delta _T \times \delta _1)-\mathcal {E}(\delta _0 \times \delta _0)=&\; \mathcal {E}(\delta _T \times \delta _1)-\mathcal {E}(\delta _T \times \delta _0)+\mathcal {E}(\delta _T \times \delta _0)-\mathcal {E}(\delta _0 \times \delta _0)=T \\ =&\; \mathcal {E}(\delta _T \times \delta _1)-\mathcal {E}(\delta _0 \times \delta _1)+\mathcal {E}(\delta _0 \times \delta _1)-\mathcal {E}(\delta _0 \times \delta _0)=0, \end{aligned}$$

which is a contradiction as soon as \(T\ne 0\). \(\square \)
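The four line integrals in the proof can also be verified numerically, representing the product Dirac measures as weighted point lists and discretising the integral in t. A small sketch with the illustrative choice T = 1 (all code names are ours) reproduces the values T, 0, 0, 0:

```python
import numpy as np

T = 1.0

def dirac2(a, b):
    """The product measure delta_a x delta_b on R^2 as {point: weight}."""
    return {(a, b): 1.0}

def m1(nu):
    """Mean of the first marginal of a discrete (signed) measure nu."""
    return sum(w * p[0] for p, w in nu.items())

def mix(mu0, mu1, t):
    """The measure mu0 + t*(mu1 - mu0)."""
    out = {}
    for p, w in mu0.items():
        out[p] = out.get(p, 0.0) + (1 - t) * w
    for p, w in mu1.items():
        out[p] = out.get(p, 0.0) + t * w
    return out

def line_integral(mu0, mu1, n=1000):
    """Approximate int_0^1 dt int m1[mu0 + t(mu1 - mu0)] * y2 d(mu1 - mu0)(y)."""
    total = 0.0
    for t in (np.arange(n) + 0.5) / n:
        nu_t = mix(mu0, mu1, t)
        inner = (sum(w * m1(nu_t) * p[1] for p, w in mu1.items())
                 - sum(w * m1(nu_t) * p[1] for p, w in mu0.items()))
        total += inner / n
    return total

print(line_integral(dirac2(T, 0.0), dirac2(T, 1.0)))      # = T
print(line_integral(dirac2(0.0, 1.0), dirac2(T, 1.0)))    # = 0
print(line_integral(dirac2(0.0, 0.0), dirac2(T, 0.0)))    # = 0
print(line_integral(dirac2(0.0, 0.0), dirac2(0.0, 1.0)))  # = 0
```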

To finish the article, let us present a simple example illustrating how to compute the best response map \( \mathcal {T}[\nu ]\) and the fixed point \(\nu \).

Example 4.1

Suppose \(N=2\) and \(\eta =\frac{1}{2}(\delta _{0} +\delta _1) \times \frac{1}{2}(\delta _{0} + \delta _1)\). Take \(F^{\epsilon }(x,y,\nu )= \frac{1}{2} \Vert x-y \Vert ^2 + \epsilon V[\nu ](y)\), where V is given by (4.1). In the case \(\epsilon =1\), this is just the price impact model above. Hence we know that \(F^{\epsilon }\) is non-potential for \(\epsilon >0\). Let us compute the best response given \(\nu \):

$$\begin{aligned} T^{\epsilon }[\nu ]_2^{(x_1,y_1)}(x_2) =&{\mathop {\textrm{argmin}}\limits _{\bar{y} \in \mathbb {R}}} \bigg \{\frac{1}{2} |x_2-\bar{y}|^2+ \epsilon \left( (K-1/2)(y_1^2+\bar{y}^2)-S_0(y_1 + \bar{y}) \right. \\&\left. \quad \quad \quad \quad +A(Q_0-y_1-\bar{y})^2+y_1m_1[\nu ]+\bar{y} (m_1[\nu ]+m_2[\nu ])\right) \bigg \} \\ =&\frac{x_2+\epsilon (S_0-2A(y_1-Q_0)-m_1[\nu ]-m_2[\nu ])}{1+\epsilon (2K+2A-1)}. \end{aligned}$$
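As a quick check, the displayed closed form indeed satisfies the first-order condition of the minimization above; the following sketch verifies this at randomly drawn illustrative values of \(x_2, y_1, m_1[\nu ], m_2[\nu ]\) and of the model parameters.

```python
import numpy as np

K, A, S0, Q0, eps = 3.0, 0.2, 1.0, 1.0, 0.7       # illustrative values
x2, y1, m1, m2 = np.random.default_rng(2).normal(size=4)

def inner_cost(ybar):
    """The expression inside the argmin defining T^eps[nu]_2."""
    return (0.5*(x2 - ybar)**2
            + eps*((K - 0.5)*(y1**2 + ybar**2) - S0*(y1 + ybar)
                   + A*(Q0 - y1 - ybar)**2 + y1*m1 + ybar*(m1 + m2)))

ybar_star = (x2 + eps*(S0 - 2*A*(y1 - Q0) - m1 - m2)) / (1 + eps*(2*K + 2*A - 1))

h = 1e-6
deriv = (inner_cost(ybar_star + h) - inner_cost(ybar_star - h)) / (2*h)
assert abs(deriv) < 1e-6     # the closed form satisfies the first-order condition
```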

Plugging the above equation into

$$\begin{aligned} V^{\epsilon }[\nu ]_1(x_1,y_1)=&\frac{1}{2}\left( \frac{1}{2} |T^{\epsilon }[\nu ]_2^{(x_1,y_1)}(0)|^2+\epsilon V[\nu ](y_1, T^{\epsilon }[\nu ]_2^{(x_1,y_1)}(0)) \right) \\&+ \frac{1}{2}\left( \frac{1}{2} |1-T^{\epsilon }[\nu ]_2^{(x_1,y_1)}(1)|^2+\epsilon V[\nu ](y_1, T^{\epsilon }[\nu ]_2^{(x_1,y_1)}(1)) \right) \end{aligned}$$

one can express \(V^{\epsilon }[\nu ]_1(x_1,y_1)\) in terms of \(m_1[\nu ],m_2[\nu ]\) and \(y_1\). Then using the first order condition

$$\begin{aligned} 0=T^{\epsilon }[\nu ]_1(x_1)-x_1+\partial _{y_1} V^{\epsilon }[\nu ]_1(x_1,T^{\epsilon }[\nu ]_1(x_1)), \end{aligned}$$

one can find a formula for \(T^{\epsilon }[\nu ]_1(x_1)\).

After some computation, one finds that there exist constants \(a_1^{\epsilon },\cdots , a_4^{\epsilon }\), \(\tilde{b}_1^{\epsilon }, b_1^{\epsilon }, \cdots , b_4^{\epsilon }\) such that

$$\begin{aligned} \mathcal {T}^{\epsilon }[\nu ]_1(x_1)&:=T^{\epsilon }[\nu ]_1(x_1)= a_1^{\epsilon } x_1+a_2^{\epsilon } m_1[\nu ]+a_3^{\epsilon }m_2[\nu ]+a_4^{\epsilon }, \\ \mathcal {T}^{\epsilon }[\nu ]_2(x_1,x_2)&:= T^{\epsilon }[\nu ]_2^{(x_1,\mathcal {T}^{\epsilon }[\nu ]_1(x_1))}(x_2)=b_1^{\epsilon } x_2 + \tilde{b}_1^{\epsilon } x_1 + b_2^{\epsilon } m_1[\nu ]+ b_3^{\epsilon }m_2[\nu ]+b_4^{\epsilon }. \end{aligned}$$

Since we assume that \(\eta =\frac{1}{2}(\delta _{0} +\delta _1) \times \frac{1}{2}(\delta _{0} + \delta _1)\), the optimal response measure is given by

$$\begin{aligned} \hat{\nu }:=\Psi (\nu )=\frac{1}{4} \sum _{x_1,x_2=0,1} \delta _{\left( \mathcal {T}^{\epsilon }[\nu ]_1(x_1), \mathcal {T}^{\epsilon }[\nu ]_2(x_1,x_2) \right) }, \end{aligned}$$
(4.3)

and hence is completely determined by the means \( m_1 [\nu ]\) and \(m_2[\nu ]\). Computing the means of \({\hat{\nu }} \), we obtain that

$$\begin{aligned} m_1[\hat{\nu }]&=\frac{1}{2}a_1^{\epsilon } +a_2^{\epsilon } m_1[\nu ]+a_3^{\epsilon }m_2[\nu ]+a_4^{\epsilon } \\ m_2[\hat{\nu }]&=\frac{1}{2}b_1^{\epsilon } + \frac{1}{2} \tilde{b}_1^{\epsilon } + b_2^{\epsilon } m_1[\nu ]+ b_3^{\epsilon }m_2[\nu ]+b_4^{\epsilon }. \end{aligned}$$

Therefore, the equilibrium is given by the solution of the linear system

$$\begin{aligned} m_1^{\epsilon }&=\frac{1}{2}a_1^{\epsilon } +a_2^{\epsilon } m_1^{\epsilon }+a_3^{\epsilon }m_2^{\epsilon }+a_4^{\epsilon } \nonumber \\ m_2^{\epsilon }&=\frac{1}{2}b_1^{\epsilon } + \frac{1}{2} \tilde{b}_1^{\epsilon } + b_2^{\epsilon } m_1^{\epsilon }+ b_3^{\epsilon }m_2^{\epsilon }+b_4^{\epsilon }. \end{aligned}$$
(4.4)

where the variables \(m_1^{\epsilon }\), \(m_2^{\epsilon }\) stand for the means of the first and second marginals of the equilibrium.

It can be verified that if \(F(x,y,\nu )\) satisfies the assumptions of Proposition 3.4, e.g. when K is large, then \(F^{\epsilon }(x,y,\nu )\) also satisfies them for any \( \epsilon \in [0,1]\). Therefore, there always exists a unique solution of \(\Psi (\nu ) =\nu \), with \(\Psi \) as in (4.3). As discussed in the paragraph above, this solution is characterized by the linear system (4.4), which therefore always admits a unique solution.

Although it is not immediate how to interpret this equilibrium, we do notice that as \(\epsilon \rightarrow 0\) the unique equilibrium converges to the intuitive solution for \(\epsilon =0\). In the case \(\epsilon =0\), the optimal map is the identity, i.e., \( \mathcal {T}^{0}[\nu ]_1(x_1)=x_1\), \( \mathcal {T}^{0}[\nu ]_2(x_1,x_2)=x_2\). Indeed, as \(\epsilon \rightarrow 0\), we have \( a_1^{\epsilon }, b_1^{\epsilon } \rightarrow 1\) and \( a_2^{\epsilon }, a_3^{\epsilon },a_4^{\epsilon }, \tilde{b}_1^{\epsilon }, b_2^{\epsilon }, b_3^{\epsilon }, b_4^{\epsilon } \rightarrow 0\). Therefore the fixed-point means \(m_1^{\epsilon }\), \(m_2^{\epsilon }\) both converge to \(\frac{1}{2}\), and thus \(\lim _{\epsilon \rightarrow 0} \mathcal {T}^{\epsilon }[\nu ]_1(x_1)=x_1\), \(\lim _{\epsilon \rightarrow 0} \mathcal {T}^{\epsilon }[\nu ]_2(x_1,x_2)=x_2\).
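To close, the whole example can be reproduced numerically. The sketch below uses illustrative parameter values and a simple grid search for the outer minimization (in place of the closed-form coefficients \(a_i^{\epsilon }, \tilde{b}_1^{\epsilon }, b_i^{\epsilon }\)); it computes the best response maps, iterates \(\Psi \) on the pair of marginal means as in Corollary 3.1, and illustrates the convergence of the equilibrium means towards \((\frac{1}{2},\frac{1}{2})\) as \(\epsilon \rightarrow 0\).

```python
import numpy as np

# illustrative parameters, chosen so that K is large compared to N + A
A, K, S0, Q0 = 0.1, 10.0, 1.0, 1.0
X = (0.0, 1.0)                          # support of each marginal of eta

def V(y1, y2, m1, m2):
    """V[nu](y) from (4.1) with N = 2, given the marginal means (m1, m2)."""
    s = y1 + y2
    return ((K - 0.5)*(y1**2 + y2**2) - S0*s + A*(Q0 - s)**2
            + y1*m1 + y2*(m1 + m2))

def T2(x2, y1, m1, m2, eps):
    """Inner best response: the closed form displayed in the example."""
    return (x2 + eps*(S0 - 2*A*(y1 - Q0) - m1 - m2)) / (1 + eps*(2*K + 2*A - 1))

def T1(x1, m1, m2, eps, grid=np.linspace(-2.0, 2.0, 20001)):
    """Outer best response T^eps[nu]_1, obtained here by grid search over y1."""
    cont = np.mean([0.5*(x2 - T2(x2, grid, m1, m2, eps))**2
                    + eps*V(grid, T2(x2, grid, m1, m2, eps), m1, m2)
                    for x2 in X], axis=0)
    return grid[np.argmin(0.5*(x1 - grid)**2 + cont)]

def Psi_means(m, eps):
    """Marginal means of Psi(nu) in (4.3), as a function of (m1[nu], m2[nu])."""
    m1, m2 = m
    pts = [(T1(x1, m1, m2, eps), T2(x2, T1(x1, m1, m2, eps), m1, m2, eps))
           for x1 in X for x2 in X]
    return np.mean(pts, axis=0)

for eps in (1.0, 0.1, 0.01, 0.001):
    m = np.zeros(2)
    for _ in range(50):                 # fixed-point iterations of Corollary 3.1
        m = Psi_means(m, eps)
    print(eps, m)                       # the means move towards (1/2, 1/2) as eps -> 0
```

The grid search is used only for simplicity; since all costs are quadratic, one could equally well extract the coefficients \(a_i^{\epsilon }, b_i^{\epsilon }\) and solve the \(2\times 2\) linear system (4.4) directly.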