
Data-driven estimation in equilibrium using inverse optimization

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

Equilibrium modeling is common in a variety of fields such as game theory and transportation science. The inputs for these models, however, are often difficult to estimate, while their outputs, i.e., the equilibria they are meant to describe, are often directly observable. By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the parameters of these models from observed equilibria. We use this technique to estimate the utility functions of players in a game from their observed actions and to estimate the congestion function on a road network from traffic count data. A distinguishing feature of our approach is that it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators). In computational experiments involving Nash and Wardrop equilibria in a nonparametric setting, we find that a) we effectively estimate the unknown demand or congestion function, respectively, and b) our proposed regularization technique substantially improves the out-of-sample performance of our estimators.


Notes

  1. For the avoidance of doubt, the closure in (19) is with respect to the norm \(\Vert \cdot \Vert _{{\mathcal {H}}_0}\).

  2. In fact, the authors show more: they give an algorithm leveraging \(\ell _1\) regularization to reduce the dimensionality of \(\theta \) and then an improved bound based on the reduced dimension. The analysis of this improved bound can be adapted to our current context at the expense of more notation. We omit the details for space.

References

  1. Abrahamsson, T.: Estimation of origin–destination matrices using traffic counts—a literature survey. www.iiasa.ac.at/Admin/PUB/Documents/IR-98-021.pdf. (1998)

  2. Aghassi, M., Bertsimas, D., Perakis, G.: Solving asymmetric variational inequalities via convex optimization. Oper. Res. Lett. 34(5), 481–490 (2006). doi:10.1016/j.orl.2005.09.006. http://www.sciencedirect.com/science/article/pii/S0167637705001124

  3. Ahuja, R., Orlin, J.: Inverse optimization. Oper. Res. 49(5), 771–783 (2001)


  4. Allon, G., Federgruen, A., Pierson, M.: How much is a reduction of your customers’ wait worth? An empirical study of the fast-food drive-thru industry based on structural estimation methods. Manuf. Serv. Oper. Manag. 13(4), 489–507 (2011)


  5. Bajari, P., Benkard, C., Levin, J.: Estimating dynamic models of imperfect competition. Econometrica 75(5), 1331–1370 (2007)


  6. Bar-Gera, H.: Transportation test problems. http://www.bgu.ac.il/bargera/tntp/ (2011). Accessed Nov 2011

  7. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. J. Mach. Learn. Res. 3, 463–482 (2003)


  8. Berry, S., Haile, P.: Nonparametric identification of multinomial choice demand models with heterogeneous consumers. Technical report, National Bureau of Economic Research (2009)

  9. Berry, S., Levinsohn, J., Pakes, A.: Automobile prices in market equilibrium. Econometrica 63(4), 841–890 (1995)

  10. Berry, S.T.: Estimating discrete-choice models of product differentiation. RAND J. Econ. 25(2), 242–262 (1994)

  11. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont, Massachusetts (1999)


  12. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)


  13. Braess, D., Nagurney, A., Wakolbinger, T.: On a paradox of traffic planning. Transp. Sci. 39(4), 446–450 (2005). doi:10.1287/trsc.1050.0127. http://transci.journal.informs.org/content/39/4/446.abstract

  14. Branston, D.: Link capacity functions: a review. Transp. Res. 10(4), 223–236 (1976)


  15. Breiman, L., et al.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16(3), 199–231 (2001)


  16. Bureau of Public Roads: Traffic assignment manual. US Department of Commerce, Urban Planning Division (1964)

  17. Campi, M., Carè, A.: Random convex programs with \(\ell _1\)-regularization: sparsity and generalization. SIAM J. Control Optim. 51(5), 3532–3557 (2013)

  18. Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertain convex programs. SIAM J. Optim. 19(3), 1211–1230 (2008)


  19. Dafermos, S., Nagurney, A.: A network formulation of market equilibrium problems and variational inequalities. Oper. Res. Lett. 3(5), 247–250 (1984)


  20. Dubé, J.P., Fox, J.T., Su, C.L.: Improving the numerical performance of static and dynamic aggregate discrete choice random coefficients demand estimation. Econometrica 80(5), 2231–2267 (2012)


  21. Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13(1), 1–50 (2000)


  22. Fudenberg, D., Tirole, J.: Game Theory. MIT Press, Cambridge (1991)


  23. Gallego, G., Huh, W., Kang, W., Phillips, R.: Price competition with the attraction demand model: existence of unique equilibrium and its stability. Manuf. Serv. Oper. Manag. 8(4), 359–375 (2006)


  24. Girosi, F., Jones, M., Poggio, T.: Priors stabilizers and basis functions: from regularization to radial, tensor and additive splines. http://dspace.mit.edu/handle/1721.1/7212 (1993)

  25. Guerre, E., Perrigne, I., Vuong, Q.: Optimal nonparametric estimation of first-price auctions. Econometrica 68(3), 525–574 (2000)


  26. Harker, P., Pang, J.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48(1), 161–220 (1990)


  27. Heuberger, C.: Inverse combinatorial optimization: a survey on problems, methods, and results. J. Comb. Optim. 8(3), 329–361 (2004)


  28. Iyengar, G., Kang, W.: Inverse conic programming with applications. Oper. Res. Lett. 33(3), 319 (2005)


  29. LeBlanc, L., Morlok, E., Pierskalla, W.: An efficient approach to solving the road network equilibrium traffic assignment problem. Transp. Res. 9(5), 309–318 (1975)


  30. Luenberger, D.: Optimization by Vector Space Methods. Wiley-Interscience, New York (1997)


  31. Nakayama, S., Connors, R., Watling, D.: A method of estimating parameters on transportation equilibrium models: toward integration analysis on both demand and supply sides. In: Transportation Research Board Annual Meeting 2007 (2007)

  32. Nevo, A.: Measuring market power in the ready-to-eat cereal industry. Econometrica 69(2), 307–342 (2001)


  33. Pang, J.: A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res. 12, 474–484 (1987)


  34. Perakis, G., Roels, G.: An analytical model for traffic delays and the dynamic user equilibrium problem. Oper. Res. 54(6), 1151 (2006)


  35. Rust, J.: Structural estimation of Markov decision processes. Handb. Econom. 4, 3081–3143 (1994)


  36. Smola, A., Schölkopf, B.: Learning with Kernels. MIT Press, Cambridge (1998)


  37. Su, C.L., Judd, K.L.: Constrained optimization approaches to estimation of structural models. Econometrica 80(5), 2213–2230 (2012)


  38. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)

  39. Wahba, G.: Spline Models for Observational Data, vol. 59. Society for Industrial and Applied Mathematics, Philadelphia (1990)

  40. Yang, H., Sasaki, T., Iida, Y., Asakura, Y.: Estimation of origin–destination matrices from link traffic counts on congested networks. Transp. Res. B Methodol. 26(6), 417–434 (1992)


  41. Zhao, L., Dafermos, S.: General economic equilibrium and variational inequalities. Oper. Res. Lett. 10(7), 369–376 (1991)



Acknowledgments

Research partially supported by the NSF under grants EFRI-0735974, CNS-1239021, and IIS-1237022; by the DOE under grant DE-FG52-06NA27490; by the ARO under grants W911NF-11-1-0227 and W911NF-12-1-0390; by the ONR under grant N00014-10-1-0952; and by Citigroup under a grant to the Sloan School of Management. We would also like to thank the two anonymous reviewers, the Associate Editor, and the Area Editor for their helpful comments on a first draft of this paper.

Author information


Corresponding author

Correspondence to Dimitris Bertsimas.

Appendices

Appendix 1: Omitted proofs

1.1 Proof of Theorem 4

Proof

Let \(\mathbf {f}^* = (f^*_1, \ldots , f^*_n)\) be any feasible solution of (22). We construct a new feasible solution of the required representation whose cost is no greater. We proceed iteratively, beginning with \(f^*_1\).

Consider the subspace \(\mathcal {T} \subset {\mathcal {H}}_1\) defined by \(\mathcal {T} = \text {span}(k_{\mathbf {x}_1}, \ldots , k_{\mathbf {x}_N})\), and let \(\mathcal {T}^\bot \) be its orthogonal complement. It follows that \(f^*_1\) decomposes uniquely into \(f_1^* = f_0 + f_0^\bot \) with \(f_0 \in \mathcal {T}\) and \(f_0^\bot \in \mathcal {T}^\bot \). Consequently,

$$\begin{aligned} f^*_1(\mathbf {x}_j)&= \langle k_{\mathbf {x}_j}, f^*_1 \rangle ,&\text {[by (20)]}\\&=\langle k_{\mathbf {x}_j} , f_0 \rangle + \langle k_{\mathbf {x}_j}, f_0^\bot \rangle \\&=\langle k_{\mathbf {x}_j} , f_0 \rangle&\text {(since }f_0^\bot \in \mathcal {T}^\bot )\\&= f_0(\mathbf {x}_j)&\text {[by (20)]}. \end{aligned}$$

Thus, the solution \(\mathbf {f} = (f_0, f^*_2, \ldots , f^*_n)\) is feasible to (22). Furthermore, by orthogonality, \(\Vert f^*_1 \Vert ^2_{{\mathcal {H}}_1} = \Vert f_0 \Vert ^2_{{\mathcal {H}}_1} + \Vert f_0^\bot \Vert ^2_{{\mathcal {H}}_1} \ge \Vert f_0 \Vert ^2_{{\mathcal {H}}_1}.\) Since the objective is non-decreasing in \(\Vert f_1 \Vert _{{\mathcal {H}}_1}\), \(\mathbf {f}\) has an objective value no worse than that of \(\mathbf {f}^*\). We can now proceed iteratively, considering each coordinate in turn. After at most \(n\) steps, we have constructed a solution with the required representation. \(\square \)
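
To make the projection step concrete, here is a small numerical sketch (the Gaussian kernel, the data, and the sizes are illustrative assumptions, not taken from the paper). It builds a function with a component outside \(\mathcal {T}\), projects it onto \(\mathcal {T}\), and checks that the values at the data points are preserved while the RKHS norm does not increase.

```python
import numpy as np

def k(u, v, gamma=0.5):
    """Gaussian kernel (an assumed choice, purely for illustration)."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))           # observed points x_1, ..., x_N
x_extra = rng.normal(size=(1, 2))     # an extra point, so f*_1 has a part outside T
Z = np.vstack([X, x_extra])

c = rng.normal(size=Z.shape[0])       # f*_1 = sum_j c_j k_{z_j}
K_ZZ = np.array([[k(a, b) for b in Z] for a in Z])
K_XZ = np.array([[k(a, b) for b in Z] for a in X])
K_XX = K_XZ[:, :X.shape[0]]

# Orthogonal projection of f*_1 onto T = span(k_{x_1}, ..., k_{x_N}):
# the coefficients a solve K_XX a = (values of f*_1 at the x_j).
a = np.linalg.solve(K_XX, K_XZ @ c)

print(np.allclose(K_XZ @ c, K_XX @ a))        # values at the x_j agree: True
print(a @ K_XX @ a <= c @ K_ZZ @ c + 1e-12)   # RKHS norm does not increase: True
```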

1.2 Proof of Theorem 5

Proof

Suppose Problem (24) is feasible and let \({\varvec{\alpha }}\) be a feasible solution. Define \(\mathbf {f}\) via Eq. (23). It is straightforward to check that \(\mathbf {f}\) is feasible in Problem (22) with the same objective value.

On the other hand, let \(\mathbf {f}\) be some feasible solution to Problem (22). By Theorem 4, there exists \({\varvec{\alpha }}\) such that \(f_i( \mathbf {x}_j) = \mathbf {e}_i^T {\varvec{\alpha }}{\mathbf {K}}\mathbf {e}_j,\) and \(\Vert f_i \Vert ^2_{\mathcal {H}}= \mathbf {e}_i^T{\varvec{\alpha }}{\mathbf {K}}{\varvec{\alpha }}^T \mathbf {e}_i.\) It is straightforward to check that this \({\varvec{\alpha }}\) is feasible in Problem (24) and yields the same objective value. Thus, Problem (22) is feasible if and only if Problem (24) is feasible, and we can construct an optimal solution to Problem (22) from an optimal solution to Problem (24) via (23). \(\square \)
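
As a computational aside, once the kernel matrix \({\mathbf {K}}\) is formed, the quantities \(f_i(\mathbf {x}_j) = \mathbf {e}_i^T {\varvec{\alpha }}{\mathbf {K}}\mathbf {e}_j\) and \(\Vert f_i \Vert ^2_{\mathcal {H}}= \mathbf {e}_i^T{\varvec{\alpha }}{\mathbf {K}}{\varvec{\alpha }}^T \mathbf {e}_i\) used in this equivalence are plain matrix expressions. A minimal numpy sketch (with an assumed Gaussian kernel and placeholder data) is:

```python
import numpy as np

n, N = 3, 6                            # n functions, N observations (illustrative sizes)
rng = np.random.default_rng(1)
X = rng.normal(size=(N, 2))

# Kernel matrix K[j, l] = k(x_j, x_l); a Gaussian kernel is assumed here.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)

alpha = rng.normal(size=(n, N))        # row i holds the coefficients of f_i

F_vals = alpha @ K                     # F_vals[i, j] = f_i(x_j) = e_i^T alpha K e_j
norms_sq = np.einsum('ij,jk,ik->i', alpha, K, alpha)   # ||f_i||_H^2 = (alpha K alpha^T)_{ii}
print(F_vals.shape, norms_sq)
```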

1.3 Proof of Theorem 6

Proof

As mentioned in the text, the key idea in the proof is to relate (12) to a randomized uncertain convex program. To this end, notice that if \((z_N, {\varvec{\theta }}_N)\) is an optimal solution to (12) with the \(\ell _\infty \)-norm, then \((z_N, {\varvec{\theta }}_N) \in \bigcap _{j=1}^N \mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\) where

$$\begin{aligned} \mathcal {X}( \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) = \left\{ z, {\varvec{\theta }}\in \Theta : \exists \mathbf {y}\in {\mathbb {R}}^m \text { s.t. } \mathbf {A}^T \mathbf {y}\le \mathbf {f}(\mathbf {x}, {\varvec{\theta }}), \ \ \mathbf {x}^T\mathbf {f}(\mathbf {x}, {\varvec{\theta }}) - {\mathbf {b}}^T\mathbf {y}\le z \right\} . \end{aligned}$$

The sets \(\mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\) are convex. Consider then the problem

$$\begin{aligned} \min \limits _{z \ge 0, {\varvec{\theta }}} \quad z \ \ \text {s.t.} \quad (z, {\varvec{\theta }}) \in \bigcap _{j=1}^N \mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j) . \end{aligned}$$

This is exactly of the form Eq. 2.1 in [18]. Applying Theorem 2.4 of that work shows that, with probability \(\beta (\alpha )\) with respect to the sampling, the “violation probability” of the pair \((z_N, {\varvec{\theta }}_N)\) is at most \(\alpha \). In our context, the probability of violation is exactly the probability that \((\tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})\) is not a \(z_N\)-approximate equilibrium. This proves the theorem. \(\square \)

Observe that the original proof in [18] requires the solution \({\varvec{\theta }}_N\) to be almost surely unique. However, as noted in discussion point 5 on p. 7 of that work, it suffices to fix a tie-breaking rule for \({\varvec{\theta }}_N\) when there are multiple solutions. The tie-breaking rule discussed in the main text is one such example.

1.4 Proof of Theorem 7

We first require some auxiliary results. Our treatment closely follows [7]. Let \(\zeta _1, \ldots , \zeta _N\) be i.i.d. For any class of functions \(\mathcal {S}\), define the empirical Rademacher complexity \(\mathcal {R}_N(\mathcal {S})\) by

$$\begin{aligned} \mathcal {R}_N(\mathcal {S}) = {\mathbb {E}}\left[ \left. \sup _{f \in \mathcal {S}} \frac{2}{N} \left| \sum _{i=1}^N \sigma _i f(\zeta _i) \right| \; \right| \zeta _1, \ldots , \zeta _N \right] , \end{aligned}$$

where \(\sigma _i\) are independent uniform \(\{ \pm 1 \}\)-valued random variables. Notice this quantity is random, because it depends on the data \(\zeta _1, \ldots , \zeta _N\).
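
For intuition, the empirical Rademacher complexity can be approximated by Monte Carlo over the signs \(\sigma _i\). The sketch below does this for a small, finite family of candidate functions standing in for the supremum over \(\mathcal {S}\) (the data, the candidate functions, and the number of Monte Carlo draws are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
zeta = rng.uniform(size=N)                        # the conditioning data zeta_1, ..., zeta_N

# A finite stand-in for S: three simple [0, 1]-valued functions.
candidates = [lambda z, c=c: np.minimum(c * z, 1.0) for c in (0.5, 1.0, 2.0)]
vals = np.stack([f(zeta) for f in candidates])    # |S| x N matrix of f(zeta_i)

def empirical_rademacher(vals, n_mc=2000, rng=rng):
    """Monte Carlo estimate of E_sigma[ sup_f (2/N) | sum_i sigma_i f(zeta_i) | ]."""
    N = vals.shape[1]
    total = 0.0
    for _ in range(n_mc):
        sigma = rng.choice([-1.0, 1.0], size=N)   # independent uniform +/- 1 signs
        total += 2.0 / N * np.max(np.abs(vals @ sigma))
    return total / n_mc

print(empirical_rademacher(vals))
```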

Our interest in Rademacher complexity stems from the following lemma.

Lemma 1

Let \(\mathcal {S}\) be a class of functions whose range is contained in \([0, M]\). Then, for any \(N\) and any \(0 < \beta < 1\), with probability at least \(1-\beta \) with respect to \({\mathbb {P}}\), every \(f \in \mathcal {S}\) simultaneously satisfies

$$\begin{aligned} {\mathbb {E}}[ f( \zeta ) ] \le \frac{1}{N} \sum \limits _{i=1}^N f(\zeta _i) + \mathcal {R}_N(\mathcal {S}) + \sqrt{ \frac{8 M \log (2/\beta ) }{N}} \end{aligned}$$
(34)

Proof

The result follows by specializing Theorem 8 of [7]. Namely, using the notation of that work, let \(\phi (y, a) = \mathcal {L}(y, a) = a / M\), \(\delta = \beta \) and then apply the theorem. Multiply the resulting inequality by \(M\) and use Theorem 12, part 3 of the same work to conclude that \(M \mathcal {R}_N( M^{-1} \mathcal {S} ) = \mathcal {R}_N(\mathcal {S})\) to finish the proof. \(\square \)

Remark 10

The constants in the above lemma are not tight. Indeed, modifying the proof of Theorem 8 in [7] to exclude the centering of \(\phi \) to \(\tilde{\phi }\), one can reduce the constant 8 in the above bound to 2. For simplicity in what follows, we will not be concerned with improvements at constant order.

Remark 11

Lemma 1 relates the empirical expectation of a function to its true expectation. If \(f \in \mathcal {S}\) were fixed a priori, stronger statements could be proven more simply by invoking the weak law of large numbers. The importance of Lemma 1 is that it asserts the inequality holds uniformly over all \(f \in \mathcal {S}\). This matters because, in what follows, we identify the relevant function \(f\) by solving an optimization problem; hence it is not known to us a priori but instead depends on the data.

Our goal is to use Lemma 1 to bound \({\mathbb {E}}[\epsilon (\mathbf {f}_N, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})]\). To do so, we must compute an upper bound on the Rademacher complexity of a suitable class of functions. As a preliminary step, we have the following lemma.

Lemma 2

For any \(\mathbf {f}\) which is feasible in (12) or (28), we have

$$\begin{aligned} \tilde{\epsilon }(\mathbf {f}, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C}) \le \overline{B} \quad \text { a.s. } \end{aligned}$$
(35)

Proof

Using strong duality as in Theorem 2,

$$\begin{aligned} \tilde{\epsilon }(\mathbf {f}, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C}) = \max _{\mathbf {x}\in \tilde{\mathcal {F}}} (\tilde{\mathbf {x}} - \mathbf {x})^T \mathbf {f}( \tilde{\mathbf {x}} ) \le 2R \sup _{\tilde{\mathbf {x}} \in \tilde{\mathcal {F}}} \Vert \mathbf {f}(\tilde{\mathbf {x}}) \Vert _2, \end{aligned}$$
(36)

by A6. For Problem (12), the result follows from the definition of \(\overline{B}\). For Problem (28), observe that for any \(\tilde{\mathbf {x}} \in \tilde{\mathcal {F}}\),

$$\begin{aligned} | f_i (\tilde{\mathbf {x}}) |^2 = \langle f_i, k_{\tilde{\mathbf {x}}} \rangle ^2 \le \Vert f_i \Vert _{\mathcal {H}}^2 \sup _{\Vert \mathbf {x}\Vert _2 \le R } k(\mathbf {x}, \mathbf {x}) = \Vert f_i \Vert _{\mathcal {H}}^2 \overline{K}^2\le \kappa _i^2 \overline{K}^2, \end{aligned}$$
(37)

where the middle inequality follows from the Cauchy–Schwarz inequality. Plugging this into Eq. (36) and using the definition of \(\overline{B}\) yields the result. \(\square \)

Now consider the class of functions

$$\begin{aligned} F = {\left\{ \begin{array}{ll} \Big \{ (\mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) \mapsto \epsilon (\mathbf {f}, \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) : \mathbf {f} = \mathbf {f}(\cdot , {\varvec{\theta }}), {\varvec{\theta }}\in \Theta \Big \} \quad \text {for Problem~(12)}\\ \Big \{ (\mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) \mapsto \epsilon (\mathbf {f}, \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) : f_i \in {\mathcal {H}}, \ \ \Vert f_i \Vert _{\mathcal {H}}\le \kappa _i \ \ i=1, \ldots , N \Big \}\\ \quad \text {for Problem~(28). } \end{array}\right. } \end{aligned}$$

Lemma 3

$$\begin{aligned} \mathcal {R}_N(F) \le \frac{2\overline{B}}{\sqrt{N}} \end{aligned}$$

Proof

We prove the lemma for Problem (12). The proof in the other case is identical. Let \(\mathcal {S} = \{ \mathbf {f}( \cdot , {\varvec{\theta }}) : {\varvec{\theta }}\in \Theta \}\). Then,

$$\begin{aligned} \mathcal {R}_N(F)&= \frac{2}{N} {\mathbb {E}}\left[ \left. \sup _{\mathbf {f} \in \mathcal {S}} \left| \sum \limits _{j=1}^N \sigma _j \epsilon (\mathbf {f}, \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j) \right| \; \right| (\mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)_{j=1}^N \right] \\&\le \frac{2 \overline{B}}{N} {\mathbb {E}}\left[ \left( \sum \limits _{j=1}^N \sigma _j^2 \right) ^{\frac{1}{2}}\right] \qquad \qquad \qquad [\text {using (35)}]\\&\le \frac{2 \overline{B}}{N} \sqrt{ {\mathbb {E}}\left[ \sum \limits _{j=1}^N \sigma _j^2 \right] }\qquad \qquad \qquad (\text {Jensen's inequality})\\&=\frac{2 \overline{B}}{\sqrt{ N }}\qquad \qquad \qquad ( \sigma _j^2 = 1\text { a.s.}). \end{aligned}$$

\(\square \)

We are now in a position to prove the theorem.

Proof

(Theorem 7) Observe that \(z_N = \frac{1}{N} \sum \nolimits _{j=1}^N (\epsilon ( \mathbf {f}_N, \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j))^p\). Next, the function \(\phi (z) = z^p\) satisfies \(\phi (0) = 0\) and is Lipschitz with constant \(L_\phi = p \overline{B}^{p-1}\) on the interval \([0, \overline{B} ]\). Consequently, from Theorem 12 part 4 of [7],

$$\begin{aligned} \mathcal {R}_N( \phi \circ F )&\le 2 L_\phi \mathcal {R}_N(F)\\&\le 2p \overline{B}^{p-1} \frac{2\overline{B}}{\sqrt{N}}\\&= \frac{4p \overline{B}^p }{\sqrt{N}}. \end{aligned}$$

Now applying Lemma 1 with \(\zeta \rightarrow (\tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})\), \(f(\cdot ) \rightarrow \epsilon ( \cdot )^p\), and \(M = \overline{B}^p\) yields the first part of the theorem.

For the second part of the theorem, observe that, conditional on the sample, the event that \(\tilde{\mathbf {x}}\) is not a \((z_N + \alpha )\)-approximate equilibrium is equivalent to the event that \(\epsilon _N > z_N + \alpha \). Now use Markov’s inequality and apply the first part of the theorem. \(\square \)
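
For a sense of the magnitudes in the first part, combining (34) (applied with \(M = \overline{B}^p\), so that the empirical average is \(z_N\)) with the bound \(\mathcal {R}_N( \phi \circ F ) \le 4p\overline{B}^p/\sqrt{N}\) derived above gives, with probability at least \(1 - \beta \),

$$\begin{aligned} {\mathbb {E}}\left[ \epsilon (\mathbf {f}_N, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})^p \right] \le z_N + \frac{4p \overline{B}^p}{\sqrt{N}} + \sqrt{ \frac{8 \overline{B}^p \log (2/\beta ) }{N}}. \end{aligned}$$

As a purely illustrative numerical instance (the values are not taken from the paper's experiments), with \(p=1\), \(\overline{B}=2\), \(N=10{,}000\), and \(\beta =0.05\), the last two terms evaluate to approximately \(0.08 + 0.077 \approx 0.16\).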

1.5 Proof of Theorem 8

Proof

Consider the first part of the theorem.

By construction, \(\hat{\mathbf {x}}\) solves \({{\mathrm{\text {VI}}}}(\mathbf {f}(\cdot , {\varvec{\theta }}_N), \mathbf {A}_{N+1}, {\mathbf {b}}_{N+1}, C_{N+1})\). The theorem then claims that \(\mathbf {x}_{N+1}\) is within \(\delta ^\prime \equiv \sqrt{\frac{z_N}{\gamma }}\) of a solution to this VI. From Theorem 1, if \(\mathbf {x}_{N+1}\) were not within \(\delta ^\prime \) of a solution, then it must be the case that \(\epsilon ( \mathbf {f}(\cdot , {\varvec{\theta }}_N), \mathbf {x}_{N+1}, \mathbf {A}_{N+1}, {\mathbf {b}}_{N+1}, C_{N+1}) > z_N\). By Theorem 6, this happens only with probability \(\beta (\alpha )\).

The second part is similar to the first with Theorem 6 replaced by Theorem 7. \(\square \)

Appendix 2: Casting structural estimation as an inverse variational inequality

In the spirit of structural estimation, assume there exists a true \({\varvec{\theta }}^* \in \Theta \) that generates solutions \(\mathbf {x}_j^*\) to \({{\mathrm{\text {VI}}}}(\mathbf {f}(\cdot , {\varvec{\theta }}^*), \mathbf {A}^*_j, {\mathbf {b}}^*_j, C^*_j)\). We observe \((\mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\), which are noisy versions of these true quantities. We are additionally given a precise mechanism for the noise, e.g., that

$$\begin{aligned} \mathbf {x}_j = \mathbf {x}^*_j + \Delta \mathbf {x}_j, \quad \mathbf {A}_j = \mathbf {A}^*_j + \Delta \mathbf {A}_j, \quad {\mathbf {b}}_j = {\mathbf {b}}^*_j + \Delta {\mathbf {b}}_j, \quad C_j = C_j^*, \end{aligned}$$

where \((\Delta \mathbf {x}_j, \Delta \mathbf {A}_j, \Delta {\mathbf {b}}_j)\) are i.i.d. realizations of a random vector \((\tilde{\Delta \mathbf {x}}, \tilde{\Delta \mathbf {A}}, \tilde{\Delta {\mathbf {b}}})\) and \(\tilde{\Delta \mathbf {x}}, \tilde{\Delta \mathbf {A}}, \tilde{\Delta {\mathbf {b}}}\) are mutually uncorrelated.

We use Theorem 2 to estimate \({\varvec{\theta }}\) under these assumptions by solving

$$\begin{aligned} \min \limits _{\mathbf {y}\ge \mathbf {0}, {\varvec{\theta }}\in \Theta , \Delta \mathbf {x}, \Delta \mathbf {A}, \Delta {\mathbf {b}}} \quad&\left\| \begin{pmatrix} \Delta \mathbf {x}_j \\ \Delta \mathbf {A}_j \\ \Delta {\mathbf {b}}_j \end{pmatrix}_{j=1, \ldots , N} \right\| \nonumber \\ \text {s.t.} \quad&(\mathbf {A}_j - \Delta \mathbf {A}_j)^T \mathbf {y}_j \le _{C_j} \mathbf {f}(\mathbf {x}_j - \Delta \mathbf {x}_j, {\varvec{\theta }}), \quad j=1, \ldots , N,\nonumber \\&(\mathbf {x}_j - \Delta \mathbf {x}_j)^T \mathbf {f}(\mathbf {x}_j - \Delta \mathbf {x}_j, {\varvec{\theta }}) = {\mathbf {b}}_j^T \mathbf {y}_j, \quad j =1, \ldots , N, \end{aligned}$$
(38)

where \(\Vert \cdot \Vert \) denotes some norm. Notice this formulation also supports the case where some components of \(\mathbf {x}\) are unobserved; simply treat them as additional optimization variables in the above. In words, this formulation assumes that the “de-noised” data constitute a perfect equilibrium with respect to the fitted \({\varvec{\theta }}\).

We next claim that if all equilibria occur in the strict interior of the feasible region, Problem (38) is equivalent to computing a least-squares-type approximate solution to the equations \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\). Specifically, when \(\mathbf {x}^*\) lies in the interior of \(\mathcal {F}\), the VI condition Eq. (1) is equivalent to the equations \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\). At the same time, by Theorem 2, Eq. (1) is equivalent to the system (8)–(9) with \(\epsilon = 0\), which motivated the constraints in Problem (38). Thus, Problem (38) is equivalent to finding a minimal (with respect to the given norm) perturbation that satisfies these structural equations.
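
To spell out the interior-point step (a one-line argument, assuming Eq. (1) is the standard VI condition \(\mathbf {f}(\mathbf {x}^*)^T(\mathbf {x}- \mathbf {x}^*) \ge 0\) for all \(\mathbf {x}\in \mathcal {F}\)): if \(\mathbf {x}^*\) is interior, then for any direction \(\mathbf {d}\) both \(\mathbf {x}^* + t \mathbf {d}\) and \(\mathbf {x}^* - t \mathbf {d}\) lie in \(\mathcal {F}\) for all sufficiently small \(t > 0\), so

$$\begin{aligned} \mathbf {f}(\mathbf {x}^*)^T \mathbf {d}\ge 0 \quad \text {and} \quad \mathbf {f}(\mathbf {x}^*)^T (-\mathbf {d}) \ge 0 \quad \Longrightarrow \quad \mathbf {f}(\mathbf {x}^*)^T \mathbf {d}= 0 \quad \forall \, \mathbf {d}, \end{aligned}$$

which forces \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\); the converse implication is immediate.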

We can relate this weighted least-squares problem to some structural estimation techniques. Indeed, [20] and [37] observed that many structural estimation techniques can be reinterpreted as a constrained optimization problem that minimizes the size of the perturbation necessary to make the observed data satisfy the structural equations and, additionally, satisfy constraints motivated by orthogonality conditions and the generalized method of moments (GMM). In light of our previous comments, if we augment Problem (38) with the same orthogonality constraints, and all equilibria occur in the strict interior of the feasible region, the solutions to this problem will coincide with those of traditional estimators.

Of course, some structural estimation techniques incorporate even more sophisticated adaptations. They may pre-process the data (e.g., the two-stage least squares technique in econometrics), incorporate additional constraints (e.g., the orthogonality-of-instruments approach), or tune the choice of norm in the least-squares computation (two-stage GMM estimation). These application-specific adaptations improve the statistical properties of the estimator under certain assumptions on the data generating process. What we would like to stress is that, provided we make the same adaptations to Problem (38) (i.e., pre-process the data, incorporate orthogonality of instruments, and tune the choice of norm), and provided that all equilibria occur in the interior, the solution to Problem (38) must coincide exactly with these techniques. Thus, the resulting estimators necessarily inherit all of the same statistical properties.

Recasting (at least some) structural estimation techniques in our framework facilitates a number of comparisons to our proposed approach based on Problem (12). First, it is clear how our perspective on data alters the formulation. Problem (38) seeks minimal perturbations so that the observed data are exact equilibria with respect to \({\varvec{\theta }}\), while Problem (12) seeks a \({\varvec{\theta }}\) that makes the observed data approximate equilibria and minimizes the size of the approximation. Second, the complexity of the proposed optimization problems differs greatly. The complexity of Problem (38) depends on how \(\mathbf {f}\) depends on both \(\mathbf {x}\) and \({\varvec{\theta }}\) (as opposed to just \({\varvec{\theta }}\) for Problem (12)), and there are unavoidable non-convex, bilinear terms like \(\Delta \mathbf {A}_j^T \mathbf {y}_j\). Such terms are well known to cause difficulties for numerical solvers. Thus, we expect solving this optimization problem to be significantly more difficult than solving Problem (12). Finally, as we will see in the next section, Problem (12) generalizes naturally to a nonparametric setting.

Appendix 3: Omitted formulations

1.1 Formulation from Sect. 8.1

Let \(\xi ^{med}\) be the median value of \(\xi \) over the dataset. Breaking ties arbitrarily, \(\xi ^{med}\) occurs for some observation \(j = j^{med}\). Let \(p_1^{med}, p_2^{med}, \xi ^{med}_1, \xi ^{med}_2\) be the corresponding prices and demand shocks at time \(j^{med}\). (Recall that in this section \(\xi = \xi _1 = \xi _2\).) These definitions make precise what we mean in the main text by “fixing other variables to the median observation.” Denote by \(\underline{p}_1, \underline{p}_2\) the minimum prices observed over the data set.

Our parametric formulation in Sect. 8.1 is

$$\begin{aligned}&\min \limits _{\mathbf {y}, \varvec{\epsilon }, {\varvec{\theta }}_1, {\varvec{\theta }}_2 } \quad \Vert \varvec{\epsilon }\Vert _\infty \end{aligned}$$
(39a)
$$\begin{aligned}&\text {s.t.} \quad \mathbf {y}^j \ge \mathbf {0}, \quad j = 1, \ldots , N,\nonumber \\&y_i^j \ge M_i(p_1^j, p_2^j, \xi ^j; {\varvec{\theta }}_i), \quad i = 1, 2, \ j = 1, \ldots , N,\nonumber \\&\quad \sum \limits _{i=1}^2\overline{p}^j y_i^j -(p_i^j) M_i(p_1^j, p_2^j, \xi ^j; {\varvec{\theta }}_i) \le \epsilon _j, \quad j = 1, \ldots , N,\nonumber \\&\quad M_1(p_1^j, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1) \!\ge \! M_1(p_1^k, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1), \; \forall 1 \!\le \! j,k \!\le \! N \text { s.t. } p_1^j \le p_1^k,\end{aligned}$$
(39b)
$$\begin{aligned}&\quad M_2(p_1^{med}, p_2^j, \xi ^{med}; {\varvec{\theta }}_2) \!\ge \! M_2(p_1^{med}, p_2^k, \xi ^{med}; {\varvec{\theta }}_2), \; \forall 1 \!\le \! j,k \le N \text { s.t. } p_2^j \le p_2^k,\end{aligned}$$
(39c)
$$\begin{aligned}&\quad M_1(\underline{p}_1, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1) = M^*_1(\underline{p}_1, p_2^{med}, \xi ^{med}_1; {\varvec{\theta }}^*_1)\end{aligned}$$
(39d)
$$\begin{aligned}&\quad M_2(p_1^{med}, \underline{p}_2, \xi ^{med}; {\varvec{\theta }}_2) = M^*_2(p_1^{med}, \underline{p}_2, \xi ^{med}_2 ; {\varvec{\theta }}^*_2) \end{aligned}$$
(39e)

Here \(M_1\) and \(M_2\) are given by Eq. (32). Notice, for this choice, the optimization is a linear optimization problem.

Equations (39b) and (39c) constrain the fitted function to be non-decreasing in the firm’s own price. Equations (39d) and (39e) are normalization conditions. We have chosen to normalize the functions to be equal to the true functions at this one point to make the visual comparisons easier. In principle, any suitable normalization can be used.

Our nonparametric formulation is similar to the above, but we replace

  • The parametric \(M_1(\cdot , {\varvec{\theta }}_1), M_2(\cdot , {\varvec{\theta }}_2)\) with nonparametric \(M_1(\cdot ), M_2(\cdot ) \in \mathcal {H}\)

  • The objective by \(\Vert \varvec{\epsilon }\Vert _1 + \lambda ( \Vert M_1 \Vert _{\mathcal {H}}+ \Vert M_2 \Vert _{\mathcal {H}})\).

By Theorem 4 and the discussion in Sect. 6, we can rewrite this optimization as a convex quadratic program.
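
To indicate what this finite-dimensional reduction looks like in practice, below is a minimal sketch (the kernel, data, regularization weight \(\lambda \), and the use of the cvxpy modeling package are illustrative assumptions; the equilibrium, monotonicity, and normalization constraints of formulation (39) are omitted). It encodes the regularizer \(\Vert M_i \Vert _{\mathcal {H}} = \sqrt{{\varvec{\alpha }}_i^T {\mathbf {K}}{\varvec{\alpha }}_i}\) through a Cholesky factor of the kernel matrix.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
N = 20
P = rng.uniform(1.0, 5.0, size=(N, 2))            # observed price pairs (placeholder data)

sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq) + 1e-8 * np.eye(N)                # Gaussian kernel matrix with a small jitter
L = np.linalg.cholesky(K)                         # K = L L^T

alpha1 = cp.Variable(N)                           # kernel coefficients of M_1
alpha2 = cp.Variable(N)                           # kernel coefficients of M_2
eps = cp.Variable(N, nonneg=True)                 # per-observation equilibrium errors
lam = 0.1                                         # illustrative regularization weight

# ||M_i||_H = sqrt(alpha_i^T K alpha_i) = ||L^T alpha_i||_2, so the objective is convex.
objective = cp.Minimize(cp.norm(eps, 1)
                        + lam * (cp.norm(L.T @ alpha1, 2) + cp.norm(L.T @ alpha2, 2)))

# Function values entering the omitted constraints are K @ alpha_i, as in Theorem 5.
constraints = []                                  # placeholder for (39b)-(39e)-style constraints

prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.status)
```

With the norms encoded this way the program is second-order cone representable; using the squared RKHS norms instead yields a convex quadratic objective.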

1.2 Formulation from Sect. 8.2

Our parametric formulation is nearly identical to the parametric formulation in Appendix “Formulation from Sect. 8.1”, with the following changes:

  • Replace Eq. (39a) by \(\Vert \varvec{\epsilon }\Vert _\infty + \lambda ( \Vert {\varvec{\theta }}_1 \Vert _1 + \Vert {\varvec{\theta }}_2\Vert _1)\)

  • Replace the definition of \(M_1, M_2\) by Eq. (33).

Our nonparametric formulation is identical to the nonparametric formulation of the previous section.


Cite this article

Bertsimas, D., Gupta, V. & Paschalidis, I.C. Data-driven estimation in equilibrium using inverse optimization. Math. Program. 153, 595–633 (2015). https://doi.org/10.1007/s10107-014-0819-4
