
Data-driven estimation in equilibrium using inverse optimization

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

Equilibrium modeling is common in a variety of fields such as game theory and transportation science. The inputs for these models, however, are often difficult to estimate, while their outputs, i.e., the equilibria they are meant to describe, are often directly observable. By combining ideas from inverse optimization with the theory of variational inequalities, we develop an efficient, data-driven technique for estimating the parameters of these models from observed equilibria. We use this technique to estimate the utility functions of players in a game from their observed actions and to estimate the congestion function on a road network from traffic count data. A distinguishing feature of our approach is that it supports both parametric and nonparametric estimation by leveraging ideas from statistical learning (kernel methods and regularization operators). In computational experiments involving Nash and Wardrop equilibria in a nonparametric setting, we find that a) we effectively estimate the unknown demand or congestion function, respectively, and b) our proposed regularization technique substantially improves the out-of-sample performance of our estimators.


Notes

  1. For the avoidance of doubt, the closure in (19) is with respect to the norm \(\Vert \cdot \Vert _{{\mathcal {H}}_0}\).

  2. In fact, the authors show more: they give an algorithm leveraging \(\ell _1\) regularization to reduce the dimensionality of \(\theta \) and then an improved bound based on the reduced dimension. The analysis of this improved bound can be adapted to our current context at the expense of more notation. We omit the details for space.

References

  1. Abrahamsson, T.: Estimation of origin–destination matrices using traffic counts—a literature survey. www.iiasa.ac.at/Admin/PUB/Documents/IR-98-021.pdf. (1998)

  2. Aghassi, M., Bertsimas, D., Perakis, G.: Solving asymmetric variational inequalities via convex optimization. Oper. Res. Lett. 34(5), 481–490 (2006). doi:10.1016/j.orl.2005.09.006. http://www.sciencedirect.com/science/article/pii/S0167637705001124

  3. Ahuja, R., Orlin, J.: Inverse optimization. Oper. Res. 49(5), 771–783 (2001)


  4. Allon, G., Federgruen, A., Pierson, M.: How much is a reduction of your customers’ wait worth? An empirical study of the fast-food drive-thru industry based on structural estimation methods. Manuf. Serv. Oper. Manag. 13(4), 489–507 (2011)


  5. Bajari, P., Benkard, C., Levin, J.: Estimating dynamic models of imperfect competition. Econometrica 75(5), 1331–1370 (2007)


  6. Bar-Gera, H.: Transportation test problems. http://www.bgu.ac.il/bargera/tntp/ (2011). Accessed Nov 2011

  7. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. J. Mach. Learn. Res. 3, 463–482 (2003)


  8. Berry, S., Haile, P.: Nonparametric identification of multinomial choice demand models with heterogeneous consumers. Technical report, National Bureau of Economic Research (2009)

  9. Berry, S., Levinsohn, J., Pakes, A.: Automobile prices in market equilibrium. Econometrica 63(4), 841–890 (1995)

  10. Berry, S.T.: Estimating discrete-choice models of product differentiation. RAND J. Econ. 25(2), 242–262 (1994)

  11. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont, Massachusetts (1999)


  12. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)


  13. Braess, D., Nagurney, A., Wakolbinger, T.: On a paradox of traffic planning. Transp. Sci. 39(4), 446–450 (2005). doi:10.1287/trsc.1050.0127. http://transci.journal.informs.org/content/39/4/446.abstract

  14. Branston, D.: Link capacity functions: a review. Transp. Res. 10(4), 223–236 (1976)


  15. Breiman, L., et al.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16(3), 199–231 (2001)


  16. Bureau of Public Roads: Traffic assignment manual. US Department of Commerce, Urban Planning Division (1964)

  17. Campi, M., Carè, A.: Random convex programs with \(\ell _1\)-regularization: sparsity and generalization. SIAM J. Control Optim. 51(5), 3532–3557 (2013)

  18. Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncertain convex programs. SIAM J. Optim. 19(3), 1211–1230 (2008)


  19. Dafermos, S., Nagurney, A.: A network formulation of market equilibrium problems and variational inequalities. Oper. Res. Lett. 3(5), 247–250 (1984)


  20. Dubé, J.P., Fox, J.T., Su, C.L.: Improving the numerical performance of static and dynamic aggregate discrete choice random coefficients demand estimation. Econometrica 80(5), 2231–2267 (2012)


  21. Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13(1), 1–50 (2000)


  22. Fudenberg, D., Tirole, J.: Game Theory. MIT Press, Cambridge (1991)


  23. Gallego, G., Huh, W., Kang, W., Phillips, R.: Price competition with the attraction demand model: existence of unique equilibrium and its stability. Manuf. Serv. Oper. Manag. 8(4), 359–375 (2006)


  24. Girosi, F., Jones, M., Poggio, T.: Priors stabilizers and basis functions: from regularization to radial, tensor and additive splines. http://dspace.mit.edu/handle/1721.1/7212 (1993)

  25. Guerre, E., Perrigne, I., Vuong, Q.: Optimal nonparametric estimation of first-price auctions. Econometrica 68(3), 525–574 (2000)


  26. Harker, P., Pang, J.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48(1), 161–220 (1990)


  27. Heuberger, C.: Inverse combinatorial optimization: a survey on problems, methods, and results. J. Comb. Optim. 8(3), 329–361 (2004)


  28. Iyengar, G., Kang, W.: Inverse conic programming with applications. Oper. Res. Lett. 33(3), 319 (2005)


  29. LeBlanc, L., Morlok, E., Pierskalla, W.: An efficient approach to solving the road network equilibrium traffic assignment problem. Transp. Res. 9(5), 309–318 (1975)


  30. Luenberger, D.: Optimization by Vector Space Methods. Wiley-Interscience, New York (1997)


  31. Nakayama, S., Connors, R., Watling, D.: A method of estimating parameters on transportation equilibrium models: toward integration analysis on both demand and supply sides. In: Transportation Research Board Annual Meeting 2007 (2007)

  32. Nevo, A.: Measuring market power in the ready-to-eat cereal industry. Econometrica 69(2), 307–342 (2001)


  33. Pang, J.: A posteriori error bounds for the linearly-constrained variational inequality problem. Math. Oper. Res. 12, 474–484 (1987)


  34. Perakis, G., Roels, G.: An analytical model for traffic delays and the dynamic user equilibrium problem. Oper. Res. 54(6), 1151 (2006)


  35. Rust, J.: Structural estimation of Markov decision processes. Handb. Econom. 4, 3081–3143 (1994)


  36. Smola, A., Schölkopf, B.: Learning with Kernels. MIT Press, Cambridge (1998)


  37. Su, C.L., Judd, K.L.: Constrained optimization approaches to estimation of structural models. Econometrica 80(5), 2213–2230 (2012)


  38. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001)

  39. Wahba, G.: Spline Models for Observational Data, vol. 59. Society for Industrial and Applied Mathematics, Philadelphia (1990)

  40. Yang, H., Sasaki, T., Iida, Y., Asakura, Y.: Estimation of origin–destination matrices from link traffic counts on congested networks. Transp. Res. B Methodol. 26(6), 417–434 (1992)


  41. Zhao, L., Dafermos, S.: General economic equilibrium and variational inequalities. Oper. Res. Lett. 10(7), 369–376 (1991)



Acknowledgments

Research partially supported by the NSF under grants EFRI-0735974, CNS-1239021, and IIS-1237022; by the DOE under grant DE-FG52-06NA27490; by the ARO under grants W911NF-11-1-0227 and W911NF-12-1-0390; by the ONR under grant N00014-10-1-0952; and by Citigroup under a grant to the Sloan School of Management. We would also like to thank the two anonymous reviewers, the Associate Editor, and the Area Editor for their helpful comments on a first draft of this paper.

Author information


Corresponding author

Correspondence to Dimitris Bertsimas.

Appendices

Appendix 1: Omitted proofs

1.1 Proof of Theorem 4

Proof

Let \(\mathbf {f}^* = (f^*_1, \ldots , f^*_n)\) be any feasible solution of (22). We construct a new feasible solution of the required representation whose cost is no greater. We proceed iteratively, beginning with \(f^*_1\).

Consider the subspace \(\mathcal {T} \subset {\mathcal {H}}_1\) defined by \(\mathcal {T} = \text {span}(k_{\mathbf {x}_1}, \ldots , k_{\mathbf {x}_N})\), and let \(\mathcal {T}^\bot \) be its orthogonal complement. It follows that \(f^*_1\) decomposes uniquely into \(f_1^* = f_0 + f_0^\bot \) with \(f_0 \in \mathcal {T}\) and \(f_0^\bot \in \mathcal {T}^\bot \). Consequently,

$$\begin{aligned} f^*_1(\mathbf {x}_j)&= \langle k_{\mathbf {x}_j}, f^*_1 \rangle ,&\text {[by (20)]}\\&=\langle k_{\mathbf {x}_j} , f_0 \rangle + \langle k_{\mathbf {x}_j}, f_0^\bot \rangle \\&=\langle k_{\mathbf {x}_j} , f_0 \rangle&\text {(since }f_0^\bot \in \mathcal {T}^\bot )\\&= f_0(\mathbf {x}_j)&\text {[by (20)]}. \end{aligned}$$

Thus, the solution \(\mathbf {f} = (f_0, f^*_2, \ldots , f^*_n)\) is feasible to (22). Furthermore, by orthogonality, \(\Vert f^*_1 \Vert ^2_{{\mathcal {H}}_1} = \Vert f_0 \Vert ^2_{{\mathcal {H}}_1} + \Vert f_0^\bot \Vert ^2_{{\mathcal {H}}_1} \ge \Vert f_0 \Vert ^2_{{\mathcal {H}}_1}.\) Since the objective is non-decreasing in \(\Vert f_1 \Vert _{{\mathcal {H}}_1}\), \(\mathbf {f}\) has an objective value no worse than that of \(\mathbf {f}^*\). We can now proceed iteratively, considering each coordinate in turn. After at most \(n\) steps, we have constructed a solution with the required representation. \(\square \)
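
To make the projection step concrete, here is a small numerical sketch (the Gaussian kernel, the data, and the sizes are illustrative assumptions, not taken from the paper). It builds a function with a component outside \(\mathcal {T}\), projects it onto \(\mathcal {T}\), and checks that the values at the data points are preserved while the RKHS norm does not increase.

```python
import numpy as np

def k(u, v, gamma=0.5):
    """Gaussian kernel (an assumed choice, purely for illustration)."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))           # observed points x_1, ..., x_N
x_extra = rng.normal(size=(1, 2))     # an extra point, so f*_1 has a part outside T
Z = np.vstack([X, x_extra])

c = rng.normal(size=Z.shape[0])       # f*_1 = sum_j c_j k_{z_j}
K_ZZ = np.array([[k(a, b) for b in Z] for a in Z])
K_XZ = np.array([[k(a, b) for b in Z] for a in X])
K_XX = K_XZ[:, :X.shape[0]]

# Orthogonal projection of f*_1 onto T = span(k_{x_1}, ..., k_{x_N}):
# the coefficients a solve K_XX a = (values of f*_1 at the x_j).
a = np.linalg.solve(K_XX, K_XZ @ c)

print(np.allclose(K_XZ @ c, K_XX @ a))        # values at the x_j agree: True
print(a @ K_XX @ a <= c @ K_ZZ @ c + 1e-12)   # RKHS norm does not increase: True
```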

1.2 Proof of Theorem 5

Proof

Suppose Problem (24) is feasible and let \({\varvec{\alpha }}\) be a feasible solution. Define \(\mathbf {f}\) via Eq. (23). It is straightforward to check that \(\mathbf {f}\) is feasible in Problem (22) with the same objective value.

On the other hand, let \(\mathbf {f}\) be some feasible solution to Problem (22). By Theorem 4, there exists \({\varvec{\alpha }}\) such that \(f_i( \mathbf {x}_j) = \mathbf {e}_i^T {\varvec{\alpha }}{\mathbf {K}}\mathbf {e}_j,\) and \(\Vert f_i \Vert ^2_{\mathcal {H}}= \mathbf {e}_i^T{\varvec{\alpha }}{\mathbf {K}}{\varvec{\alpha }}^T \mathbf {e}_i.\) It is straightforward to check that this \({\varvec{\alpha }}\) is feasible in Problem (24) and yields the same objective value. Thus, Problem (22) is feasible if and only if Problem (24) is feasible, and we can construct an optimal solution to Problem (22) from an optimal solution to Problem (24) via (23). \(\square \)
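
As a computational aside, once the kernel matrix \({\mathbf {K}}\) is formed, the quantities \(f_i(\mathbf {x}_j) = \mathbf {e}_i^T {\varvec{\alpha }}{\mathbf {K}}\mathbf {e}_j\) and \(\Vert f_i \Vert ^2_{\mathcal {H}}= \mathbf {e}_i^T{\varvec{\alpha }}{\mathbf {K}}{\varvec{\alpha }}^T \mathbf {e}_i\) used in this equivalence are plain matrix expressions. A minimal numpy sketch (with an assumed Gaussian kernel and placeholder data) is:

```python
import numpy as np

n, N = 3, 6                            # n functions, N observations (illustrative sizes)
rng = np.random.default_rng(1)
X = rng.normal(size=(N, 2))

# Kernel matrix K[j, l] = k(x_j, x_l); a Gaussian kernel is assumed here.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)

alpha = rng.normal(size=(n, N))        # row i holds the coefficients of f_i

F_vals = alpha @ K                     # F_vals[i, j] = f_i(x_j) = e_i^T alpha K e_j
norms_sq = np.einsum('ij,jk,ik->i', alpha, K, alpha)   # ||f_i||_H^2 = (alpha K alpha^T)_{ii}
print(F_vals.shape, norms_sq)
```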

1.3 Proof of Theorem 6

Proof

As mentioned in the text, the key idea in the proof is to relate (12) to a randomized uncertain convex program. To this end, notice that if \((z_N, {\varvec{\theta }}_N)\) is an optimal solution to (12) with the \(\ell _\infty \)-norm, then \((z_N, {\varvec{\theta }}_N) \in \bigcap _{j=1}^N \mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\) where

$$\begin{aligned} \mathcal {X}( \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) = \left\{ z, {\varvec{\theta }}\in \Theta : \exists \mathbf {y}\in {\mathbb {R}}^m \text { s.t. } \mathbf {A}^T \mathbf {y}\le \mathbf {f}(\mathbf {x}, {\varvec{\theta }}), \ \ \mathbf {x}^T\mathbf {f}(\mathbf {x}, {\varvec{\theta }}) - {\mathbf {b}}^T\mathbf {y}\le z \right\} . \end{aligned}$$

The sets \(\mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\) are convex. Consider then the problem

$$\begin{aligned} \min \limits _{z \ge 0, {\varvec{\theta }}} \quad z \ \ \text {s.t.} \quad (z, {\varvec{\theta }}) \in \bigcap _{j=1}^N \mathcal {X}( \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j) . \end{aligned}$$

This is exactly of the form Eq. 2.1 in [18]. Applying Theorem 2.4 of that work shows that, with probability \(\beta (\alpha )\) with respect to the sampling, the “violation probability” of the pair \((z_N, {\varvec{\theta }}_N)\) is at most \(\alpha \). In our context, the probability of violation is exactly the probability that \((\tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})\) is not a \(z_N\)-approximate equilibrium. This proves the theorem. \(\square \)

Observe that the original proof in [18] requires the solution \({\varvec{\theta }}_N\) to be almost surely unique. However, as noted in discussion point 5 on p. 7 of that work, it suffices to fix a tie-breaking rule for \({\varvec{\theta }}_N\) when there are multiple solutions. The tie-breaking rule discussed in the main text is one such example.

1.4 Proof of Theorem 7

We first require some auxiliary results. Our treatment closely follows [7]. Let \(\zeta _1, \ldots , \zeta _N\) be i.i.d. For any class of functions \(\mathcal {S}\), define the empirical Rademacher complexity \(\mathcal {R}_N(\mathcal {S})\) by

$$\begin{aligned} \mathcal {R}_N(\mathcal {S}) = {\mathbb {E}}\left[ \left. \sup _{f \in \mathcal {S}} \frac{2}{N} \left| \sum _{i=1}^N \sigma _i f(\zeta _i) \right| \; \right| \zeta _1, \ldots , \zeta _N \right] , \end{aligned}$$

where \(\sigma _i\) are independent uniform \(\{ \pm 1 \}\)-valued random variables. Notice this quantity is random, because it depends on the data \(\zeta _1, \ldots , \zeta _N\).
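
For intuition, the empirical Rademacher complexity can be approximated by Monte Carlo over the signs \(\sigma _i\). The sketch below does this for a small, finite family of candidate functions standing in for the supremum over \(\mathcal {S}\) (the data, the candidate functions, and the number of Monte Carlo draws are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
zeta = rng.uniform(size=N)                        # the conditioning data zeta_1, ..., zeta_N

# A finite stand-in for S: three simple [0, 1]-valued functions.
candidates = [lambda z, c=c: np.minimum(c * z, 1.0) for c in (0.5, 1.0, 2.0)]
vals = np.stack([f(zeta) for f in candidates])    # |S| x N matrix of f(zeta_i)

def empirical_rademacher(vals, n_mc=2000, rng=rng):
    """Monte Carlo estimate of E_sigma[ sup_f (2/N) | sum_i sigma_i f(zeta_i) | ]."""
    N = vals.shape[1]
    total = 0.0
    for _ in range(n_mc):
        sigma = rng.choice([-1.0, 1.0], size=N)   # independent uniform +/- 1 signs
        total += 2.0 / N * np.max(np.abs(vals @ sigma))
    return total / n_mc

print(empirical_rademacher(vals))
```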

Our interest in Rademacher complexity stems from the following lemma.

Lemma 1

Let \(\mathcal {S}\) be a class of functions whose range is contained in \([0, M]\). Then, for any \(N\) and any \(0 < \beta < 1\), with probability at least \(1-\beta \) with respect to \({\mathbb {P}}\), every \(f \in \mathcal {S}\) simultaneously satisfies

$$\begin{aligned} {\mathbb {E}}[ f( \zeta ) ] \le \frac{1}{N} \sum \limits _{i=1}^N f(\zeta _i) + \mathcal {R}_N(\mathcal {S}) + \sqrt{ \frac{8 M \log (2/\beta ) }{N}} \end{aligned}$$
(34)

Proof

The result follows by specializing Theorem 8 of [7]. Namely, using the notation of that work, let \(\phi (y, a) = \mathcal {L}(y, a) = a / M\), \(\delta = \beta \) and then apply the theorem. Multiply the resulting inequality by \(M\) and use Theorem 12, part 3 of the same work to conclude that \(M \mathcal {R}_N( M^{-1} \mathcal {S} ) = \mathcal {R}_N(\mathcal {S})\) to finish the proof. \(\square \)

Remark 10

The constants in the above lemma are not tight. Indeed, modifying the proof of Theorem 8 in [7] to exclude the centering of \(\phi \) to \(\tilde{\phi }\), one can reduce the constant 8 in the above bound to 2. For simplicity in what follows, we will not be concerned with improvements at constant order.

Remark 11

Lemma 1 relates the empirical expectation of a function to its true expectation. If \(f \in \mathcal {S}\) were fixed a priori, stronger statements could be proven more simply by invoking the weak law of large numbers. The importance of Lemma 1 is that it asserts the inequality holds uniformly over all \(f \in \mathcal {S}\). This matters because, in what follows, we identify the relevant function \(f\) by solving an optimization problem; hence it is not known to us a priori but instead depends on the data.

Our goal is to use Lemma 1 to bound \({\mathbb {E}}[\epsilon (\mathbf {f}_N, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})]\). To do so, we must compute an upper bound on the Rademacher complexity of a suitable class of functions. As a preliminary step, we have the following lemma.

Lemma 2

For any \(\mathbf {f}\) which is feasible in (12) or (28), we have

$$\begin{aligned} \tilde{\epsilon }(\mathbf {f}, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C}) \le \overline{B} \quad \text { a.s. } \end{aligned}$$
(35)

Proof

Using strong duality as in Theorem 2,

$$\begin{aligned} \tilde{\epsilon }(\mathbf {f}, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C}) = \max _{\mathbf {x}\in \tilde{\mathcal {F}}} (\tilde{\mathbf {x}} - \mathbf {x})^T \mathbf {f}( \tilde{\mathbf {x}} ) \le 2R \sup _{\tilde{\mathbf {x}} \in \tilde{\mathcal {F}}} \Vert \mathbf {f}(\tilde{\mathbf {x}}) \Vert _2, \end{aligned}$$
(36)

by A6. For Problem (12), the result follows from the definition of \(\overline{B}\). For Problem (28), observe that for any \(\tilde{\mathbf {x}} \in \tilde{\mathcal {F}}\),

$$\begin{aligned} | f_i (\tilde{\mathbf {x}}) |^2 = \langle f_i, k_{\tilde{\mathbf {x}}} \rangle ^2 \le \Vert f_i \Vert _{\mathcal {H}}^2 \sup _{\Vert \mathbf {x}\Vert _2 \le R } k(\mathbf {x}, \mathbf {x}) = \Vert f_i \Vert _{\mathcal {H}}^2 \overline{K}^2\le \kappa _i^2 \overline{K}^2, \end{aligned}$$
(37)

where the middle inequality follows from the Cauchy–Schwarz inequality. Plugging this into Eq. (36) and using the definition of \(\overline{B}\) yields the result. \(\square \)

Now consider the class of functions

$$\begin{aligned} F = {\left\{ \begin{array}{ll} \Big \{ (\mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) \mapsto \epsilon (\mathbf {f}, \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) : \mathbf {f} = \mathbf {f}(\cdot , {\varvec{\theta }}), {\varvec{\theta }}\in \Theta \Big \} \quad \text {for Problem~(12)}\\ \Big \{ (\mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) \mapsto \epsilon (\mathbf {f}, \mathbf {x}, \mathbf {A}, {\mathbf {b}}, C) : f_i \in {\mathcal {H}}, \ \ \Vert f_i \Vert _{\mathcal {H}}\le \kappa _i \ \ i=1, \ldots , N \Big \}\\ \quad \text {for Problem~(28). } \end{array}\right. } \end{aligned}$$

Lemma 3

$$\begin{aligned} \mathcal {R}_N(F) \le \frac{2\overline{B}}{\sqrt{N}} \end{aligned}$$

Proof

We prove the lemma for Problem (12). The proof in the other case is identical. Let \(\mathcal {S} = \{ \mathbf {f}( \cdot , {\varvec{\theta }}) : {\varvec{\theta }}\in \Theta \}\). Then,

$$\begin{aligned} \mathcal {R}_N(F)&= \frac{2}{N} {\mathbb {E}}\left[ \left. \sup _{\mathbf {f} \in \mathcal {S}} \left| \sum \limits _{j=1}^N \sigma _j \epsilon (\mathbf {f}, \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j) \right| \; \right| (\mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)_{j=1}^N \right] \\&\le \frac{2 \overline{B}}{N} {\mathbb {E}}\left[ \left( \sum \limits _{j=1}^N \sigma _j^2 \right) ^{\frac{1}{2}}\right] \qquad \qquad \qquad [\text {using (35)}]\\&\le \frac{2 \overline{B}}{N} \sqrt{ {\mathbb {E}}\left[ \sum \limits _{j=1}^N \sigma _j^2 \right] }\qquad \qquad \qquad (\text {Jensen's inequality})\\&=\frac{2 \overline{B}}{\sqrt{ N }}\qquad \qquad \qquad ( \sigma _j^2 = 1\text { a.s.}). \end{aligned}$$

\(\square \)

We are now in a position to prove the theorem.

Proof

(Theorem 7) Observe that \(z_N = \frac{1}{N} \sum \nolimits _{j=1}^N (\epsilon ( \mathbf {f}_N, \mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j))^p\). Next, the function \(\phi (z) = z^p\) satisfies \(\phi (0) = 0\) and is Lipschitz with constant \(L_\phi = p \overline{B}^{p-1}\) on the interval \([0, \overline{B} ]\). Consequently, from Theorem 12 part 4 of [7],

$$\begin{aligned} \mathcal {R}_N( \phi \circ F )&\le 2 L_\phi \mathcal {R}_N(F)\\&\le 2p \overline{B}^{p-1} \frac{2\overline{B}}{\sqrt{N}}\\&= \frac{4p \overline{B}^p }{\sqrt{N}}. \end{aligned}$$

Now applying Lemma 1 with \(\zeta \rightarrow (\tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})\), \(f(\cdot ) \rightarrow \epsilon ( \cdot )^p\), and \(M = \overline{B}^p\) yields the first part of the theorem.

For the second part of the theorem, observe that, conditional on the sample, the event that \(\tilde{\mathbf {x}}\) is not a \((z_N + \alpha )\)-approximate equilibrium is equivalent to the event that \(\epsilon _N > z_N + \alpha \). Now use Markov’s inequality and apply the first part of the theorem. \(\square \)
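
For a sense of the magnitudes in the first part, combining (34) (applied with \(M = \overline{B}^p\), so that the empirical average is \(z_N\)) with the bound \(\mathcal {R}_N( \phi \circ F ) \le 4p\overline{B}^p/\sqrt{N}\) derived above gives, with probability at least \(1 - \beta \),

$$\begin{aligned} {\mathbb {E}}\left[ \epsilon (\mathbf {f}_N, \tilde{\mathbf {x}}, \tilde{\mathbf {A}}, \tilde{{\mathbf {b}}}, \tilde{C})^p \right] \le z_N + \frac{4p \overline{B}^p}{\sqrt{N}} + \sqrt{ \frac{8 \overline{B}^p \log (2/\beta ) }{N}}. \end{aligned}$$

As a purely illustrative numerical instance (the values are not taken from the paper's experiments), with \(p=1\), \(\overline{B}=2\), \(N=10{,}000\), and \(\beta =0.05\), the last two terms evaluate to approximately \(0.08 + 0.077 \approx 0.16\).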

1.5 Proof of Theorem 8

Proof

Consider the first part of the theorem.

By construction, \(\hat{\mathbf {x}}\) solves \({{\mathrm{\text {VI}}}}(\mathbf {f}(\cdot , {\varvec{\theta }}_N), \mathbf {A}_{N+1}, {\mathbf {b}}_{N+1}, C_{N+1})\). The theorem then claims that \(\mathbf {x}_{N+1}\) is within \(\delta ^\prime \equiv \sqrt{\frac{z_N}{\gamma }}\) of a solution to this VI. From Theorem 1, if \(\mathbf {x}_{N+1}\) were not within \(\delta ^\prime \) of a solution, then it must be the case that \(\epsilon ( \mathbf {f}(\cdot , {\varvec{\theta }}_N), \mathbf {x}_{N+1}, \mathbf {A}_{N+1}, {\mathbf {b}}_{N+1}, C_{N+1}) > z_N\). By Theorem 6, this happens only with probability \(\beta (\alpha )\).

The second part is similar to the first with Theorem 6 replaced by Theorem 7. \(\square \)

Appendix 2: Casting structural estimation as an inverse variational inequality

In the spirit of structural estimation, assume there exists a true \({\varvec{\theta }}^* \in \Theta \) that generates solutions \(\mathbf {x}_j^*\) to \({{\mathrm{\text {VI}}}}(\mathbf {f}(\cdot , {\varvec{\theta }}^*), \mathbf {A}^*_j, {\mathbf {b}}^*_j, C^*_j)\). We observe \((\mathbf {x}_j, \mathbf {A}_j, {\mathbf {b}}_j, C_j)\), which are noisy versions of these true quantities. We are additionally given a precise mechanism for the noise, e.g., that

$$\begin{aligned} \mathbf {x}_j = \mathbf {x}^*_j + \Delta \mathbf {x}_j, \quad \mathbf {A}_j = \mathbf {A}^*_j + \Delta \mathbf {A}_j, \quad {\mathbf {b}}_j = {\mathbf {b}}^*_j + \Delta {\mathbf {b}}_j, \quad C_j = C_j^*, \end{aligned}$$

where \((\Delta \mathbf {x}_j, \Delta \mathbf {A}_j, \Delta {\mathbf {b}}_j)\) are i.i.d. realizations of a random vector \((\tilde{\Delta \mathbf {x}}, \tilde{\Delta \mathbf {A}}, \tilde{\Delta {\mathbf {b}}})\) and \(\tilde{\Delta \mathbf {x}}, \tilde{\Delta \mathbf {A}}, \tilde{\Delta {\mathbf {b}}}\) are mutually uncorrelated.

We use Theorem 2 to estimate \({\varvec{\theta }}\) under these assumptions by solving

$$\begin{aligned} \min \limits _{\mathbf {y}\ge \mathbf {0}, {\varvec{\theta }}\in \Theta , \Delta \mathbf {x}, \Delta \mathbf {A}, \Delta {\mathbf {b}}} \quad&\left\| \begin{pmatrix} \Delta \mathbf {x}_j \\ \Delta \mathbf {A}_j \\ \Delta {\mathbf {b}}_j \end{pmatrix}_{j=1, \ldots , N} \right\| \nonumber \\ \text {s.t.} \quad&(\mathbf {A}_j - \Delta \mathbf {A}_j)^T \mathbf {y}_j \le _{C_j} \mathbf {f}(\mathbf {x}_j - \Delta \mathbf {x}_j, {\varvec{\theta }}), \quad j=1, \ldots , N,\nonumber \\&(\mathbf {x}_j - \Delta \mathbf {x}_j)^T \mathbf {f}(\mathbf {x}_j - \Delta \mathbf {x}_j, {\varvec{\theta }}) = {\mathbf {b}}_j^T \mathbf {y}_j, \quad j =1, \ldots , N, \end{aligned}$$
(38)

where \(\Vert \cdot \Vert \) denotes some norm. Notice this formulation also supports the case where some components of \(\mathbf {x}\) are unobserved; simply treat them as additional optimization variables in the above. In words, this formulation assumes that the “de-noised” data constitute a perfect equilibrium with respect to the fitted \({\varvec{\theta }}\).

We next claim that if all equilibria occur in the strict interior of the feasible region, Problem (38) is equivalent to computing a least-squares-type approximate solution to the equations \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\). Specifically, when \(\mathbf {x}^*\) lies in the interior of \(\mathcal {F}\), the VI condition Eq. (1) is equivalent to the equations \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\). At the same time, by Theorem 2, Eq. (1) is equivalent to the system (8)–(9) with \(\epsilon = 0\), which motivated the constraints in Problem (38). Thus, Problem (38) is equivalent to finding a minimal (with respect to the given norm) perturbation that satisfies these structural equations.
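
To spell out the interior-point step (a one-line argument, assuming Eq. (1) is the standard VI condition \(\mathbf {f}(\mathbf {x}^*)^T(\mathbf {x}- \mathbf {x}^*) \ge 0\) for all \(\mathbf {x}\in \mathcal {F}\)): if \(\mathbf {x}^*\) is interior, then for any direction \(\mathbf {d}\) both \(\mathbf {x}^* + t \mathbf {d}\) and \(\mathbf {x}^* - t \mathbf {d}\) lie in \(\mathcal {F}\) for all sufficiently small \(t > 0\), so

$$\begin{aligned} \mathbf {f}(\mathbf {x}^*)^T \mathbf {d}\ge 0 \quad \text {and} \quad \mathbf {f}(\mathbf {x}^*)^T (-\mathbf {d}) \ge 0 \quad \Longrightarrow \quad \mathbf {f}(\mathbf {x}^*)^T \mathbf {d}= 0 \quad \forall \, \mathbf {d}, \end{aligned}$$

which forces \(\mathbf {f}(\mathbf {x}^*) = \mathbf {0}\); the converse implication is immediate.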

We can relate this weighted least-squares problem to some structural estimation techniques. Indeed, [20] and [37] observed that many structural estimation techniques can be reinterpreted as a constrained optimization problem that minimizes the size of the perturbation necessary to make the observed data satisfy the structural equations and, additionally, satisfy constraints motivated by orthogonality conditions and the generalized method of moments (GMM). In light of our previous comments, if we augment Problem (38) with the same orthogonality constraints, and all equilibria occur in the strict interior of the feasible region, the solutions to this problem will coincide with those of traditional estimators.

Of course, some structural estimation techniques incorporate even more sophisticated adaptations. They may pre-process the data (e.g., the two-stage least squares technique in econometrics), incorporate additional constraints (e.g., the orthogonality-of-instruments approach), or tune the choice of norm in the least-squares computation (two-stage GMM estimation). These application-specific adaptations improve the statistical properties of the estimator under certain assumptions on the data generating process. What we would like to stress is that, provided we make the same adaptations to Problem (38) (i.e., pre-process the data, incorporate orthogonality of instruments, and tune the choice of norm), and provided that all equilibria occur in the interior, the solution to Problem (38) must coincide exactly with these techniques. Thus, the resulting estimators necessarily inherit all of the same statistical properties.

Recasting (at least some) structural estimation techniques in our framework facilitates a number of comparisons to our proposed approach based on Problem (12). First, it is clear how our perspective on data alters the formulation. Problem (38) seeks minimal perturbations so that the observed data are exact equilibria with respect to \({\varvec{\theta }}\), while Problem (12) seeks a \({\varvec{\theta }}\) that makes the observed data approximate equilibria and minimizes the size of the approximation. Second, the complexity of the proposed optimization problems differs greatly. The complexity of Problem (38) depends on how \(\mathbf {f}\) depends on both \(\mathbf {x}\) and \({\varvec{\theta }}\) (as opposed to just \({\varvec{\theta }}\) for Problem (12)), and there are unavoidable non-convex, bilinear terms like \(\Delta \mathbf {A}_j^T \mathbf {y}_j\). Such terms are well known to cause difficulties for numerical solvers. Thus, we expect solving this optimization problem to be significantly more difficult than solving Problem (12). Finally, as we will see in the next section, Problem (12) generalizes naturally to a nonparametric setting.

Appendix 3: Omitted formulations

1.1 Formulation from Sect. 8.1

Let \(\xi ^{med}\) be the median value of \(\xi \) over the dataset. Breaking ties arbitrarily, \(\xi ^{med}\) occurs for some observation \(j = j^{med}\). Let \(p_1^{med}, p_2^{med}, \xi ^{med}_1, \xi ^{med}_2\) be the corresponding prices and demand shocks at time \(j^{med}\). (Recall that in this section \(\xi = \xi _1 = \xi _2\).) These definitions make precise what we mean in the main text by “fixing other variables to the median observation.” Denote by \(\underline{p}_1, \underline{p}_2\) the minimum prices observed over the data set.

Our parametric formulation in Sect. 8.1 is

$$\begin{aligned}&\min \limits _{\mathbf {y}, \varvec{\epsilon }, {\varvec{\theta }}_1, {\varvec{\theta }}_2 } \quad \Vert \varvec{\epsilon }\Vert _\infty \end{aligned}$$
(39a)
$$\begin{aligned}&\text {s.t.} \quad \mathbf {y}^j \ge \mathbf {0}, \quad j = 1, \ldots , N,\nonumber \\&y_i^j \ge M_i(p_1^j, p_2^j, \xi ^j; {\varvec{\theta }}_i), \quad i = 1, 2, \ j = 1, \ldots , N,\nonumber \\&\quad \sum \limits _{i=1}^2\overline{p}^j y_i^j -(p_i^j) M_i(p_1^j, p_2^j, \xi ^j; {\varvec{\theta }}_i) \le \epsilon _j, \quad j = 1, \ldots , N,\nonumber \\&\quad M_1(p_1^j, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1) \!\ge \! M_1(p_1^k, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1), \; \forall 1 \!\le \! j,k \!\le \! N \text { s.t. } p_1^j \le p_1^k,\end{aligned}$$
(39b)
$$\begin{aligned}&\quad M_2(p_1^{med}, p_2^j, \xi ^{med}; {\varvec{\theta }}_2) \!\ge \! M_2(p_1^{med}, p_2^k, \xi ^{med}; {\varvec{\theta }}_2), \; \forall 1 \!\le \! j,k \le N \text { s.t. } p_2^j \le p_2^k,\end{aligned}$$
(39c)
$$\begin{aligned}&\quad M_1(\underline{p}_1, p_2^{med}, \xi ^{med}; {\varvec{\theta }}_1) = M^*_1(\underline{p}_1, p_2^{med}, \xi ^{med}_1; {\varvec{\theta }}^*_1)\end{aligned}$$
(39d)
$$\begin{aligned}&\quad M_2(p_1^{med}, \underline{p}_2, \xi ^{med}; {\varvec{\theta }}_2) = M^*_2(p_1^{med}, \underline{p}_2, \xi ^{med}_2 ; {\varvec{\theta }}^*_2) \end{aligned}$$
(39e)

Here \(M_1\) and \(M_2\) are given by Eq. (32). Notice, for this choice, the optimization is a linear optimization problem.

Equations (39b) and (39c) constrain the fitted function to be non-decreasing in the firm’s own price. Equations (39d) and (39e) are normalization conditions. We have chosen to normalize the functions to be equal to the true functions at this one point to make the visual comparisons easier. In principle, any suitable normalization can be used.

Our nonparametric formulation is similar to the above, but we replace

  • The parametric \(M_1(\cdot , {\varvec{\theta }}_1), M_2(\cdot , {\varvec{\theta }}_2)\) with nonparametric \(M_1(\cdot ), M_2(\cdot ) \in \mathcal {H}\)

  • The objective by \(\Vert \varvec{\epsilon }\Vert _1 + \lambda ( \Vert M_1 \Vert _{\mathcal {H}}+ \Vert M_2 \Vert _{\mathcal {H}})\).

By Theorem 4 and the discussion in Sect. 6, we can rewrite this optimization as a convex quadratic program.
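
To indicate what this finite-dimensional reduction looks like in practice, below is a minimal sketch (the kernel, data, regularization weight \(\lambda \), and the use of the cvxpy modeling package are illustrative assumptions; the equilibrium, monotonicity, and normalization constraints of formulation (39) are omitted). It encodes the regularizer \(\Vert M_i \Vert _{\mathcal {H}} = \sqrt{{\varvec{\alpha }}_i^T {\mathbf {K}}{\varvec{\alpha }}_i}\) through a Cholesky factor of the kernel matrix.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
N = 20
P = rng.uniform(1.0, 5.0, size=(N, 2))            # observed price pairs (placeholder data)

sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq) + 1e-8 * np.eye(N)                # Gaussian kernel matrix with a small jitter
L = np.linalg.cholesky(K)                         # K = L L^T

alpha1 = cp.Variable(N)                           # kernel coefficients of M_1
alpha2 = cp.Variable(N)                           # kernel coefficients of M_2
eps = cp.Variable(N, nonneg=True)                 # per-observation equilibrium errors
lam = 0.1                                         # illustrative regularization weight

# ||M_i||_H = sqrt(alpha_i^T K alpha_i) = ||L^T alpha_i||_2, so the objective is convex.
objective = cp.Minimize(cp.norm(eps, 1)
                        + lam * (cp.norm(L.T @ alpha1, 2) + cp.norm(L.T @ alpha2, 2)))

# Function values entering the omitted constraints are K @ alpha_i, as in Theorem 5.
constraints = []                                  # placeholder for (39b)-(39e)-style constraints

prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.status)
```

With the norms encoded this way the program is second-order cone representable; using the squared RKHS norms instead yields a convex quadratic objective.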

1.2 Formulation from Sect. 8.2

Our parametric formulation is nearly identical to the parametric formulation in Appendix “Formulation from Sect. 8.1”, with the following changes:

  • Replace Eq. (39a) by \(\Vert \varvec{\epsilon }\Vert _\infty + \lambda ( \Vert {\varvec{\theta }}_1 \Vert _1 + \Vert {\varvec{\theta }}_2\Vert _1)\)

  • Replace the definition of \(M_1, M_2\) by Eq. (33).

Our nonparametric formulation is identical to the nonparametric formulation of the previous section.


Cite this article

Bertsimas, D., Gupta, V. & Paschalidis, I.C. Data-driven estimation in equilibrium using inverse optimization. Math. Program. 153, 595–633 (2015). https://doi.org/10.1007/s10107-014-0819-4
