A new mixed MNP model accommodating a variety of dependent non-normal coefficient distributions

Theory and Decision

Abstract

In this paper, we propose a general copula approach to accommodate non-normal continuous mixing distributions in multinomial probit (MNP) models. In particular, we specify a multivariate mixing distribution that allows different coefficients to have different marginal continuous parametric distributions. We propose a new hybrid estimation technique that combines the advantageous features of the maximum simulated likelihood inference technique and Bhat’s maximum approximate composite marginal likelihood inference approach. The effectiveness of our formulation and inference approach is demonstrated through simulation exercises and an empirical application.
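To make the copula idea above concrete (different marginal distributions for different coefficients, tied together by a single dependence structure), here is a minimal Python sketch. It assumes a bivariate Gaussian copula and two illustrative marginals: a negative log-normal, cost-type coefficient and a Weibull coefficient obtained through its closed-form inverse CDF. The copula family, the marginals, and every name and parameter (`sample_copula_coefficients`, `weib_scale`, `weib_shape`, and so on) are assumptions made for illustration; this is not the authors’ specification or estimation code.

```python
import math
import random


def std_normal_sf(z):
    """Upper tail 1 - Phi(z), computed via erfc to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sample_copula_coefficients(n, rho, mu, sigma, weib_scale, weib_shape, seed=0):
    """Draw n (beta1, beta2) pairs linked by a bivariate Gaussian copula with correlation rho.

    beta1 = -exp(mu + sigma * z1): negative log-normal (cost-type) coefficient.
    beta2 = Weibull(weib_scale, weib_shape) via its closed-form inverse CDF at u2 = Phi(z2).
    """
    rng = random.Random(seed)
    # Cholesky factor of [[1, rho], [rho, 1]] is [[1, 0], [rho, c]].
    c = math.sqrt(1.0 - rho * rho)
    draws = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = e1, rho * e1 + c * e2  # correlated standard normals
        # The log-normal is a monotone map of z1 itself, so the Phi / inverse-CDF pair cancels.
        beta1 = -math.exp(mu + sigma * z1)
        # Weibull inverse CDF: scale * (-ln(1 - u))^(1/shape), with 1 - u = 1 - Phi(z2).
        beta2 = weib_scale * (-math.log(std_normal_sf(z2))) ** (1.0 / weib_shape)
        draws.append((beta1, beta2))
    return draws


pairs = sample_copula_coefficients(5000, rho=0.6, mu=0.0, sigma=0.5,
                                   weib_scale=1.0, weib_shape=2.0, seed=1)
```

Every draw of `beta1` is negative and every draw of `beta2` is non-negative, yet the two remain positively dependent through `rho`. Swapping either marginal only changes the inverse-CDF line, which is the flexibility the copula construction is meant to provide.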

Fig. 1
Fig. 2

Notes

  1. An analogous structure may be obtained by adding an IID Gumbel error term across alternatives to the multivariate normal coefficients, which leads to a mixed multinomial logit model; see Bhat (1997) and Revelt and Train (1998) for the first multivariate applications of this type of model. Alternatively, one can add a multivariate extreme value (MEV) error vector kernel to the utility of the alternatives, combined with additional non-identical kernel error terms and the random coefficients vector (see, for example, Bhat and Guo 2007). But, as discussed in detail in Bhat (2011), all these structures essentially achieve the same purpose, and the choice among them is simply a matter of convenience. Besides, the use of an MNP kernel has substantial advantages when combined with recently proposed analytic methods of evaluating the multivariate cumulative normal distribution (MVNCD) function, which have been shown to be much more computationally efficient than traditional simulation approaches. Also, when extensions to accommodate correlation across decision makers due to spatial and/or social interactions are considered, the MNP kernel is much easier to work with and more efficient. We will henceforth focus on the MNP kernel in this paper.

  2. To dispel a common myth: the mixed multinomial logit model is no more general than the mixed MNP model, as long as the mixing distribution with the MNP kernel is allowed to be non-normal, as it is in the current paper.

  3. Discrete distributions may also be used for the mixing. If the mixing vector is assumed to take M possible value states with state-specific probabilities, this leads to the familiar latent class model used in marketing (see Kamakura and Russell 1989) and transportation (see Bhat 1997). On the other hand, if a discrete distribution is considered separately for each individual random coefficient, this is essentially a non-parametric random coefficients model (see Bastin et al. 2010; Berry and Haile 2014; il Kim 2014). The non-parametric specification allows consistent estimates of the observed variable effects under broad model contexts by making regularity (for instance, differentiability) assumptions on an otherwise distribution-free density form. But the flexibility of these methods comes at a high inferential cost since consistency is achieved only in very large samples, parameter estimates have high variance, and the computational complexity/effort can be substantial (Mittelhammer and Judge 2011). Overall, the continuous distribution specification dominates the literature, at least in part because it offers efficiency in the number of mixing distribution parameters to be estimated.

  4. Note that, by construction, the marginal multivariate distribution function of \(\boldsymbol{\beta}_q\) is the multivariate standard normal distribution function of \(\tilde{\boldsymbol{\beta}}_q\); that is, \(F_E(\boldsymbol{\beta}_q < \mathbf{z}_q) = \Phi_E(\mathbf{g}_q; \boldsymbol{\Gamma}_{\tilde{\beta}})\), from which \(f_E(\mathbf{z}_q) = \frac{dF_E(\boldsymbol{\beta}_q < \mathbf{z}_q)}{d\mathbf{z}_q} = \phi_E(\mathbf{g}_q; \boldsymbol{\Gamma}_{\tilde{\beta}})\,\frac{d\mathbf{g}_q}{d\mathbf{z}_q}\), or \(f_E(\mathbf{z}_q)\,d\mathbf{z}_q = \phi_E(\mathbf{g}_q; \boldsymbol{\Gamma}_{\tilde{\beta}})\,d\mathbf{g}_q\), and Eq. (14) is the result.

  5. On the other hand, the problem with using the log-normal distribution to represent a coefficient such as a cost coefficient is that the tails of the distribution are directly determined by the variance term. High heterogeneity in cost sensitivity immediately implies a peak (mode) close to zero as well as a long and fat left tail (note that the cost coefficient is introduced as the negative of a log-normally distributed variable). As a result, as the variance parameter of the log-normal distribution increases (for the same mean parameter), a larger fraction of individuals will have a small cost coefficient, while a small fraction of individuals will have very high cost sensitivity because of the long and fat tail. This can produce unusually large and unusually small willingness-to-pay estimates. Further, the long and fat tail on the unbounded side of the distribution is known to cause convergence problems during estimation (Bartels et al. 2006).

  6. As discussed earlier, the log-normal distribution a priori fixes the power term to 1. Although the power term can be estimated, our experience suggests that the optimization algorithms take longer and have much more difficulty converging than when the power term is fixed. That is, the best way to estimate a model with a power log-normal term appears to be to estimate the model at different fixed values of the power term, and then compare the data fits across the resulting optimized objective function values (one for each fixed value of the power term) to determine the best value for the power term. This is why we fix the power term at a value of five in the simulation estimations here, while estimating the means (\(\mu_1\) and \(\mu_2\)) and the scale parameters (\(\sigma_1\) and \(\sigma_2\)).

  7. The specification for the differenced covariance matrix above may be viewed as being derived from a specification where the error terms for the first three alternatives are independent and distributed with a variance of 0.5, while the last error term has a variance of 0.913 and is correlated with the error term of the third alternative with a covariance of 0.1. In the simulation experiment estimations, to focus on the random coefficients, we fix the variances of the first three alternatives to 0.5 and impose independence among the first three alternatives, but estimate the variance of the fourth error term and the covariance between the third and fourth alternatives, which translates to the two Cholesky parameters \(l_{\varTheta 5} = 0.404\) and \(l_{\varTheta 6} = 0.998\).

  8. This penalized log-composite likelihood is a generalization of the usual Akaike Information Criterion (AIC). In fact, when the candidate model includes the true model in the usual maximum likelihood inference procedure, the information identity holds (i.e., \(\mathbf{H}(\boldsymbol{\theta}) = \mathbf{J}(\boldsymbol{\theta})\)), and the CLIC in this case is exactly the AIC \([= \log L_{ML}(\hat{\boldsymbol{\theta}}) - (\text{number of model parameters})]\).

  9. There has been a healthy debate in the literature (see, for example, Ory and Mokhtarian 2005; Cirillo and Axhausen 2006) on whether or not some individuals attach a positive valuation to travel time, as opposed to the predominantly held view that people are averse to longer travel times. Of course, this valuation may also be very context dependent, varying, for example, with the length of the travel time being considered (see, for example, Pinjari and Bhat (2006), who suggest that the sensitivity to travel time is non-linear in travel time). In this paper, we do not engage in this debate. The purpose here is to present, and demonstrate an application of, a flexible copula model and its estimation approach, which can be gainfully employed to estimate different combinations of multivariate random coefficient distributions and thereby guide the final model structure and specification based on theoretical considerations (for example, which coefficients should have bounded distributions and which can have unbounded distributions), intuitive considerations (the reasonableness of the trade-off values obtained and their profiles over the population), and statistical data fit considerations.
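Note 3 above describes the latent class model as a discrete mixing distribution: the coefficient vector takes one of M values with class-specific probabilities, so the choice probability is \(P_{qi} = \sum_{m=1}^{M} \pi_m P_{qi}(\boldsymbol{\beta}_m)\). The Python sketch below assumes a multinomial logit kernel inside each class purely for brevity; the kernel choice, the function names, and the attribute layout are illustrative assumptions, not the specifications of the cited studies.

```python
import math


def logit_probs(beta, x):
    """Multinomial logit probabilities for utilities V_i = beta . x_i (one row of x per alternative)."""
    v = [sum(b * xi for b, xi in zip(beta, row)) for row in x]
    vmax = max(v)  # subtract the max utility for numerical stability
    ev = [math.exp(vi - vmax) for vi in v]
    s = sum(ev)
    return [e / s for e in ev]


def latent_class_probs(class_shares, class_betas, x):
    """P(i) = sum_m pi_m * P(i | beta_m): a finite (discrete) mixture over M classes."""
    if abs(sum(class_shares) - 1.0) > 1e-9:
        raise ValueError("class shares must sum to one")
    probs = [0.0] * len(x)
    for pi_m, beta_m in zip(class_shares, class_betas):
        p_m = logit_probs(beta_m, x)  # class-specific choice probabilities
        for i, p in enumerate(p_m):
            probs[i] += pi_m * p
    return probs
```

Each class contributes a full set of choice probabilities, weighted by its share, so the mixture remains a valid probability vector. With M = 1, the model collapses back to the single-class kernel.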
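The change-of-variables step in Note 4 (\(f_E(\mathbf{z}_q) = \phi_E(\mathbf{g}_q; \boldsymbol{\Gamma}_{\tilde{\beta}})\, d\mathbf{g}_q / d\mathbf{z}_q\)) can be checked numerically in one dimension, where the Jacobian reduces to an ordinary derivative. This sketch assumes, for illustration only, the transform \(\beta = \exp(\mu + \sigma\tilde{\beta})\), so that \(g(z) = (\ln z - \mu)/\sigma\) and \(dg/dz = 1/(\sigma z)\). It then compares the resulting density with a finite-difference derivative of \(F(z) = \Phi(g(z))\). All function names are hypothetical.

```python
import math


def std_normal_pdf(g):
    return math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(g):
    return 0.5 * math.erfc(-g / math.sqrt(2.0))


def g_of_z(z, mu, sigma):
    # Inverse of the assumed transform beta = exp(mu + sigma * beta_tilde).
    return (math.log(z) - mu) / sigma


def density_via_change_of_variables(z, mu, sigma):
    # f(z) = phi(g(z)) * dg/dz, with dg/dz = 1 / (sigma * z) > 0 for z > 0.
    return std_normal_pdf(g_of_z(z, mu, sigma)) / (sigma * z)


def density_via_cdf_derivative(z, mu, sigma, h=1e-5):
    # Central finite difference of F(z) = Phi(g(z)).
    upper = std_normal_cdf(g_of_z(z + h, mu, sigma))
    lower = std_normal_cdf(g_of_z(z - h, mu, sigma))
    return (upper - lower) / (2.0 * h)
```

The two functions should agree to within finite-difference error at any positive z. In the multivariate case of Note 4, the derivative is replaced by the Jacobian determinant of the mapping from \(\mathbf{z}_q\) to \(\mathbf{g}_q\).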
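The tail behavior described in Note 5 follows from two standard log-normal facts. If \(\ln|\beta| \sim N(\mu, \sigma^2)\), the mode is \(\exp(\mu - \sigma^2)\), and \(P(|\beta| < c) = \Phi((\ln c - \mu)/\sigma)\). The sketch below holds \(\mu\) fixed and increases \(\sigma\). The thresholds 0.1 and 10 and the value \(\mu = 0\) are arbitrary illustrative choices, not values from the paper.

```python
import math


def std_normal_cdf(g):
    return 0.5 * math.erfc(-g / math.sqrt(2.0))


def lognormal_summary(mu, sigma, small=0.1, large=10.0):
    """Summaries of |beta| ~ LogNormal(mu, sigma), where the cost coefficient itself is -|beta|.

    Returns (mode, P(|beta| < small), P(|beta| > large)).
    """
    mode = math.exp(mu - sigma ** 2)
    p_small = std_normal_cdf((math.log(small) - mu) / sigma)
    p_large = std_normal_cdf(-(math.log(large) - mu) / sigma)  # upper tail via symmetry
    return mode, p_small, p_large


# Same mu, increasing sigma: the mode moves toward zero while both tails gain mass.
for s in (0.5, 1.0, 1.5):
    mode, p_small, p_large = lognormal_summary(0.0, s)
    print(f"sigma={s}: mode={mode:.3f}  P(|b|<0.1)={p_small:.4f}  P(|b|>10)={p_large:.4f}")
```

A growing share of near-zero cost sensitivities inflates willingness-to-pay ratios. At the same time, the thickening upper tail of \(|\beta|\) yields a few individuals with very high cost sensitivity.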
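The strategy in Note 6 is to fit the model once for each fixed value of the power term and keep the value with the best fit. This amounts to a grid search over a profile likelihood. Below is a generic Python sketch. `fit_at_power` is a hypothetical, user-supplied routine that returns the maximized log-likelihood and the estimates with the power term held fixed. `toy_fit` is a made-up stand-in used only to show the call pattern; it is not a real model.

```python
def profile_over_fixed_power(fit_at_power, power_grid):
    """Fit once per fixed power value and return (best, all_results).

    fit_at_power(p) must return (maximized_log_likelihood, estimates) with the
    power term held fixed at p; all other parameters are estimated inside it.
    """
    results = []
    for p in power_grid:
        loglik, estimates = fit_at_power(p)
        results.append((p, loglik, estimates))
    # Each fit has the same number of free parameters, so log-likelihoods compare directly.
    best = max(results, key=lambda r: r[1])
    return best, results


def toy_fit(p):
    # Hypothetical stand-in for a real estimation run; peaks at p = 5 by construction.
    return -100.0 - (p - 5.0) ** 2, {"power": p}


best, results = profile_over_fixed_power(toy_fit, [1, 2, 3, 4, 5, 6, 7, 8])
```

Because each fit avoids estimating the power term, every optimization is better behaved, and the grid can be refined around the best value if needed.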
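The arithmetic behind Note 7 can be retraced by building the undifferenced error covariance it describes, differencing it, and taking a Cholesky factor. This sketch assumes two things the note does not state explicitly: differencing is against the first alternative, and the Cholesky elements are numbered row-wise (\(l_1 = L_{11}, l_2 = L_{21}, l_3 = L_{22}, l_4 = L_{31}, l_5 = L_{32}, l_6 = L_{33}\)). Under these assumptions it reproduces \(l_{\varTheta 5} \approx 0.404\) and gives \(l_{\varTheta 6}\) within about 0.002 of the quoted 0.998.

```python
import math


def difference_matrix(n_alt, base):
    """Rows map the n_alt utility errors to differences eps_j - eps_base, j != base."""
    rows = []
    for j in range(n_alt):
        if j == base:
            continue
        row = [0.0] * n_alt
        row[j], row[base] = 1.0, -1.0
        rows.append(row)
    return rows


def mat_mult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def cholesky_lower(a):
    """Cholesky-Banachiewicz factorization: a = L L^T with L lower triangular."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


# Undifferenced error covariance described in Note 7 (alternatives 1 to 4).
omega = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.1],
    [0.0, 0.0, 0.1, 0.913],
]
M = difference_matrix(4, base=0)  # assumed: differences taken against the first alternative
theta = mat_mult(mat_mult(M, omega), transpose(M))
L = cholesky_lower(theta)
# Assumed row-wise numbering of the lower-triangular elements l1..l6.
l_params = [L[i][j] for i in range(3) for j in range(i + 1)]
```

The differenced matrix has unit variances for the first two differences, 1.413 for the third, and covariances 0.5, 0.5, and 0.6. Holding the first three variances fixed therefore leaves only \(l_{\varTheta 5}\) and \(l_{\varTheta 6}\) free.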
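Note 8 relies on the fact that the CLIC penalty collapses to the parameter count when \(\mathbf{H} = \mathbf{J}\). The sketch below assumes the Varin and Vidoni (2005) form \(\text{CLIC} = \log L_{CML}(\hat{\boldsymbol{\theta}}) - \text{tr}[\mathbf{J}(\hat{\boldsymbol{\theta}})\,\mathbf{H}(\hat{\boldsymbol{\theta}})^{-1}]\), where larger values are better. It is a small dense-matrix illustration; the function names and the Gauss-Jordan inverse are assumptions for illustration, not code from the paper.

```python
def mat_inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]  # scale pivot row to a leading 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def clic(log_cl, H, J):
    """CLIC = log L_CML(theta_hat) - tr[J H^{-1}] (larger is better)."""
    H_inv = mat_inverse(H)
    n = len(H)
    trace = sum(J[i][k] * H_inv[k][i] for i in range(n) for k in range(n))
    return log_cl - trace
```

With `J` equal to `H`, the trace equals the number of parameters, so `clic(-100.0, H, H)` reduces to the AIC form in Note 8 (here -102 for a two-parameter `H`). Any mismatch between `J` and `H` from composite likelihood estimation increases or decreases the effective penalty accordingly.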

References

  • Amador, F. J., González, R. M., & Ortúzar, J. de D. (2005). Preference heterogeneity and willingness to pay for travel time savings. Transportation, 32(6), 627–647.

  • Azzalini, A. (2013). The Skew-normal and Related Families (Vol. 3). Cambridge: Cambridge University Press.

  • Balcombe, K., Chalak, A., & Fraser, I. M. (2009). Model selection for the mixed logit with Bayesian estimation. Journal of Environmental Economics and Management, 57(2), 226–237.

  • Bartels, R., Fiebig, D. G., & van Soest, A. (2006). Consumers and experts: An econometric analysis of the demand for water heaters. Empirical Economics, 31(2), 369–391.

  • Bastin, F., Cirillo, C., & Toint, P. L. (2010). Estimating nonparametric random utility models with an application to the value of time in heterogeneous populations. Transportation Science, 44(4), 537–549.

  • Berry, S. T., & Haile, P. A. (2014). Identification in differentiated products markets using market level data. Econometrica, 82(5), 1749–1797.

  • Bhat, C. R. (1997). Work travel mode choice and number of non-work commute stops. Transportation Research Part B, 31(1), 41–54.

  • Bhat, C. R. (2004). Austin commuter survey: Findings and recommendations. Technical Report, Department of Civil, Architectural & Environmental Engineering, The University of Texas at Austin. http://www.ce.utexas.edu/prof/bhat/reports/austin_commuter_survey_report.doc

  • Bhat, C. R. (2011). The maximum approximate composite marginal likelihood (MACML) estimation of multinomial probit-based unordered response choice models. Transportation Research Part B, 45(7), 923–939.

  • Bhat, C. R. (2014). The composite marginal likelihood (CML) inference approach with applications to discrete and mixed dependent variable models. Foundations and Trends in Econometrics, 7(1), 1–117. (Now Publishers Inc.).

  • Bhat, C. R., & Eluru, N. (2009). A copula-based approach to accommodate residential self-selection effects in travel behavior modeling. Transportation Research Part B, 43(7), 749–765.

  • Bhat, C. R., & Guo, J. Y. (2007). A comprehensive analysis of built environment characteristics on household residential choice and auto ownership levels. Transportation Research Part B, 41(5), 506–526.

  • Bhat, C. R., & Sardesai, R. (2006). The impact of stop-making and travel time reliability on commute mode choice. Transportation Research Part B, 40(9), 709–730.

  • Bhat, C. R., & Sidharthan, R. (2012). A new approach to specify and estimate non-normally mixed multinomial probit models. Transportation Research Part B, 46(7), 817–833.

  • Bhat, C. R., Dubey, S. K., & Nagel, K. (2015). Introducing non-normality of latent psychological constructs in choice modeling with an application to bicyclist route choice. Transportation Research Part B, 78, 341–363.

  • Bhat, C. R., Sener, I. N., & Eluru, N. (2010). A flexible spatially dependent discrete choice model: Formulation and application to teenagers’ weekday recreational activity participation. Transportation Research Part B, 44(8–9), 903–921.

  • Capitanio, A. (2010). On the approximation of the tail probability of the scalar skew-normal distribution. Metron, 68(3), 299–308.

  • Cedilnik, A., Kosmelj, K., & Blejec, A. (2006). Ratio of two random variables: A note on the existence of its moments. Metodološki Zvezki—Advances in Methodology and Statistics, 3(1), 1–7.

  • Cirillo, C., & Axhausen, K. W. (2006). Evidence on the distribution of values of travel time savings from a six-week diary. Transportation Research Part A, 40(5), 444–457.

  • Daly, A., Hess, S., & Train, K. (2011). Assuring finite moments for willingness to pay in random coefficient models. Transportation, 39(1), 19–31.

  • Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. The Annals of Mathematical Statistics, 31(4), 1208–1211.

  • Hensher, D. A., Rose, J. M., & Greene, W. H. (2005). Applied Choice Analysis: A Primer. Cambridge: Cambridge University Press.

  • Ho, C., & Mulley, C. (2015). Intra-household interactions in tour-based mode choice: The role of social, temporal, spatial and resource constraints. Transport Policy, 38, 52–63.

  • il Kim, K. (2014). Identification of the distribution of random coefficients in static and dynamic discrete choice models. The Korean Economic Review, 30(2), 191–216.

  • Joe, H. (2015). Dependence Modeling with Copulas. Boca Raton, FL: CRC Press, Taylor and Francis.

  • Kamakura, W. A., & Russell, G. (1989). A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research, 26, 379–390.

  • Luce, R. D., & Suppes, P. (1965). Preference, utility, and subjective probability. In R. D. Luce, R. R. Bush, & E. H. Galanter (Eds.), Handbook of Mathematical Psychology (Vol. 3, pp. 249–410). New York: Wiley.

  • McFadden, D. (1974). The measurement of urban travel demand. Journal of Public Economics, 3(4), 303–328.

  • McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), 447–470.

  • Mittelhammer, R. C., & Judge, G. (2011). A family of empirical likelihood functions and estimators for the binary response model. Journal of Econometrics, 164(2), 207–217.

  • Nelsen, R. B. (2006). An Introduction to Copulas (2nd ed.). New York: Springer.

  • Ory, D. T., & Mokhtarian, P. L. (2005). When is getting there half the fun? Modeling the liking for travel. Transportation Research Part A: Policy and Practice, 39(2), 97–123.

  • Paleti, R., & Bhat, C. R. (2013). The composite marginal likelihood (CML) estimation of panel ordered-response models. Journal of Choice Modelling, 7, 24–43.

  • Paleti, R., Bhat, C., & Pendyala, R. (2013). Integrated model of residential location, work location, vehicle ownership, and commute tour characteristics. Transportation Research Record: Journal of the Transportation Research Board, 2382, 162–172.

  • Pinjari, A., & Bhat, C. (2006). Nonlinearity of response to level-of-service variables in travel mode choice models. Transportation Research Record: Journal of the Transportation Research Board, 1977, 67–74.

  • Revelt, D., & Train, K. (1998). Mixed logit with repeated choices: households’ choices of appliance efficiency level. Review of Economics and Statistics, 80(4), 647–657.

  • Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris, 8, 229–231.

  • Sklar, A. (1973). Random variables, joint distribution functions, and copulas. Kybernetika, 9(6), 449–460.

  • Small, K. A. (2012). Valuation of travel time. Economics of Transportation, 1(1), 2–14.

  • Torres, C., Hanley, N., & Riera, A. (2011). How wrong can you be? Implications of incorrect utility function specification for welfare measurement in choice experiments. Journal of Environmental Economics and Management, 62(1), 111–121.

  • Train, K., & Sonnier, G. (2005). Mixed logit with bounded distributions of correlated partworths. In R. Scarpa & A. Alberini (Eds.), Applications of simulation methods in environmental and resource economics (Ch. 7, pp. 117–134). Dordrecht: Springer.

  • Train, K., & Weeks, M. (2005). Discrete choice models in preference space and willingness-to-pay space. In R. Scarpa & A. Alberini (Eds.), Applications of simulation methods in environmental and resource economics (Ch. 1, pp. 1–16). Dordrecht: Springer.

  • Trivedi, P. K., & Zimmer, D. M. (2007). Copula modeling: An introduction for practitioners. Foundations and Trends in Econometrics, 1(1), 1–111. (Now Publishers Inc.).

  • Varin, C., & Vidoni, P. (2005). A note on composite likelihood inference and model selection. Biometrika, 92(3), 519–528.

  • Wang, R. (2015). The stops made by commuters: Evidence from the 2009 US National Household Travel Survey. Journal of Transport Geography, 47, 109–118.

Acknowledgements

This research was partially supported by the U.S. Department of Transportation through the Data-Supported Transportation Operations and Planning (D-STOP) Tier 1 University Transportation Center. The first author would like to acknowledge support from a Humboldt Research Award from the Alexander von Humboldt Foundation, Germany. The authors are grateful to Lisa Macias for her help in formatting this document, and to two anonymous referees who provided useful comments on an earlier version of the paper.

Author information

Correspondence to Chandra R. Bhat.

Appendix

Table 5 Sample distributions with closed-form inverse cumulative distribution functions and that are bounded on the half-line

See Table 5

About this article

Cite this article

Bhat, C.R., Lavieri, P.S. A new mixed MNP model accommodating a variety of dependent non-normal coefficient distributions. Theory Decis 84, 239–275 (2018). https://doi.org/10.1007/s11238-017-9638-4

  • DOI: https://doi.org/10.1007/s11238-017-9638-4
