Regression Analysis

Market Research

Part of the book series: Springer Texts in Business and Economics (STBE)

Abstract

We first provide comprehensive, but simple, access to essential regression knowledge by discussing how regression analysis works, the requirements and assumptions on which it relies, and how you can specify a regression analysis model that allows you to make critical decisions for your business, clients, or project. Each step involved in regression analysis is linked to its execution in Stata (using menus and code). We show how to use a range of Stata’s easy-to-learn statistical procedures that underlie regression analysis, which will allow you to analyze, chart, and validate regression analysis results and to assess your analysis’s robustness. Interpretation of Stata output can be difficult, but we make this easier by means of an annotated case study. We conclude with suggestions for further readings on the use, application, and interpretation of regression analysis.

Notes

  1. Strictly speaking, the difference between the predicted and the observed y-values is \( \widehat{e} \).

  2. This only applies to the standardized βs.

  3. This is only a requirement if you are interested in the regression coefficients, which is the dominant use of regression. If you are only interested in prediction, collinearity is not important.

  4. The VIF is calculated using a completely separate regression analysis. In this regression analysis, the variable for which the VIF is calculated is regarded as the dependent variable and all other independent variables are regarded as independents. The \( R^2 \) that this model provides is subtracted from 1, and the reciprocal of this difference (i.e., \( 1/(1-R^2) \)) is the VIF. The VIF therefore indicates how well the other independent variables explain that one independent variable. If the other variables explain much of its variance (the VIF is larger than 10), collinearity is likely a problem.

  5. This term can be calculated manually, but also by using the function mmult in Microsoft Excel, where \( x^T x \) is calculated. Once this matrix has been calculated, you can use the minverse function to arrive at \( (x^T x)^{-1} \).

  6. In Stata, this can be done by using the robust option (typed after a comma, as in , robust).

  7. The test also includes the predicted values squared and to the power of three.

  8. Specifically, in the mentioned regression model \( y=\alpha +{\beta}_1{x}_1+{\beta}_2{x}_2+{\beta}_3{x}_3+e \), the Breusch-Pagan test determines whether \( \widehat{e^2}=\alpha +{\beta}_{BP1}{x}_1+{\beta}_{BP2}{x}_2+{\beta}_{BP3}{x}_3+{e}_{BP} \).

  9. This hypothesis can also be read as stating that a model with only an intercept is sufficient.

  10. The AIC is specifically calculated as \( \mathrm{AIC}=n\cdot \ln \left({SS}_E/n\right)+2\cdot k \), where n is the number of observations and k the number of independent variables, while the BIC is calculated as \( \mathrm{BIC}=n\cdot \ln \left({SS}_E/n\right)+k\cdot \ln (n) \).

  11. Cohen’s (1994) classical article “The Earth is Round (p < 0.05)” offers an interesting perspective on significance and effect sizes.

  12. Using the Stata command egen commitment=rowmean(com1 com2 com3)

  13. Note that a p-value is never exactly zero, but has values different from zero in later decimal places.

  14. Note that it is possible to show all categories for regression tables by typing set showbaselevels on. This can be made permanent by typing set showbaselevels on, permanent.

  15. Note that while the constant has the highest value (1.19), this is not a coefficient and should not be interpreted as an effect size.

  16. Please note that only Stata 13 or above features built-in routines to calculate \( \eta^2 \).

  17. The seed specifies the initial value of the random-number generating process such that it can be replicated later.

  18. We would like to thank Dr. D.I. Gilliland and AgriPro for making the data and case study available.
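
The VIF calculation described in note 4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it covers the case of two independent variables, where the auxiliary regression of one on the other is a simple regression whose R-squared equals the squared correlation. The function names and data are hypothetical and do not come from the chapter, which works in Stata.

```python
def r_squared_simple(x, y):
    """R-squared of a simple regression of y on x (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def vif(r2_aux):
    """Variance inflation factor from the auxiliary regression's R-squared."""
    return 1.0 / (1.0 - r2_aux)


# Hypothetical data for two independent variables.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r2 = r_squared_simple(x2, x1)  # auxiliary regression of x1 on x2
print(r2, vif(r2))             # R-squared is 0.64, so the VIF is 1 / 0.36, about 2.78
```

With more than two independent variables, the auxiliary regression becomes a multiple regression, but the final step, 1/(1 − R²), is unchanged.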
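
The two matrix steps mentioned in note 5 (multiplying to obtain x^T x, then inverting it) can be illustrated without a spreadsheet. The following Python sketch assumes the smallest interesting case, a design matrix with an intercept column and one independent variable, so that x^T x is a 2 × 2 matrix. The helper names and the data are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def inverse_2x2(m):
    """Invert a 2 x 2 matrix using the closed-form formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and cannot be inverted")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical design matrix: a column of ones (intercept) and one variable.
x = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]

xtx = matmul(transpose(x), x)  # [[3.0, 6.0], [6.0, 14.0]]
xtx_inv = inverse_2x2(xtx)     # the determinant is 3 * 14 - 6 * 6 = 6
print(xtx)
print(xtx_inv)
```

For more than two columns, the closed-form inverse no longer applies and a general routine (such as the spreadsheet function mentioned in the note) is needed.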
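
The AIC and BIC formulas in note 10 are straightforward to evaluate by hand. The Python sketch below applies them exactly as written in the note; the function names and the two example models are hypothetical. Statistical software may report these criteria on a different scale, so only values computed with the same formula should be compared.

```python
import math


def aic(sse, n, k):
    """AIC = n * ln(SSE / n) + 2 * k, following the formula in note 10."""
    return n * math.log(sse / n) + 2 * k


def bic(sse, n, k):
    """BIC = n * ln(SSE / n) + k * ln(n), following the formula in note 10."""
    return n * math.log(sse / n) + k * math.log(n)


# Two hypothetical models fitted to the same 100 observations.
n = 100
small_model = {"sse": 250.0, "k": 2}
large_model = {"sse": 240.0, "k": 5}

for name, model in (("small", small_model), ("large", large_model)):
    print(name, aic(model["sse"], n, model["k"]), bic(model["sse"], n, model["k"]))
```

Lower values indicate the preferred model. In this made-up comparison, the larger model reduces the sum of squared errors only slightly, so both criteria favor the smaller model, and the BIC penalizes the extra variables more heavily because ln(100) exceeds 2.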

References

  • Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks: Sage.

  • Baum, C. F. (2006). An introduction to modern econometrics using Stata. College Station: Stata Press.

  • Breusch, T. S., & Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies, 47(1), 239–253.

  • Cameron, A. C., & Trivedi, P. K. (1990). The information matrix test and its implied alternative hypotheses (Working Papers from California Davis – Institute of Governmental Affairs, pp. 1–33).

  • Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata (Revised ed.). College Station: Stata Press.

  • Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.

  • Cohen, J. (1994). The earth is round (p < .05). The American Psychologist, 49(12), 997–1003.

  • Cook, R. D., & Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika, 70(1), 1–10.

  • Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38(1–2), 159–179.

  • Fabozzi, F. J., Focardi, S. M., Rachev, S. T., & Arshanapalli, B. G. (2014). The basics of financial econometrics: Tools, concepts, and asset management applications. Hoboken: Wiley.

  • Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26(3), 499–510.

  • Greene, W. H. (2011). Econometric analysis (7th ed.). Upper Saddle River: Prentice Hall.

  • Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2013). Multivariate data analysis. Upper Saddle River: Pearson.

  • Hill, C., Griffiths, W., & Lim, G. C. (2008). Principles of econometrics (3rd ed.). Hoboken: Wiley.

  • Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8(3), 305–321.

  • Mason, C. H., & Perreault, W. D., Jr. (1991). Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research, 28, 268–280.

  • Mooi, E. A., & Frambach, R. T. (2009). A stakeholder perspective on buyer–supplier conflict. Journal of Marketing Channels, 16(4), 291–307.

  • O’Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality and Quantity, 41(5), 673–690.

  • Ramsey, J. B. (1969). Tests for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society, Series B, 31(2), 350–371.

  • Sin, C., & White, H. (1996). Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics, 71(1–2), 207–225.

  • StataCorp. (2015). Stata 14 base reference manual. College Station: Stata Press.

  • Treiman, D. J. (2014). Quantitative data analysis: Doing social research to test ideas. Hoboken: Wiley.

  • VanVoorhis, C. R. W., & Morgan, B. L. (2007). Understanding power and rules of thumb for determining sample sizes. Tutorials in Quantitative Methods for Psychology, 3(2), 43–50.

  • White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica: Journal of the Econometric Society, 48(4), 817–838.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Mooi, E., Sarstedt, M., Mooi-Reci, I. (2018). Regression Analysis. In: Market Research. Springer Texts in Business and Economics. Springer, Singapore. https://doi.org/10.1007/978-981-10-5218-7_7
