Abstract
We first provide comprehensive, but simple, access to essential regression knowledge by discussing how regression analysis works, the requirements and assumptions on which it relies, and how you can specify a regression analysis model that allows you to make critical decisions for your business, clients, or project. Each step involved in regression analysis is linked to its execution in Stata (using menus and code). We show how to use a range of Stata’s easy-to-learn statistical procedures that underlie regression analysis, which will allow you to analyze, chart, and validate regression analysis results and to assess the robustness of your analysis. Interpreting Stata output can be difficult, but we make this easier by means of an annotated case study. We conclude with suggestions for further reading on the use, application, and interpretation of regression analysis.
Notes
- 1.
Strictly speaking, the difference between the predicted and the observed y-values is \( \widehat{e} \).
- 2.
This only applies to the standardized βs.
- 3.
This is only a requirement if you are interested in the regression coefficients, which is the dominant use of regression. If you are only interested in prediction, collinearity is not important.
- 4.
The VIF is calculated using a completely separate regression analysis. In this regression analysis, the variable for which the VIF is calculated is regarded as the dependent variable and all other independent variables are regarded as independents. The \( R^2 \) that this model provides is subtracted from 1, and the reciprocal of this difference (i.e., \( 1/(1-R^2) \)) is the VIF. The VIF therefore indicates how much of one independent variable's variance the other independent variables explain. If the other variables explain much of the variance (the VIF is larger than 10), collinearity is likely a problem.
- 5.
This term can be calculated manually, but also by using the function mmult in Microsoft Excel to calculate \( x^T x \). Once this matrix has been calculated, you can use the minverse function to arrive at \( (x^T x)^{-1} \).
- 6.
In Stata, this can be done by using the robust option (typed after the command as , robust).
- 7.
The test also includes the predicted values squared and to the power of three.
- 8.
Specifically, in the mentioned regression model \( y=\alpha +{\beta}_1{x}_1+{\beta}_2{x}_2+{\beta}_3{x}_3+e \), the Breusch-Pagan test determines whether the independent variables explain the squared residuals in the auxiliary regression \( \widehat{e^2}=\alpha +{\beta}_{BP1}{x}_1+{\beta}_{BP2}{x}_2+{\beta}_{BP3}{x}_3+{e}_{BP} \).
- 9.
This hypothesis can also be read as stating that a model with only an intercept is sufficient.
- 10.
The AIC is specifically calculated as \( AIC=n\cdot \ln \left({SS}_E/n\right)+2\cdot k \), where n is the number of observations and k the number of independent variables, while the BIC is calculated as \( BIC=n\cdot \ln \left({SS}_E/n\right)+k\cdot \ln (n) \).
- 11.
Cohen’s (1994) classic article “The Earth is Round (p < .05)” offers an interesting perspective on significance and effect sizes.
- 12.
This can be done using the Stata command egen commitment=rowmean(com1 com2 com3).
- 13.
Note that a p-value is never exactly zero, but has values different from zero in later decimal places.
- 14.
Note that it is possible to show all categories for regression tables by typing set showbaselevels on. This can be made permanent by typing set showbaselevels on, permanent.
- 15.
Note that while the constant has the highest value (1.19), this is not a coefficient and should not be interpreted as an effect size.
- 16.
Please note that only Stata 13 and later versions feature built-in routines to calculate \( {\eta}^2 \).
- 17.
The seed specifies the initial value of the random-number generating process such that it can be replicated later.
- 18.
We would like to thank Dr. D.I. Gilliland and AgriPro for making the data and case study available.
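The formulas in notes 4 and 10 can be sketched in a few lines of Python (rather than Stata, the software the chapter itself uses). This is a minimal illustration and not the authors' procedure: the function names and the numbers in the usage lines are hypothetical, and in practice the \( R^2 \) and \( {SS}_E \) values would come from an estimated regression.

```python
import math


def vif_from_r2(r2):
    """Note 4: VIF = 1 / (1 - R^2), with R^2 taken from the auxiliary
    regression of one independent variable on all the others."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r2)


def aic(sse, n, k):
    """Note 10: AIC = n * ln(SS_E / n) + 2 * k."""
    return n * math.log(sse / n) + 2 * k


def bic(sse, n, k):
    """Note 10: BIC = n * ln(SS_E / n) + k * ln(n)."""
    return n * math.log(sse / n) + k * math.log(n)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    print(vif_from_r2(0.90))          # close to 10, the threshold in note 4
    print(aic(sse=50.0, n=100, k=3))
    print(bic(sse=50.0, n=100, k=3))
```

Both criteria share the fit term \( n\cdot \ln \left({SS}_E/n\right) \) and differ only in the penalty. Because \( \ln (n) \) exceeds 2 once n is 8 or larger, the BIC penalizes additional independent variables more heavily than the AIC in all but very small samples.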
References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks: Sage.
Baum, C. F. (2006). An introduction to modern econometrics using Stata. College Station: Stata Press.
Breusch, T. S., & Pagan, A. R. (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies, 47(1), 239–253.
Cameron, A. C., & Trivedi, P. K. (1990). The information matrix test and its implied alternative hypotheses. (Working Papers from California Davis – Institute of Governmental Affairs, pp. 1–33).
Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata (Revised ed.). College Station: Stata Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.
Cook, R. D., & Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika, 70(1), 1–10.
Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38(1–2), 159–179.
Fabozzi, F. J., Focardi, S. M., Rachev, S. T., & Arshanapalli, B. G. (2014). The basics of financial econometrics: Tools, concepts, and asset management applications. Hoboken: Wiley.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26(3), 499–510.
Greene, W. H. (2011). Econometric analysis (7th ed.). Upper Saddle River: Prentice Hall.
Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2013). Multivariate data analysis. Upper Saddle River: Pearson.
Hill, C., Griffiths, W., & Lim, G. C. (2008). Principles of econometrics (3rd ed.). Hoboken: Wiley.
Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8(3), 305–321.
Mason, C. H., & Perreault, W. D., Jr. (1991). Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research, 28, 268–280.
Mooi, E. A., & Frambach, R. T. (2009). A stakeholder perspective on buyer–supplier conflict. Journal of Marketing Channels, 16(4), 291–307.
O’Brien, R. M. (2007). A caution regarding rules of thumb for variance inflation factors. Quality and Quantity, 41(5), 673–690.
Ramsey, J. B. (1969). Test for specification errors in classical linear least-squares regression analysis. Journal of the Royal Statistical Society, Series B, 31(2), 350–371.
Sin, C., & White, H. (1996). Information criteria for selecting possibly misspecified parametric models. Journal of Econometrics, 71(1–2), 207–225.
StataCorp. (2015). Stata 14 base reference manual. College Station: Stata Press.
Treiman, D. J. (2014). Quantitative data analysis: Doing social research to test ideas. Hoboken: Wiley.
VanVoorhis, C. R. W., & Morgan, B. L. (2007). Understanding power and rules of thumb for determining sample sizes. Tutorials in Quantitative Methods for Psychology, 3(2), 43–50.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica: Journal of the Econometric Society, 48(4), 817–838.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Mooi, E., Sarstedt, M., Mooi-Reci, I. (2018). Regression Analysis. In: Market Research. Springer Texts in Business and Economics. Springer, Singapore. https://doi.org/10.1007/978-981-10-5218-7_7
DOI: https://doi.org/10.1007/978-981-10-5218-7_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5217-0
Online ISBN: 978-981-10-5218-7
eBook Packages: Business and Management (R0)