Abstract
The detection of problematic collinearity in a linear regression model is addressed in all existing statistical software packages. However, such detection is not always done adequately. The main shortcomings relate to the treatment of qualitative independent variables and to completely ignoring the role of the intercept in the model (and, consequently, ignoring nonessential collinearity). This paper presents the R package multiColl, which implements the measures usually applied to detect near collinearity while overcoming the weaknesses observed in other existing packages.
Notes
This value differs from the threshold of 0.7 that Halkos and Tsilika (2018) give as indicating a problem of near collinearity.
García et al. (2018) show that values of the determinant of the correlation matrix lower than \(0.1013 + 0.00008626 \cdot n - 0.01384 \cdot k\) indicate the presence of problematic essential near multicollinearity. Once again, this value differs from the threshold provided by Field (2019), who claims that there is severe multicollinearity when the determinant of the correlation matrix is less than 0.00001. In the example presented by Halkos and Tsilika (2018), the conclusion is that collinearity is not detected, since the determinant of the correlation matrix (0.00663839) is higher than that threshold (0.00001). However, under the criterion of García et al. (2018), the threshold is 0.01964016 and, consequently, severe near collinearity is detected.
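The comparison in this note can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of multiColl: the function name `determinant_threshold` is hypothetical, and the inputs n = 16 and k = 6 are not stated in the text. They are assumed here only because they reproduce the quoted threshold of 0.01964016; n and k presumably denote the number of observations and the number of variables in the formula of García et al. (2018).

```python
def determinant_threshold(n, k):
    """Threshold quoted in the note: 0.1013 + 0.00008626*n - 0.01384*k."""
    return 0.1013 + 0.00008626 * n - 0.01384 * k


# Assumed inputs (not given in the text); chosen because they
# reproduce the threshold 0.01964016 quoted in the note.
n, k = 16, 6

threshold_garcia = determinant_threshold(n, k)  # criterion of García et al. (2018)
threshold_field = 0.00001                       # criterion attributed to Field (2019)
det_r = 0.00663839                              # determinant reported by Halkos and Tsilika (2018)

# A determinant below the threshold flags problematic near collinearity.
flag_garcia = det_r < threshold_garcia
flag_field = det_r < threshold_field

print(f"García et al. threshold: {threshold_garcia:.8f}, flagged: {flag_garcia}")
print(f"Field threshold:         {threshold_field:.8f}, flagged: {flag_field}")
```

Under these assumptions the same determinant is flagged by one criterion and not by the other, which is the disagreement the note describes.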
References
Farrar, D. E., & Glauber, R. R. (1967). Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics, pp. 92–107.
Field, A. (2019). Discovering statistics using SPSS for Windows (3rd ed.). Los Angeles: Sage Publications.
García, C., Salmerón, R., García, C., & García, J. (2019). Residualization: justification, properties and application. Journal of Applied Statistics (in review).
García, C., Salmerón, R., & García, C. (2018). A choice of the ridge factor from the correlation matrix determinant. Journal of Statistical Computation and Simulation, 89(2), 211–231. https://doi.org/10.1080/00949655.2018.1543423.
Gunst, R. F., & Mason, R. L. (1977). Advantages of examining multicollinearities in regression analysis. Biometrics, pp. 249–260.
Halkos, G., & Tsilika, K. (2018). Programming correlation criteria with free CAS software. Computational Economics, 52(1), 299–311.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
Longley, J. (1967). An appraisal of least-squares programs from the point of view of the user. Journal of the American Statistical Association, 62, 819–841.
Marquardt, D. W., & Snee, R. D. (1975). Ridge regression in practice. The American Statistician, 29(1), 3–20. https://doi.org/10.1080/00031305.1975.10479105.
Marquardt, D. W. (1970). Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics, 12(3), 591–612.
Salmerón, R., García, C., & García, J. (2019). “multiColl”: An R package to detect multicollinearity. arXiv preprint arXiv:1910.14590.
Salmerón, R., García, C., García, J., & López, M. (2017). The raise estimators estimation, inference and properties. Communications in Statistics-Theory and Methods, 46(13), 6446–6462.
Salmerón, R., Rodríguez, A., & García, C. (2019). Diagnosis and quantification of the non-essential collinearity. Computational Statistics. https://doi.org/10.1007/s00180-019-00922-x.
Simon, D., & Lesage, J. (1988). The impact of collinearity involving the intercept term on the numerical accuracy of regression. Computer Science in Economics and Management, 1, 137–152.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Willan, A. R., & Watts, D. G. (1978). Meaningful multicollinearity measures. Technometrics, 20(4), 407–412.
York, R. (2012). Residualization is not the answer: Rethinking how to address multicollinearity. Social Science Research, 41(6), 1379–1386.
Cite this article
Salmerón-Gómez, R., García-García, C. & García-Pérez, J. A Guide to Using the R Package “multiColl” for Detecting Multicollinearity. Comput Econ 57, 529–536 (2021). https://doi.org/10.1007/s10614-019-09967-y
DOI: https://doi.org/10.1007/s10614-019-09967-y