Abstract
The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted by Q² or R²cv) are the most frequently published values used to characterize the predictive performance of models. In this article we use R² and Q² in reverse, to detect uncommon points, i.e. influential points, in any data set. The term (1 − Q²)/(1 − R²) corresponds to the ratio of the predictive residual sum of squares (PRESS) to the residual sum of squares (RSS). This ratio correlates with the number of influential points in both experimental and random data sets. We propose an approximate F test on the (1 − Q²)/(1 − R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is based on the routinely calculated Q² and R² values. It warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.
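The relation between R², Q² and the PRESS/RSS ratio can be illustrated with a minimal pure-Python sketch. It assumes ordinary least squares with a single descriptor, and it assumes that Q² is defined as 1 − PRESS/TSS using the total sum of squares of the full data set. The function names `fit_line` and `r2_q2_ratio` and the toy data are hypothetical illustrations, not the authors' implementation. The sketch does not implement the approximate F test, because its degrees of freedom are not stated here.

```python
# Sketch (assumption-laden): R^2, leave-one-out Q^2 and (1 - Q^2)/(1 - R^2)
# for ordinary least squares with one descriptor. Names and data are
# hypothetical illustrations, not the authors' implementation.


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2_q2_ratio(xs, ys):
    """Return (R^2, Q^2, (1 - Q^2) / (1 - R^2)) for a one-descriptor OLS model."""
    n = len(ys)
    my = sum(ys) / n
    tss = sum((y - my) ** 2 for y in ys)  # total sum of squares

    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual SS

    # PRESS: refit without point i, then predict the left-out point i.
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2

    r2 = 1 - rss / tss
    q2 = 1 - press / tss  # assumption: Q^2 uses the full-data TSS
    ratio = (1 - q2) / (1 - r2)  # algebraically equal to press / rss
    return r2, q2, ratio


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys_clean = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1]
# Same data, but the highest-leverage point (x = 8) is pushed off the trend.
ys_outlier = ys_clean[:-1] + [26.1]

for label, ys in (("clean", ys_clean), ("outlier", ys_outlier)):
    r2, q2, ratio = r2_q2_ratio(xs, ys)
    print(label, round(r2, 3), round(q2, 3), round(ratio, 2))
```

With this toy data the ratio rises from about 1.76 for the clean series to about 2.48 once the point at x = 8 is displaced. This is consistent with the claim that the ratio grows with influential points. Because the leave-one-out residual of OLS equals e_i/(1 − h_ii), where h_ii is the leverage of point i, PRESS is never smaller than RSS, so the ratio is never below 1.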
Cite this article
Tóth, G., Bodai, Z. & Héberger, K. Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart. J Comput Aided Mol Des 27, 837–844 (2013). https://doi.org/10.1007/s10822-013-9680-4