Abstract
We discuss measuring and detecting influential observations and outliers in the context of exponential family random graph (ERG) models for social networks. We focus on the level of the nodes of the network and consider those nodes whose removal would result in changes to the model as extreme or “central” with respect to the structural features that “matter”. We construe removal in terms of two case-deletion strategies: either the tie-variables of an actor are assumed to be unobserved, or the node is removed, leaving the induced subgraph. We define the difference between the models inferred with and without the deleted case from the perspective of information theory and in terms of differences in estimates, in both the natural and mean-value parameterisations, representing varying degrees of approximation. We arrive at several measures of influence and propose the use of two that do not require refitting the model and so lend themselves to routine application in the ERGM fitting procedure. MCMC p values are obtained for testing how extreme each node is with respect to the network structure. The influence measures are applied to two well-known data sets to illustrate the information they provide. From a network perspective, the proposed statistics indicate which actors are most distinctive in the network structure, in the sense of not abiding by the structural norms present across other actors.
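The two case-deletion strategies named in the abstract can be illustrated on a small adjacency matrix. The following is only a minimal sketch, not the authors' implementation: the function names `delete_node_induced` and `delete_node_as_missing` are hypothetical, the toy network is invented for illustration, and encoding unobserved tie-variables as NaN is an assumption made here for convenience.

```python
import numpy as np


def delete_node_induced(adj, i):
    """Remove node i entirely, returning the induced subgraph on the other nodes."""
    keep = np.delete(np.arange(adj.shape[0]), i)
    return adj[np.ix_(keep, keep)]


def delete_node_as_missing(adj, i):
    """Keep node i in the node set but treat all of its tie-variables as unobserved.

    Unobserved tie-variables are encoded as NaN (an assumption of this sketch).
    """
    masked = adj.astype(float)      # astype returns a copy, so adj is untouched
    masked[i, :] = np.nan
    masked[:, i] = np.nan
    np.fill_diagonal(masked, 0.0)   # self-ties are not tie-variables
    return masked


# Toy undirected network on 4 nodes (hypothetical data).
adj = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])

sub = delete_node_induced(adj, 1)        # 3 x 3 matrix on nodes 0, 2, 3
masked = delete_node_as_missing(adj, 1)  # 4 x 4 matrix with node 1's ties set to NaN
```

In the first case the network shrinks by one node; in the second the node set is unchanged and only the information about the actor's ties is discarded, which is why the two strategies can lead to different comparisons with the model fitted to the full data.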
Notes
Ties were interactions that (1) were limited to radical organisation activities; (2) extended beyond radical organisations to include such categories as co-workers and roommates; or (3) were between those who would die for each other. Further detail may be found in Rhodes and Jones (2009), who use a different version of the network.
References
Anderson, B. S., Butts, C., & Carley, K. (1999). The interaction of size and density with graph-level indices. Social Networks, 21, 239–267.
Barndorff-Nielsen, O. E. (1978). Information and exponential families in statistical theory. New York: Wiley.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. Wiley series in probability and mathematical statistics. New York: Wiley.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society B, 36, 96–127.
Block, P., Koskinen, J. H., Stadtfeld, C. J., Hollway, J., & Steglich, C. (2018). Change we can believe in: Comparing longitudinal network models on consistency, interpretability and predictive power. Social Networks, 52, 189–191.
Borgatti, S. P., & Everett, M. G. (2006). A graph-theoretic perspective on centrality. Social Networks, 28, 466–484.
Chatterjee, S., & Hadi, A. S. (2009). Sensitivity analysis in linear regression (Vol. 327). New York: John Wiley & Sons.
Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15–18.
Cook, R. D. (1986). Assessment of local influence. Journal of the Royal Statistical Society, Series B, 48, 133–169.
Corander, J., Dahmström, K., & Dahmström, P. (1998). Maximum likelihood estimation for Markov graphs. Research report, 1998:8, Stockholm University, Department of Statistics.
Corander, J., Dahmström, K., & Dahmström, P. (2002). Maximum likelihood estimation for exponential random graph model. In J. Hagberg (ed.), Contributions to social network analysis, information theory, and other topics in statistics; A Festschrift in honour of Ove Frank (pp. 1–17). University of Stockholm: Department of Statistics.
Crouch, B., Wasserman, S., & Trachtenberg, F. (1998). Markov Chain Monte Carlo maximum likelihood estimation for p* social network models. Paper presented at the Sunbelt XVIII and Fifth European International Social Networks Conference, Sitges (Spain), May 28–31, 1998.
Dahmström, K., & Dahmström, P. (1993). ML-estimation of the clustering parameter in a Markov graph model. Stockholm: Research report, 1993:4, Department of Statistics.
Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81, 832–842.
Freeman, L. C. (1978). Centrality in social networks: Conceptual clarification. Social Networks, 1, 215–239.
Gelman, A., & Meng, X. L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13, 163–185.
Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Working Paper no. 39, Center for Statistics and the Social Sciences, University of Washington. http://www.csss.washington.edu/Papers/wp39.pdf.
Handcock, M., & Gile, K. (2010). Modeling social networks from sampled data. The Annals of Applied Statistics, 4, 5–25.
Hines, R. O. H., & Hines, W. G. S. (1995). Exploring Cook’s statistic graphically. The American Statistician, 49, 389–394.
Hines, R. O. H., Lawless, J. F., & Carter, E. M. (1992). Diagnostics for a cumulative multinomial generalized linear model, with applications to grouped toxicological mortality data. Journal of the American Statistical Association, 87, 1059–1069.
Holland, P., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76, 33–65.
Huisman, M. (2009). Imputation of missing network data: Some simple procedures. Journal of Social Structure, 10(1), 1–29.
Hunter, D. R., & Handcock, M. S. (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15, 565–583.
Jonasson, J. (1999). The random triangle model. Journal of Applied Probability, 36, 852–876.
Koskinen, J. (in press). Exponential random graph models. In B. Everitt, G. Molenberghs, W. Piegorsch, F. Ruggeri, M. Davidian, & R. Kenett (Eds.), Wiley StatsRef: Statistics Reference Online. Wiley, stat08136. https://doi.org/10.1002/9781118445112.stat08136.
Koskinen, J., Robins, G., & Pattison, P. E. (2010). Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology, 7(3), 366–384.
Koskinen, J., Robins, G., Wang, P., & Pattison, P. E. (2013). Bayesian analysis for partially observed network data, missing ties, attributes and actors. Social Networks, 35(4), 514–527.
Koskinen, J., & Snijders, T. A. B. (2013). Simulation, estimation and goodness of fit. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 141–166). New York, NY: Cambridge University Press.
Kuhnt, S. (2004). Outlier identification procedures for contingency tables using maximum likelihood and \(L_1\) estimates. Scandinavian Journal of Statistics, 31, 431–442.
Laumann, E. O., Marsden, P. V., & Prensky, D. (1983). The boundary specification problem in network analysis. In R. S. Burt & M. J. Minor (Eds.), Applied network analysis (pp. 18–34). London: Sage Publications.
Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. Oxford: Oxford University Press.
Lee, A. H. (1988). Partial influence in generalized linear models. Biometrics, 44, 71–77.
Lehmann, E. L. (1983). Theory of point estimation. New York: Wiley.
Lesaffre, E., & Albert, A. (1989). Multiple-group logistic regression diagnostics. Applied Statistics, 38, 425–440.
Lesaffre, E., & Verbeke, G. (1998). Local influence in linear mixed models. Biometrics, 54, 570–582.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Lusher, D., Koskinen, J., & Robins, G. L. (2013). Exponential random graph models for social networks: Theory, methods, and applications. Cambridge: Cambridge University Press.
McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.
Meng, X.-L., & Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6, 831–860.
Neal, R. M. (1993). Probabilistic inference using Markov Chain Monte Carlo methods. Technical Report CRG–TR–93–1, Department of Statistics, University of Toronto. http://www.cs.utoronto.ca/~radford/. Accessed 29 Sept 2008.
Nomikos, J. M. (2007). Terrorism, media, and intelligence in Greece: Capturing the 17 November group. International Journal of Intelligence and CounterIntelligence, 20(1), 65–78.
Pattison, P. E., & Wasserman, S. (1999). Logit models and logistic regressions for social networks: II. Multivariate relations. British Journal of Mathematical and Statistical Psychology, 52, 169–193.
Pierce, D. A., & Schafer, D. W. (1986). Residuals in generalized linear models. Journal of the American Statistical Association, 81, 977–986.
Pregibon, D. (1981). Logistic regression diagnostics. The Annals of Statistics, 9, 705–724.
Rhodes, C. J., & Jones, P. (2009). Inferring missing links in partially observed social networks. Journal of the Operational Research Society, 60, 1373–1383.
Robins, G. L., & Daraganova, G. (2013). Social selection, dyadic covariates, and geospatial effects. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods, and applications (pp. 91–101). Cambridge: Cambridge University Press.
Robins, G. L., Elliott, P., & Pattison, P. E. (2001). Network models for social selection processes. Social Networks, 23, 1–30.
Robins, G. L., & Lusher, D. (2013). Illustrations: Simulation, estimation, and goodness of fit. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods, and applications (pp. 167–185). Cambridge: Cambridge University Press.
Robins, G. L., & Morris, M. (2007). Advances in exponential random graph (p*) models. Social Networks, 29, 169–172.
Robins, G. L., Pattison, P. E., & Elliott, P. (2001). Network models for social influence processes. Psychometrika, 66, 161–190.
Robins, G. L., Pattison, P. E., & Woolcock, J. (2005). Small and other worlds: Global network structures from local processes. American Journal of Sociology, 110, 894–936.
Rubin, D. B. (1976). Inference and missing data (with discussion). Biometrika, 63, 581–592.
Schoch, D., & Brandes, U. (2015). Stars, neighborhood inclusion, and network centrality. In SIAM workshop on network science.
Shalizi, C. R., & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics, 41, 508–535.
Snijders, T. A. B. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 1–40.
Snijders, T. A. B. (2010). Conditional marginalization for exponential random graph models. Journal of Mathematical Sociology, 34, 239–252.
Snijders, T. A. B., & Borgatti, S. P. (1999). Non-parametric standard errors and tests for network statistics. Connections, 22, 61–70.
Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.
Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association, 106, 1361–1370.
Schweinberger, M., Krivitsky, P. N., & Butts, C. T. (2017). Foundations of finite-, super-, and infinite-population random graph inference. arXiv:1707.04800v1.
Strauss, D. (1986). On a general class of models for interaction. SIAM Review, 28, 513–527.
The John Jay & ARTIS Transnational Terrorism Database, JJATT. (2009). http://doitapps.jjay.cuny.edu/jjatt/data.php. Accessed 27 July 2016.
van Duijn, M. A. J., Gile, K. J., & Handcock, M. S. (2009). A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models. Social Networks, 31(1), 52–62.
Wang, P., Pattison, P., & Robins, G. (2013). Exponential random graph model specifications for bipartite networks—A dependence hierarchy. Social Networks, 35(2), 211–222.
Wang, P., Robins, G., Pattison, P., & Koskinen, J. (2014). MPNet, Program for the simulation and estimation of (\(p^{\ast }\)) exponential random graph models for Multilevel networks: USER MANUAL. Melbourne School of Psychological Sciences, The University of Melbourne, Australia.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge: Cambridge University Press.
Wasserman, S., & Pattison, P. E. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61, 401–425.
Waternaux, C., Laird, N. M., & Ware, J. H. (1989). Methods for analysis of longitudinal data: Blood-lead concentrations and cognitive development. Journal of the American Statistical Association, 84, 33–41.
Weiss, R. E., & Lazaro, C. G. (1992). Residual plots for repeated measures. Statistics in Medicine, 11, 115–124.
Williams, D. A. (1984). Residuals in generalized linear models. In Proceedings of the XIIth international biometric conference, Tokyo (pp. 59–68).
Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181–191.
Additional information
Johan Koskinen would like to acknowledge financial support from the Leverhulme Trust Grant RPG-2013-140 and SRG2012.
Cite this article
Koskinen, J., Wang, P., Robins, G. et al. Outliers and Influential Observations in Exponential Random Graph Models. Psychometrika 83, 809–830 (2018). https://doi.org/10.1007/s11336-018-9635-8
DOI: https://doi.org/10.1007/s11336-018-9635-8