Outliers and Influential Observations in Exponential Random Graph Models

Psychometrika

Abstract

We discuss measuring and detecting influential observations and outliers in the context of exponential family random graph (ERG) models for social networks. We focus on the level of the nodes of the network and consider those nodes whose removal would result in changes to the model as extreme or “central” with respect to the structural features that “matter”. We construe removal in terms of two case-deletion strategies: the tie-variables of an actor are assumed to be unobserved, or the node is removed resulting in the induced subgraph. We define the difference in inferred model resulting from case deletion from the perspective of information theory and difference in estimates, in both the natural and mean-value parameterisation, representing varying degrees of approximation. We arrive at several measures of influence and propose the use of two that do not require refitting of the model and lend themselves to routine application in the ERGM fitting procedure. MCMC p values are obtained for testing how extreme each node is with respect to the network structure. The influence measures are applied to two well-known data sets to illustrate the information they provide. From a network perspective, the proposed statistics offer an indication of which actors are most distinctive in the network structure, in terms of not abiding by the structural norms present across other actors.
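As a rough illustration of the second case-deletion strategy (removing a node and keeping the induced subgraph), the sketch below records how two common network statistics change when each node is deleted in turn. It is a toy example under stated assumptions, not the influence measures proposed in the article: the function names, the choice of statistics (edge and triangle counts), and the five-node graph are all hypothetical, and no model is fitted and no MCMC p-values are computed.

```python
from itertools import combinations


def graph_stats(edges, nodes):
    """Return (edge count, triangle count) for the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        # Keep only ties whose endpoints both remain in the induced subgraph.
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    n_triangles = sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return n_edges, n_triangles


def node_deletion_changes(edges, nodes):
    """For each node, the drop in (edges, triangles) when that node is deleted."""
    full = graph_stats(edges, nodes)
    changes = {}
    for v in nodes:
        reduced = graph_stats(edges, [u for u in nodes if u != v])
        changes[v] = (full[0] - reduced[0], full[1] - reduced[1])
    return changes


# Hypothetical toy network: a triangle (1, 2, 3) with a tail 3-4-5.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
changes = node_deletion_changes(edges, nodes)
```

In this toy graph, deleting node 3 removes the most structure (three ties and the only triangle). Raw changes like these only describe the observed statistics; judging how extreme a node is would additionally require a fitted model and a reference distribution, as the abstract describes.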

Notes

  1. Ties were interactions that (1) were limited to radical organisation activities; (2) extended beyond radical organisations to include such categories as co-workers and roommates; or (3) linked those who would die for each other. Further detail may be found in Rhodes and Jones (2009), who use a different version of the network.

References

  • Anderson, B. S., Butts, C., & Carley, K. (1999). The interaction of size and density with graph-level indices. Social Networks, 21, 239–267.

  • Barndorff-Nielsen, O. E. (1978). Information and exponential families in statistical theory. New York: Wiley.

  • Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity, Wiley series in probability and mathematical statistics. New York: Wiley.

  • Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society B, 36, 96–127.

  • Block, P., Koskinen, J. H., Stadtfeld, C. J., Hollway, J., & Steglich, C. (2018). Change we can believe in: Comparing longitudinal network models on consistency, interpretability and predictive power. Social Networks, 52, 189–191.

  • Borgatti, S. P., & Everett, M. G. (2006). A graph-theoretic perspective on centrality. Social Networks, 28, 466–484.

  • Chatterjee, S., & Hadi, A. S. (2009). Sensitivity analysis in linear regression (Vol. 327). New York: John Wiley & Sons.

  • Cook, R. D. (1977). Detection of influential observations in linear regression. Technometrics, 19, 15–18.

  • Cook, R. D. (1986). Assessment of local influence. Journal of the Royal Statistical Society, Series B, 48, 133–169.

  • Corander, J., Dahmström, K., & Dahmström, P. (1998). Maximum likelihood estimation for Markov graphs. Research report, 1998:8, Stockholm University, Department of Statistics.

  • Corander, J., Dahmström, K., & Dahmström, P. (2002). Maximum likelihood estimation for exponential random graph model. In J. Hagberg (Ed.), Contributions to social network analysis, information theory, and other topics in statistics; A Festschrift in honour of Ove Frank (pp. 1–17). University of Stockholm: Department of Statistics.

  • Crouch, B., Wasserman, S., & Trachtenberg, F. (1998). Markov Chain Monte Carlo maximum likelihood estimation for p* social network models. Paper presented at the Sunbelt XVIII and Fifth European International Social Networks Conference, Sitges (Spain), May 28–31, 1998.

  • Dahmström, K., & Dahmström, P. (1993). ML-estimation of the clustering parameter in a Markov graph model. Stockholm: Research report, 1993:4, Department of Statistics.

  • Frank, O., & Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81, 832–842.

  • Freeman, L. C. (1978). Centrality in social networks conceptual clarification. Social Networks, 1, 215–239.

  • Gelman, A., & Meng, X. L. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13, 163–185.

  • Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks. Working Paper no. 39, Center for Statistics and the Social Sciences, University of Washington. http://www.csss.washington.edu/Papers/wp39.pdf.

  • Handcock, M., & Gile, K. (2010). Modeling social networks from sampled data. The Annals of Applied Statistics, 4, 5–25.

  • Hines, R. O. H., & Hines, W. G. S. (1995). Exploring Cook’s statistic graphically. The American Statistician, 49, 389–394.

  • Hines, R. O. H., Lawless, J. F., & Carter, E. M. (1992). Diagnostics for a cumulative multinomial generalized linear model, with applications to grouped toxicological mortality data. Journal of the American Statistical Association, 87, 1059–1069.

  • Holland, P., & Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76, 33–65.

  • Huisman, M. (2009). Imputation of missing network data: Some simple procedures. Journal of Social Structure, 10(1), 1–29.

  • Hunter, D. R., & Handcock, M. S. (2006). Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics, 15, 565–583.

  • Jonasson, J. (1999). The random triangle model. Journal of Applied Probability, 36, 852–876.

  • Koskinen, J. (in press). Exponential random graph models. In B. Everitt, G. Molenberghs, W. Piegorsch, F. Ruggeri, M. Davidian, & R. Kenett (Eds.), Wiley StatsRef: Statistics Reference Online. Wiley, stat08136. https://doi.org/10.1002/9781118445112.stat08136.

  • Koskinen, J., Robins, G., & Pattison, P. E. (2010). Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation. Statistical Methodology, 7(3), 366–384.

  • Koskinen, J., Robins, G., Wang, P., & Pattison, P. E. (2013). Bayesian analysis for partially observed network data, missing ties, attributes and actors. Social Networks, 35(4), 514–527.

  • Koskinen, J., & Snijders, T. A. B. (2013). Simulation, estimation and goodness of fit. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods and applications (pp. 141–166). New York, NY: Cambridge University Press.

  • Kuhnt, S. (2004). Outlier identification procedures for contingency tables using maximum likelihood and \(L_1\) estimates. Scandinavian Journal of Statistics, 31, 431–442.

  • Laumann, E. O., Marsden, P. V., & Prensky, D. (1983). The boundary specification problem in network analysis. In R. S. Burt & M. J. Minor (Eds.), Applied network analysis (pp. 18–34). London: Sage Publications.

  • Lazega, E. (2001). The collegial phenomenon: The social mechanisms of cooperation among peers in a corporate law partnership. Oxford: Oxford University Press.

  • Lee, A. H. (1988). Partial influence in generalized linear models. Biometrics, 44, 71–77.

  • Lehmann, E. L. (1983). Theory of point estimation. New York: Wiley.

  • Lesaffre, E., & Albert, A. (1989). Multiple-group logistic regression diagnostics. Applied Statistics, 38, 425–440.

  • Lesaffre, E., & Verbeke, G. (1998). Local influence in linear mixed models. Biometrics, 570–582.

  • Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

  • Lusher, D., Koskinen, J., & Robins, G. L. (2013). Exponential random graph models for social networks: Theory, methods, and applications. Cambridge: Cambridge University Press.

  • McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.

  • Meng, X.-L., & Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6, 831–860.

  • Neal, R. M. (1993). Probabilistic inference using Markov Chain Monte Carlo methods. Technical Report CRG–TR–93–1, Department of Statistics, University of Toronto. http://www.cs.utoronto.ca/~radford/. Accessed 29 Sept 2008.

  • Nomikos, J. M. (2007). Terrorism, media, and intelligence in Greece: Capturing the 17 November group. International Journal of Intelligence and CounterIntelligence, 20(1), 65–78.

  • Pattison, P. E., & Wasserman, S. (1999). Logit models and logistic regressions for social networks: II. Multivariate relations. British Journal of Mathematical and Statistical Psychology, 52, 169–193.

  • Pierce, D. A., & Schafer, D. W. (1986). Residuals in generalized linear models. Journal of the American Statistical Association, 81, 977–986.

  • Pregibon, D. (1981). Logistic regression diagnostics. The Annals of Statistics, 9, 705–724.

  • Rhodes, C. J., & Jones, P. (2009). Inferring missing links in partially observed social networks. Journal of the Operational Research Society, 60, 1373–1383.

  • Robins, G. L., & Daraganova, G. (2013). Social selection, dyadic covariates, and geospatial effects. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods, and applications (pp. 91–101). Cambridge: Cambridge University Press.

  • Robins, G. L., Elliott, P., & Pattison, P. E. (2001). Network models for social selection processes. Social Networks, 23, 1–30.

  • Robins, G. L., & Lusher, D. (2013). Illustrations: Simulation, estimation, and goodness of fit. In D. Lusher, J. Koskinen, & G. Robins (Eds.), Exponential random graph models for social networks: Theory, methods, and applications (pp. 167–185). Cambridge: Cambridge University Press.

  • Robins, G. L., & Morris, M. (2007). Advances in exponential random graph (p*) models. Social Networks, 29, 169–172.

  • Robins, G. L., Pattison, P. E., & Elliott, P. (2001). Network models for social influence processes. Psychometrika, 66, 161–190.

  • Robins, G. L., Pattison, P. E., & Woolcock, J. (2005). Small and other worlds: Global network structures from local processes. American Journal of Sociology, 110, 894–936.

  • Rubin, D. B. (1976). Inference and missing data (with discussion). Biometrika, 63, 581–592.

  • Schoch, D., & Brandes, U. (2015). Stars, neighborhood inclusion, and network centrality. In SIAM workshop on network science.

  • Shalizi, C. R., & Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The Annals of Statistics, 41, 508–535.

  • Snijders, T. A. B. (2002). Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure, 3(2), 1–40.

  • Snijders, T. A. B. (2010). Conditional marginalization for exponential random graph models. Journal of Mathematical Sociology, 34, 239–252.

  • Snijders, T. A. B., & Borgatti, S. P. (1999). Non-parametric standard errors and tests for network statistics. Connections, 22, 61–70.

  • Snijders, T. A. B., Pattison, P. E., Robins, G. L., & Handcock, M. S. (2006). New specifications for exponential random graph models. Sociological Methodology, 36, 99–153.

  • Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. Journal of the American Statistical Association, 106, 1361–1370.

  • Schweinberger, M., Krivitsky, P. N., & Butts, C. T. (2017). Foundations of finite-, super-, and infinite-population random graph inference. arXiv:1707.04800v1

  • Strauss, D. (1986). On a general class of models for interaction. SIAM Review, 28, 513–527.

  • The John Jay & ARTIS Transnational Terrorism Database, JJATT. (2009). http://doitapps.jjay.cuny.edu/jjatt/data.php. Accessed 27 July 2016.

  • van Duijn, M. A. J., Gile, K. J., & Handcock, M. S. (2009). A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models. Social Networks, 31(1), 52–62.

  • Wang, P., Pattison, P., & Robins, G. (2013). Exponential random graph model specifications for bipartite networks—A dependence hierarchy. Social Networks, 35(2), 211–222.

  • Wang, P., Robins, G., Pattison, P., & Koskinen, J. (2014). MPNet, Program for the simulation and estimation of (\(p^{\ast }\)) exponential random graph models for Multilevel networks: USER MANUAL. Melbourne School of Psychological Sciences The University of Melbourne Australia.

  • Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge: Cambridge University Press.

  • Wasserman, S., & Pattison, P. E. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika, 61, 401–425.

  • Waternaux, C., Laird, N. M., & Ware, J. H. (1989). Methods for analysis of longitudinal data: Blood-lead concentrations and cognitive development. Journal of the American Statistical Association, 84, 33–41.

  • Weiss, R. E., & Lazaro, C. G. (1992). Residual plots for repeated measures. Statistics in Medicine, 11, 115–124.

  • Williams, D. A. (1984). Residuals in generalized linear models. In Proceedings of the XIIth international biometric conference, Tokyo (pp. 59–68).

  • Williams, D. A. (1987). Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36, 181–191.

Author information

Corresponding author

Correspondence to Johan Koskinen.

Additional information

Johan Koskinen would like to acknowledge financial support from the Leverhulme Trust Grant RPG-2013-140 and SRG2012.

About this article

Cite this article

Koskinen, J., Wang, P., Robins, G. et al. Outliers and Influential Observations in Exponential Random Graph Models. Psychometrika 83, 809–830 (2018). https://doi.org/10.1007/s11336-018-9635-8

  • DOI: https://doi.org/10.1007/s11336-018-9635-8
