
A synthetic data generator for online social network graphs

  • Original Article
  • Social Network Analysis and Mining

Abstract

Two difficulties facing data analysts of online social networks are (1) the limited public availability of data and (2) the need to respect the privacy of users. One possible solution to both problems is to use synthetically generated data. However, generating a realistic dataset is itself a challenge: the topologies, attribute values, communities, data distributions and correlations, among other properties, must all be plausible. In this work, we present and validate an approach for populating a graph topology with synthetic data that approximates an online social network. Empirical tests confirm that the approach generates a dataset that is diverse and fits the target requirements well, with realistic modeling of noise and of the fit to communities. The generated data match the target profiles and distributions well, and this match is competitive with other state-of-the-art methods. The data generator is also highly configurable, with a sophisticated control parameter set for different “similarity/diversity” levels.




Acknowledgments

This work is partially funded by the Spanish MEC (project TIN2013-49814-EXP). The author is grateful for the suggestions of Prof. Vladimir Estivill-Castro of the Pompeu Fabra University, Barcelona, Spain, and of Dr. Julián Salas of the University Rovira i Virgili, Tarragona, Spain.

Author information

Corresponding author

Correspondence to David F. Nettleton.

Appendix: Pseudo-code of synthetic data generator

About this article

Cite this article

Nettleton, D.F. A synthetic data generator for online social network graphs. Soc. Netw. Anal. Min. 6, 44 (2016). https://doi.org/10.1007/s13278-016-0352-y

  • DOI: https://doi.org/10.1007/s13278-016-0352-y
