Exploiting behaviors of communities of twitter users for link prediction

  • Original Article
  • Social Network Analysis and Mining

Abstract

Online social networks and social media have become increasingly popular, showing exponential growth. This has attracted growing research interest and, in turn, facilitated the emergence of new interdisciplinary research directions, such as social network analysis. In this scenario, link prediction is one of the most important tasks, since it addresses whether a relation will exist in the future between members of a social network. Previous techniques for link prediction were based on structural (or topological) information. However, structural information alone is not enough to achieve good link prediction performance on large-scale social networks. Thus, additional information, such as the interests or behaviors that nodes exhibit within their communities, may improve link prediction performance. In this paper, we analyze the viability of a set of simple and inexpensive techniques that combine structural and community information to predict future links in a large-scale online social network such as Twitter. Twitter, a microblogging service, has emerged as a useful source of informative data shared by millions of users whose relationships require no reciprocation. We chose the Twitter network because it is not yet well understood, mainly because of its directed and asymmetric links. Experiments show that our proposals can be used efficiently to improve both unsupervised and supervised link prediction in a directed and asymmetric large-scale network.
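To make the idea of combining structural and community information concrete, here is a minimal sketch. It is an illustration only, not the measures defined in the article: the function names, the `bonus` weight, and the scoring rule (a common-neighbors count over followed users, with extra weight for common neighbors that share a community with both endpoints) are all assumptions introduced here.

```python
from collections import defaultdict


def build_out_neighbors(edges):
    """Map each user to the set of users they follow (directed edges u -> v)."""
    out = defaultdict(set)
    for u, v in edges:
        out[u].add(v)
    return out


def community_score(out, community, x, y, bonus=1.0):
    """Hypothetical score: shared followees of x and y, plus a bonus for
    each shared followee in the same community as both x and y.

    `community` maps a user to a community label; `bonus` is an assumed
    weight, not a value taken from the article.
    """
    shared = out.get(x, set()) & out.get(y, set())
    cx, cy = community.get(x), community.get(y)
    if cx is None or cx != cy:
        within = 0  # x and y are not known to share a community
    else:
        within = sum(1 for z in shared if community.get(z) == cx)
    return len(shared) + bonus * within


# Toy directed follower graph (assumed data, for illustration only).
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("a", "b")]
out = build_out_neighbors(edges)
same = {"a": 0, "b": 0, "c": 0, "d": 1}
split = {"a": 0, "b": 1, "c": 0, "d": 1}
print(community_score(out, same, "a", "b"))
print(community_score(out, split, "a", "b"))
```

Because the graph is directed, the sketch only looks at outgoing links (who a user follows); a variant could use incoming links instead, and the pair ("a", "b") scores differently from a pair whose endpoints sit in different communities.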

Notes

  1. The ratio of total links per user is the ratio between the total number of links, |E|, and the total number of nodes, |V|. It indicates the average neighborhood size per node.
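The ratio in the note above can be computed directly from an edge list. The following is a small sketch under stated assumptions: the function name `links_per_user` and the toy edge list are hypothetical, and nodes are taken to be the users that appear in at least one link.

```python
def links_per_user(edges):
    """Return |E| / |V| for a directed edge list of (follower, followee) pairs.

    Duplicate pairs are counted once. Nodes are the users that appear in
    at least one edge (an assumption of this sketch).
    """
    unique_edges = set(edges)
    nodes = {user for edge in unique_edges for user in edge}
    if not nodes:
        return 0.0
    return len(unique_edges) / len(nodes)


# Toy example: 4 directed links among 3 users.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
print(links_per_user(toy_edges))
```

In a directed network, |E|/|V| equals both the average out-degree and the average in-degree, which is why it can be read as an average neighborhood size.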

Acknowledgments

This work is partially supported by Grants 2011/22749-8 from the São Paulo Research Foundation (FAPESP) and 151836/2013-2 from the National Council for Scientific and Technological Development (CNPq).

Author information

Corresponding author

Correspondence to Jorge Valverde-Rebaza.

About this article

Cite this article

Valverde-Rebaza, J., de Andrade Lopes, A. Exploiting behaviors of communities of twitter users for link prediction. Soc. Netw. Anal. Min. 3, 1063–1074 (2013). https://doi.org/10.1007/s13278-013-0142-8

  • DOI: https://doi.org/10.1007/s13278-013-0142-8
