ABSTRACT
We study the problem of large-scale social identity linkage across different social media platforms, which is of critical importance to business intelligence because social data yield a deeper understanding and more accurate profiling of users. This paper proposes HYDRA, a solution framework that consists of three key steps: (I) modeling heterogeneous behavior through long-term behavior distribution analysis and multi-resolution temporal information matching; (II) constructing a structural consistency graph to measure the high-order structure consistency of users' core social structures across different platforms; and (III) learning the mapping function by multi-objective optimization that combines supervised learning on pair-wise ID linkage information with cross-platform structure consistency maximization. Extensive experiments on 10 million users across seven popular social network platforms demonstrate that HYDRA correctly identifies real user linkage across different platforms, outperforms existing state-of-the-art algorithms by at least 20% under different settings, and performs 4 times better in most settings.
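To make the idea of multi-resolution temporal matching in step (I) more concrete, the following is a minimal sketch and not HYDRA's actual formulation. It assumes each account is reduced to a list of event timestamps in seconds, and it scores two accounts by the Jaccard overlap of their active time buckets at several bucket widths. The function names, the choice of Jaccard overlap, and the hour/day/week widths are all illustrative assumptions.

```python
def active_buckets(timestamps, width):
    """Return the set of time buckets (of `width` seconds) containing an event."""
    return {int(t // width) for t in timestamps}


def temporal_match(ts_a, ts_b, width):
    """Jaccard overlap of active buckets at one resolution (hypothetical score)."""
    a = active_buckets(ts_a, width)
    b = active_buckets(ts_b, width)
    if not a and not b:
        return 0.0  # no activity on either account: treat as no evidence
    return len(a & b) / len(a | b)


def multi_resolution_features(ts_a, ts_b, widths=(3600, 86400, 604800)):
    """One overlap score per resolution; the default widths (hour, day, week)
    are an assumption, not values taken from the paper."""
    return [temporal_match(ts_a, ts_b, w) for w in widths]


# Toy timestamps (seconds) for two hypothetical accounts.
account_a = [0, 4000, 90000]
account_b = [100, 100000, 700000]
features = multi_resolution_features(account_a, account_b)
```

A feature vector of this kind could then feed a pair-wise classifier such as the supervised component in step (III). Coarser resolutions tolerate accounts that post at slightly different times, while finer ones reward near-simultaneous activity.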
Index Terms
- HYDRA: large-scale social identity linkage via heterogeneous behavior modeling