DOI: 10.1145/2588555.2588559
research-article

HYDRA: large-scale social identity linkage via heterogeneous behavior modeling

Published: 18 June 2014

ABSTRACT

We study the problem of large-scale social identity linkage across different social media platforms, which is of critical importance to business intelligence because social data yield a deeper understanding and more accurate profiling of users. This paper proposes HYDRA, a solution framework that consists of three key steps: (I) modeling heterogeneous behavior through long-term behavior distribution analysis and multi-resolution temporal information matching; (II) constructing a structural consistency graph to measure the high-order structure consistency of users' core social structures across different platforms; and (III) learning the mapping function by multi-objective optimization that combines supervised learning on pair-wise ID linkage information with cross-platform structure consistency maximization. Extensive experiments on 10 million users across seven popular social network platforms demonstrate that HYDRA correctly identifies real user linkage across different platforms, outperforms existing state-of-the-art algorithms by at least 20% under different settings, and performs 4 times better in most settings.
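
To make step (I) more concrete, the following is a minimal Python sketch of how pairwise features for two accounts might be computed. It is an illustration under stated assumptions, not the method from the paper: the abstract does not specify the similarity measures, so histogram intersection for the long-term behavior distribution and Jaccard overlap of time buckets for the multi-resolution temporal matching are stand-ins, and the names `distribution_similarity`, `temporal_match`, `pair_features`, the `topics`/`times` fields, and the hour/day/week resolutions are all hypothetical.

```python
from collections import Counter


def distribution_similarity(counts_a, counts_b):
    """Histogram intersection of two normalized behavior distributions.

    Assumption: behavior is summarized as category counts (e.g. topics);
    the abstract does not say which distribution measure HYDRA uses.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    keys = set(counts_a) | set(counts_b)
    return sum(
        min(counts_a.get(k, 0) / total_a, counts_b.get(k, 0) / total_b)
        for k in keys
    )


def temporal_match(times_a, times_b, resolutions=(3600, 86400, 604800)):
    """Jaccard overlap of occupied time buckets, one score per resolution.

    Timestamps are integer seconds; the default resolutions (hour, day,
    week) are illustrative choices, not values taken from the paper.
    """
    scores = []
    for width in resolutions:
        buckets_a = {t // width for t in times_a}
        buckets_b = {t // width for t in times_b}
        union = buckets_a | buckets_b
        scores.append(len(buckets_a & buckets_b) / len(union) if union else 0.0)
    return scores


def pair_features(user_a, user_b):
    """Feature vector for one candidate account pair (hypothetical layout)."""
    return [distribution_similarity(user_a["topics"], user_b["topics"])] + \
        temporal_match(user_a["times"], user_b["times"])


if __name__ == "__main__":
    account_a = {"topics": Counter(sports=3, music=1), "times": [0, 4000, 90000]}
    account_b = {"topics": Counter(sports=1, music=1), "times": [100, 200000]}
    print(pair_features(account_a, account_b))
```

In a pipeline of the kind the abstract describes, a vector like this would be one input to the learned mapping function in step (III). The structural consistency term of step (II) is not modeled here.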

    • Published in

      SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
      June 2014
      1645 pages
      ISBN: 9781450323765
      DOI: 10.1145/2588555

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SIGMOD '14 paper acceptance rate: 107 of 421 submissions, 25%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
