research-article

LINE: Large-scale Information Network Embedding

Authors:
Jian Tang

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Meng Qu

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

,
Mingzhe Wang

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

,
Ming Zhang

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

,
Jun Yan

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Qiaozhu Mei

University of Michigan, Ann Arbor, MI, USA

University of Michigan, Ann Arbor, MI, USA
View Profile

WWW '15: Proceedings of the 24th International Conference on World Wide WebMay 2015Pages 1067–1077https://doi.org/10.1145/2736277.2741093

Published:18 May 2015Publication History

WWW '15: Proceedings of the 24th International Conference on World Wide Web

Pages 1067–1077

ABSTRACT

This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale for real world information networks which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments prove the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient, which is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online\footnote{\url{https://github.com/tangjianpku/LINE}}.

References

A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In Proceedings of the 22nd international conference on World Wide Web, pages 37--48. International World Wide Web Conferences Steering Committee, 2013. Google ScholarDigital Library
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, volume 14, pages 585--591, 2001.Google ScholarDigital Library
S. Bhagat, G. Cormode, and S. Muthukrishnan. Node classification in social networks. In Social Network Data Analytics, pages 115--148. Springer, 2011.Google ScholarCross Ref
T. F. Cox and M. A. Cox. Multidimensional scaling. CRC Press, 2000.Google ScholarCross Ref
J. R. Firth. A synopsis of linguistic theory, 1930--1955. In J. R. Firth (Ed.), Studies in linguistic analysis, pages 1--32.Google Scholar
M. S. Granovetter. The strength of weak ties. American journal of sociology, pages 1360--1380, 1973.Google Scholar
Q. Le and T. Mikolov. Distributed representations of sentences and documents. In Proceedings of The 31st International Conference on Machine Learning, pages 1188--1196, 2014.Google ScholarDigital Library
O. Levy and Y. Goldberg. Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems, pages 2177--2185, 2014.Google ScholarDigital Library
A. Q. Li, A. Ahmed, S. Ravi, and A. J. Smola. Reducing the sampling complexity of topic models. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 891--900. ACM, 2014. Google ScholarDigital Library
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American society for information science and technology, 58(7):1019--1031, 2007. Google ScholarDigital Library
C. D. Manning, P. Raghavan, and H. Schutze. Introduction to information retrieval, volume 1. Cambridge university press Cambridge, 2008. Google ScholarDigital Library
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.Google Scholar
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111--3119, 2013.Google ScholarDigital Library
S. A. Myers, A. Sharma, P. Gupta, and J. Lin. Information network or social network?: the structure of the twitter follow graph. In Proceedings of the companion publication of the 23rd international conference on World wide web companion, pages 493--498. International World Wide Web Conferences Steering Committee, 2014. Google ScholarDigital Library
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. 1999.Google Scholar
B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701--710. ACM, 2014. Google ScholarDigital Library
B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693--701, 2011.Google ScholarDigital Library
S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323--2326, 2000.Google ScholarCross Ref
J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 990--998. ACM, 2008. Google ScholarDigital Library
J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.Google ScholarCross Ref
L. Van der Maaten and G. Hinton. Visualizing data using t-sne. Journal of Machine Learning Research, 9(2579-2605):85, 2008.Google Scholar
S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(1):40--51, 2007. Google ScholarDigital Library
X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM international conference on Web search and data mining, pages 283--292. ACM, 2014. Google ScholarDigital Library

Index Terms

LINE: Large-scale Information Network Embedding
1. Computing methodologies
  1. Machine learning

Recommendations

Network embedding with class discriminability
ACM TURC '19: Proceedings of the ACM Turing Celebration Conference - China

Network embedding, which learns low-dimensional representations from networks for network information preservation, has gained considerable attention in recent years. Network embedding has been shown to outperform many traditional node representation ...
Read More
Unifying community detection and network embedding in attributed networks
Abstract
Traditionally, community detection and network embedding are two separate tasks. Network embedding aims to output a vector representation for each node in the network, and community detection aims to find all densely connected groups of nodes and ...
Read More
Research on image steganography analysis based on deep learning
Abstract
Although steganalysis has developed rapidly in recent years, it still faces many difficulties and challenges. Based on the theory of in-depth learning method and image-based general steganalysis, this paper makes a deep study of the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '15: Proceedings of the 24th International Conference on World Wide Web
May 2015
1460 pages
ISBN:9781450334693
General Chairs:
Aldo Gangemi
National Research Council, Italy & Paris 13 University-CNRS, France
,
Stefano Leonardi
Sapienza University of Rome, Italy
,
Alessandro Panconesi
Sapienza University of Rome, Italy
Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2)
Sponsors
In-Cooperation
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Publication History
- Published: 18 May 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
dimension reduction
feature learning
information network embedding
scalability
Qualifiers
- research-article
Conference

Acceptance Rates
WWW '15 Paper Acceptance Rate131of929submissions,14%Overall Acceptance Rate1,899of8,196submissions,23%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 3,517
  Total Citations
  View Citations
- 10,488
  Total Downloads
- Downloads (Last 12 months)875
- Downloads (Last 6 weeks)124
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

LINE: Large-scale Information Network Embedding

WWW '15: Proceedings of the 24th International Conference on World Wide Web

ABSTRACT

References

Cited By

Index Terms

Recommendations

Network embedding with class discriminability

Unifying community detection and network embedding in attributed networks

Research on image steganography analysis based on deep learning