ABSTRACT
This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses a limitation of classical stochastic gradient descent on networks whose edge weights vary widely, and improves both the effectiveness and the efficiency of the inference. Empirical experiments demonstrate the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it can learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of LINE is available online at https://github.com/tangjianpku/LINE.
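The abstract names the two ingredients without detail. In the full paper, "local and global structure" corresponds to first-order proximity (directly linked vertices) and second-order proximity (vertices sharing neighbors), and edge sampling means drawing edges with probability proportional to their weights, via an alias table, so each sampled edge can be treated as binary and the gradient is no longer scaled by a large w_ij. The sketch below illustrates only the second-order case: for a sampled edge (i, j) it takes one ascent step on log σ(u'_j · u_i) + Σ_{n=1}^{K} E_{v_n ~ P_n(v)} [log σ(−u'_n · u_i)], with P_n(v) ∝ d_v^{3/4}. It is a minimal pure-Python sketch under stated assumptions, not the authors' released implementation: the function names `build_alias_table`, `alias_draw`, and `train_second_order`, the constant learning rate, and all default hyperparameters are hypothetical choices made for illustration.

```python
import math
import random


def build_alias_table(weights):
    """Vose's alias method: O(n) setup, then O(1) draws with P(i) ~ weights[i]."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]  # l donates the mass that s is missing
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers equal 1.0 up to rounding error
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng):
    """Pick a column uniformly, then keep it or jump to its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_second_order(edges, weights, num_nodes, dim=8, num_negative=5,
                       steps=20000, lr=0.025, seed=0):
    """Hypothetical sketch of second-order training with edge sampling.

    edges: list of directed (i, j) pairs; store an undirected edge both ways.
    weights: positive weight w_ij for each edge.
    """
    rng = random.Random(seed)
    vec = [[(rng.random() - 0.5) / dim for _ in range(dim)]
           for _ in range(num_nodes)]              # vertex vectors u_i
    ctx = [[0.0] * dim for _ in range(num_nodes)]  # context vectors u'_j
    edge_prob, edge_alias = build_alias_table(weights)
    # Noise distribution P_n(v) proportional to (weighted out-degree)^(3/4).
    degree = [0.0] * num_nodes
    for (i, _), w in zip(edges, weights):
        degree[i] += w
    neg_prob, neg_alias = build_alias_table([d ** 0.75 for d in degree])
    for _ in range(steps):
        # Sample an edge with probability proportional to w_ij and treat it
        # as binary, so the update below is never multiplied by w_ij.
        i, j = edges[alias_draw(edge_prob, edge_alias, rng)]
        targets = [(j, 1.0)] + [(alias_draw(neg_prob, neg_alias, rng), 0.0)
                                for _ in range(num_negative)]
        grad_i = [0.0] * dim
        for t, label in targets:
            score = sigmoid(sum(a * b for a, b in zip(vec[i], ctx[t])))
            g = lr * (label - score)  # constant learning rate (assumption)
            for k in range(dim):
                grad_i[k] += g * ctx[t][k]  # uses ctx before its update
                ctx[t][k] += g * vec[i][k]
        for k in range(dim):
            vec[i][k] += grad_i[k]
    return vec


if __name__ == "__main__":
    # Toy undirected graph; each undirected edge is stored in both directions.
    undirected = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 5.0)]
    edges = [(a, b) for a, b, _ in undirected] + [(b, a) for a, b, _ in undirected]
    weights = [w for _, _, w in undirected] * 2
    emb = train_second_order(edges, weights, num_nodes=4)
    print(len(emb), len(emb[0]))  # 4 8
```

The alias table is what keeps each edge draw O(1) no matter how skewed the weights are; sampling edges this way leaves the expected objective unchanged while every individual update has the same scale, which is the point of the edge-sampling idea described in the abstract.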
Index Terms
- LINE: Large-scale Information Network Embedding