ABSTRACT
Graph representation learning has been extensively studied in recent years, and sampling is a critical component of it. Prior work usually focuses on sampling positive node pairs, while the strategy for negative sampling is left insufficiently explored. To bridge this gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining both the optimization objective and the resulting variance. To the best of our knowledge, we are the first to derive the theory and quantify that a good negative sampling distribution is p_n(u|v) ∝ p_d(u|v)^α, 0 < α < 1. Guided by this theory, we propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling via Metropolis-Hastings. We evaluate our method on 5 datasets covering a wide range of downstream graph learning tasks, including link prediction, node classification and recommendation, across a total of 19 experimental settings. These comprehensive experimental results demonstrate the robustness and superiority of MCNS.
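The Metropolis-Hastings idea mentioned in the abstract can be sketched in a few lines: to draw negatives from a distribution proportional to p_d(u|v)^α, only ratios of the (unnormalized) target are needed, so the chain never computes a normalizing constant. The sketch below is illustrative, not the paper's full MCNS algorithm; the `scores` dictionary is a hypothetical stand-in for an estimated p_d(u|v), and the uniform proposal is an assumption made for simplicity.

```python
import random

def mh_negative_sampler(scores, alpha=0.75, burn_in=100, seed=0):
    """Metropolis-Hastings chain whose stationary distribution is
    proportional to scores[u] ** alpha. With a symmetric (uniform)
    proposal, the acceptance probability reduces to the target ratio,
    so no normalizing constant is ever needed."""
    rng = random.Random(seed)
    nodes = list(scores)
    current = rng.choice(nodes)
    step = 0
    while True:
        candidate = rng.choice(nodes)  # symmetric uniform proposal
        accept = min(1.0, (scores[candidate] / scores[current]) ** alpha)
        if rng.random() < accept:
            current = candidate
        step += 1
        if step > burn_in:  # discard burn-in draws, then yield forever
            yield current

# Toy "positive" distribution p_d(u|v) over 4 candidate nodes.
scores = {"a": 8.0, "b": 4.0, "c": 2.0, "d": 1.0}
sampler = mh_negative_sampler(scores, alpha=0.75)
negatives = [next(sampler) for _ in range(10000)]
```

Because 0 < α < 1 flattens the distribution, frequent positives ("a") are still sampled as negatives most often, but rarer nodes ("d") receive more probability mass than they would under p_d itself.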
Index Terms
- Understanding Negative Sampling in Graph Representation Learning