Prioritized Relationship Analysis in Heterogeneous Information Networks

Published: 23 January 2018

Abstract

An increasing number of applications are modeled and analyzed in network form, where nodes represent entities of interest and edges represent interactions or relationships between entities. Commonly, such relationship analysis tools assume homogeneity in both node type and edge type. Recent research has sought to redress the assumption of homogeneity and focused on mining heterogeneous information networks (HINs) where both nodes and edges can be of different types. Building on such efforts, in this work, we articulate a novel approach for mining relationships across entities in such networks while accounting for user preference over relationship type and interestingness metric. We formalize the problem as a top-k lightest paths problem, contextualized in a real-world communication network, and seek to find the k most interesting path instances matching the preferred relationship type. Our solution, PROphetic HEuristic Algorithm for Path Searching (PRO-HEAPS), leverages a combination of novel graph preprocessing techniques, well-designed heuristics and the venerable A* search algorithm. We run our algorithm on real-world large-scale graphs and show that our algorithm significantly outperforms a wide variety of baseline approaches with speedups as large as 100X.

To widen the range of applications, we also extend PRO-HEAPS to (i) support relationship analysis between two groups of entities and (ii) allow the pattern path in the query to contain logical statements with the operators AND, OR, and NOT and the wildcard “.”. We run experiments using this generalized version of PRO-HEAPS and demonstrate that its advantage becomes even more pronounced in these general cases. Furthermore, we conduct a comprehensive analysis of how the performance of PRO-HEAPS varies with the attributes of the input HIN. Finally, we present a case study that demonstrates valuable applications of our algorithm.
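
To make the problem statement concrete, the following is a minimal sketch of a top-k lightest paths query over a typed graph. It is not PRO-HEAPS itself: it is a generic best-first (A*-style) enumeration, and the graph encoding, the function name `top_k_lightest_paths`, the toy network, the restriction to simple paths, and the default zero heuristic are all assumptions made for illustration. The abstract does not specify the preprocessing or the heuristics that PRO-HEAPS uses, so none of that is reproduced here.

```python
import heapq
from itertools import count


def top_k_lightest_paths(adj, node_type, pattern, source, target, k,
                         h=lambda node, pos: 0.0):
    """Return up to k lightest simple paths from source to target whose
    node types match `pattern`, as (cost, path) pairs in ascending cost.

    adj:       dict node -> list of (neighbor, weight), weights >= 0
    node_type: dict node -> type label
    pattern:   list of type labels, one per path position
    h:         hypothetical heuristic; it must never overestimate the
               remaining cost. The default (zero) gives uniform-cost search.
    """
    if not pattern or node_type.get(source) != pattern[0]:
        return []
    tie = count()  # tie-breaker so the heap never compares paths
    frontier = [(h(source, 0), next(tie), 0.0, [source])]
    results = []
    while frontier and len(results) < k:
        _, _, cost, path = heapq.heappop(frontier)
        node, pos = path[-1], len(path) - 1
        if pos == len(pattern) - 1:
            # The pattern is fully consumed: keep the path only if it
            # ends at the requested target.
            if node == target:
                results.append((cost, path))
            continue
        for nxt, weight in adj.get(node, []):
            # Extend only with unvisited nodes of the next required type.
            if node_type.get(nxt) != pattern[pos + 1] or nxt in path:
                continue
            g = cost + weight
            heapq.heappush(
                frontier, (g + h(nxt, pos + 1), next(tie), g, path + [nxt]))
    return results


# Toy communication network (entirely hypothetical data).
node_type = {"alice": "person", "bob": "person", "carol": "person",
             "p1": "phone", "p2": "phone", "e1": "email"}
edges = [("alice", "p1", 1.0), ("p1", "bob", 2.0),
         ("alice", "p2", 0.5), ("p2", "bob", 1.0),
         ("alice", "e1", 0.1), ("e1", "bob", 0.1),
         ("p2", "carol", 1.0)]
adj = {}
for u, v, w in edges:  # store each undirected edge in both directions
    adj.setdefault(u, []).append((v, w))
    adj.setdefault(v, []).append((u, w))

paths = top_k_lightest_paths(adj, node_type,
                             ["person", "phone", "person"],
                             "alice", "bob", k=2)
print(paths)
```

Because partial paths are popped in order of cost plus a non-overestimating heuristic, complete matches come off the queue in ascending cost, so the search can stop as soon as k of them have been found. In the toy network the lighter route through `e1` is never returned, since an email node does not match the `person, phone, person` pattern.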

      • Published in

        ACM Transactions on Knowledge Discovery from Data, Volume 12, Issue 3
        June 2018
        360 pages
        ISSN: 1556-4681
        EISSN: 1556-472X
        DOI: 10.1145/3178546
        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 January 2018
        • Accepted: 1 October 2017
        • Revised: 1 September 2017
        • Received: 1 January 2017

        Qualifiers

        • research-article
        • Research
        • Refereed
