Prioritized Relationship Analysis in Heterogeneous Information Networks

Authors:
Jiongqian Liang, The Ohio State University
Deepak Ajwani, Bell Labs, Ireland
Patrick K. Nicholson, Bell Labs, Ireland
Alessandra Sala, Bell Labs, Ireland
Srinivasan Parthasarathy, The Ohio State University

ACM Transactions on Knowledge Discovery from Data, Volume 12, Issue 3, Article No. 29, pp. 1–27. https://doi.org/10.1145/3154401

Published: 23 January 2018

Abstract

An increasing number of applications are modeled and analyzed in network form, where nodes represent entities of interest and edges represent interactions or relationships between entities. Commonly, tools for analyzing such relationships assume homogeneity in both node type and edge type. Recent research has sought to redress this assumption and has focused on mining heterogeneous information networks (HINs), where both nodes and edges can be of different types. Building on such efforts, in this work we articulate a novel approach for mining relationships across entities in such networks while accounting for user preference over relationship type and interestingness metric. We formalize the problem as a top-k lightest paths problem, contextualized in a real-world communication network, and seek to find the k most interesting path instances matching the preferred relationship type. Our solution, PROphetic HEuristic Algorithm for Path Searching (PRO-HEAPS), leverages a combination of novel graph preprocessing techniques, well-designed heuristics, and the venerable A* search algorithm. We run our algorithm on real-world large-scale graphs and show that it significantly outperforms a wide variety of baseline approaches, with speedups as large as 100X.
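As a rough illustration of the query formalized above, the Python sketch below enumerates the k lightest simple paths between two entities whose node types match a pattern path, using A*-style best-first search. It is not PRO-HEAPS: the paper's preprocessing techniques and heuristics are not reproduced. Assumptions (all hypothetical): the HIN is a dict mapping each node to a list of (neighbor, weight) pairs with nonnegative weights, edges are directed as listed, a relationship type is a list of node-type labels with one entry per node on the path, and the heuristic is simply (remaining hops) × (smallest edge weight in the graph), which never overestimates and so keeps the result exact. The name `top_k_lightest_pattern_paths` is illustrative.

```python
import heapq
from itertools import count


def top_k_lightest_pattern_paths(adj, node_type, pattern, source, target, k):
    """Return up to k (cost, path) pairs, lightest first, for simple paths
    source -> target whose i-th node has type pattern[i].

    adj:       dict node -> list of (neighbor, weight), weights >= 0
    node_type: dict node -> type label
    pattern:   list of type labels, one per node on the path
    """
    if node_type[source] != pattern[0] or node_type[target] != pattern[-1]:
        return []
    last = len(pattern) - 1
    # (remaining hops) * (smallest edge weight) never overestimates the cost
    # still to be paid, so this heuristic is admissible (and consistent).
    w_min = min((w for nbrs in adj.values() for _, w in nbrs), default=0.0)

    def h(depth):
        return (last - depth) * w_min

    tie = count()  # tie-breaker so heapq never compares paths directly
    frontier = [(h(0), next(tie), 0.0, [source])]
    results = []
    while frontier and len(results) < k:
        _, _, g, path = heapq.heappop(frontier)
        depth = len(path) - 1
        if depth == last:
            if path[-1] == target:
                results.append((g, path))
            continue
        for nbr, w in adj.get(path[-1], []):
            if nbr in path or node_type[nbr] != pattern[depth + 1]:
                continue  # keep paths simple and on-pattern
            if depth + 1 == last and nbr != target:
                continue  # the final pattern slot must be the target itself
            g2 = g + w
            heapq.heappush(frontier, (g2 + h(depth + 1), next(tie), g2, path + [nbr]))
    return results


adj = {"alice": [("p1", 1.0), ("p2", 3.0)],
       "p1": [("bob", 2.0), ("carol", 1.0)],
       "p2": [("bob", 1.0)],
       "carol": [("p3", 1.0)],
       "p3": [("bob", 1.0)]}
types = {"alice": "Person", "bob": "Person", "carol": "Person",
         "p1": "Phone", "p2": "Phone", "p3": "Phone"}
print(top_k_lightest_pattern_paths(adj, types, ["Person", "Phone", "Person"],
                                   "alice", "bob", 2))
# [(3.0, ['alice', 'p1', 'bob']), (4.0, ['alice', 'p2', 'bob'])]
```

Because the priority f = g + h never decreases along a path, complete paths leave the queue in nondecreasing weight order, so the search can stop after the k-th one. A heuristic this weak prunes little; the speedups reported for PRO-HEAPS come from its preprocessing techniques and more informed heuristics, which this sketch does not attempt.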

To widen the range of applications, we also extend PRO-HEAPS to (i) support relationship analysis between two groups of entities and (ii) allow the pattern path in the query to contain logical statements with the operators AND, OR, NOT, and the wildcard “.”. We run experiments using this generalized version of PRO-HEAPS and demonstrate that the advantage of PRO-HEAPS becomes even more pronounced for these general cases. Furthermore, we conduct a comprehensive analysis of how the performance of PRO-HEAPS varies with various attributes of the input HIN. Finally, we conduct a case study to demonstrate valuable applications of our algorithm.
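The two extensions can be sketched in the same spirit. The Python code below is a hypothetical illustration, not the paper's query language or algorithm. Each node carries a set of labels (an assumption that makes AND meaningful), and each pattern slot is a small string expression over those labels: "." is the wildcard, "|" is OR, "&" is AND, and a leading "!" is NOT. This syntax is assumed, since the abstract names only the operators. Group-to-group queries are handled by seeding the priority queue with every source node at cost 0, which is equivalent to attaching a zero-weight virtual super-source; only paths ending in the target group are reported. For brevity the heuristic is zero, so this is uniform-cost search rather than A*. The functions `slot_matches` and `top_k_group_paths` are illustrative names.

```python
import heapq
from itertools import count


def slot_matches(slot, labels):
    """Test one pattern slot against a node's set of labels.

    Hypothetical mini-syntax (an assumption for illustration):
      "."    wildcard, matches any node
      "A|B"  OR:  node has label A or label B
      "A&B"  AND: node has both labels
      "!A"   NOT: node lacks label A
    "|" binds loosest, then "&", then "!", e.g. "A&!B|C".
    """
    if slot.strip() == ".":
        return True
    for alternative in slot.split("|"):              # OR over alternatives
        terms = [t.strip() for t in alternative.split("&")]
        if all((t[1:] not in labels) if t.startswith("!") else (t in labels)
               for t in terms):                      # AND over (possibly negated) terms
            return True
    return False


def top_k_group_paths(adj, labels, pattern, sources, targets, k):
    """Return up to k (cost, path) pairs, lightest first, over all simple
    paths from any node in `sources` to any node in `targets` whose i-th
    node satisfies pattern[i]. Edge weights must be nonnegative."""
    targets = set(targets)
    last = len(pattern) - 1
    tie = count()  # tie-breaker so heapq never compares paths directly
    # Seeding every matching source at cost 0 is equivalent to a
    # zero-weight virtual super-source feeding the whole source group.
    frontier = [(0.0, next(tie), [s]) for s in sources
                if slot_matches(pattern[0], labels[s]) and (last > 0 or s in targets)]
    heapq.heapify(frontier)
    results = []
    while frontier and len(results) < k:
        g, _, path = heapq.heappop(frontier)
        depth = len(path) - 1
        if depth == last:
            results.append((g, path))  # end node was checked when pushed
            continue
        for nbr, w in adj.get(path[-1], []):
            if nbr in path or not slot_matches(pattern[depth + 1], labels[nbr]):
                continue
            if depth + 1 == last and nbr not in targets:
                continue  # final slot must land in the target group
            heapq.heappush(frontier, (g + w, next(tie), path + [nbr]))
    return results


labels = {"a1": {"Person"}, "a2": {"Person"}, "x": {"Phone"}, "y": {"Email"},
          "b1": {"Person", "Employee"}, "b2": {"Person"}}
adj = {"a1": [("x", 2.0), ("y", 1.0)], "a2": [("x", 1.0)],
       "x": [("b1", 1.0), ("b2", 5.0)], "y": [("b2", 1.0)]}
print(top_k_group_paths(adj, labels, ["Person", "Phone|Email", "Person&!Employee"],
                        ["a1", "a2"], ["b1", "b2"], 3))
# [(2.0, ['a1', 'y', 'b2']), (6.0, ['a2', 'x', 'b2']), (7.0, ['a1', 'x', 'b2'])]
```

Here the NOT term excludes b1, so the cheap path a2 → x → b1 never appears. Seeding all sources at once, rather than running one search per source-target pair, lets a single queue return the overall k lightest paths across both groups.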

Index Terms

Information systems > Information systems applications > Data mining
Theory of computation > Theory and algorithms for application domains > Algorithmic game theory and mechanism design > Social networks

Published in
ACM Transactions on Knowledge Discovery from Data, Volume 12, Issue 3, June 2018 (360 pages)
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3178546
Editors: Charu Aggarwal (IBM T. J. Watson Research, USA); Xindong Wu (University of Louisiana at Lafayette, USA)
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
- Published: 23 January 2018
- Accepted: 1 October 2017
- Revised: 1 September 2017
- Received: 1 January 2017

Author Tags
Heterogeneous information networks
graph algorithms
semantic relationship queries
Qualifiers
- research-article
- Research
- Refereed