Ranking-based clustering of heterogeneous information networks with star network schema

Authors:
Yizhou Sun, Yintao Yu, Jiawei Han
University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, USA

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 2009, Pages 797–806
https://doi.org/10.1145/1557019.1557107

Published: 28 June 2009

ABSTRACT

A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to better understanding of both hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied over decades, clustering on heterogeneous networks has not been addressed until recently.

A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.
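
To make the setting in the abstract more concrete, the following Python sketch builds a tiny star-schema network and ranks attribute objects inside one cluster. It is only an illustration under stated assumptions: the choice of papers as the central type with venues, authors, and terms as attribute types is assumed here as a natural fit for bibliographic data such as DBLP, the toy data and the function name `rank_attributes` are hypothetical, and plain link counting stands in for a real ranking function. It is not the NetClus algorithm.

```python
from collections import Counter

# Hypothetical toy data: each central object (a paper) links to attribute
# objects of three assumed types (venue, author, term). Attribute objects
# are linked only through papers, which is what makes the schema a star.
papers = {
    "p1": {"venue": ["KDD"], "author": ["alice", "bob"], "term": ["clustering", "network"]},
    "p2": {"venue": ["KDD"], "author": ["alice"], "term": ["ranking", "network"]},
    "p3": {"venue": ["SIGIR"], "author": ["carol"], "term": ["retrieval", "ranking"]},
}


def rank_attributes(papers, cluster, attr_type):
    """Rank attribute objects of one type by link frequency within a cluster.

    Simple counting is an illustrative assumption, not the ranking
    function used in the paper.
    """
    counts = Counter(obj for pid in cluster for obj in papers[pid][attr_type])
    total = sum(counts.values())
    # Normalise so the scores sum to 1 over the attribute objects seen.
    return {obj: n / total for obj, n in counts.most_common()}


author_ranking = rank_attributes(papers, cluster=["p1", "p2"], attr_type="author")
term_ranking = rank_attributes(papers, cluster=["p1", "p2"], attr_type="term")
print(author_ranking)
print(term_ranking)
```

In a ranking-based clustering loop, per-cluster scores of this kind would be recomputed each time the cluster assignment of the central objects changes; how NetClus does this is described in the paper itself, not here.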

Supplemental Material

p797-sun.mp4 (mp4, 145.4 MB)

References

A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In Proceedings of the 7th SIAM International Conference on Data Mining, 2007.
R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 41--48, 2005.
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1-7):107--117, 1998.
C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01), pages 107--114. IEEE Computer Society, 2001.
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, pages 251--262, 1999.
T. Hofmann. Probabilistic latent semantic analysis. In Proc. of Uncertainty in Artificial Intelligence (UAI'99), pages 289--296, 1999.
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD conference (KDD'02), pages 538--543. ACM, 2002.
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999.
B. Long, Z. M. Zhang, X. Wú, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML '06: Proceedings of the 23rd international conference on Machine learning, pages 585--592, 2006.
Q. Mei, D. Zhang, and C. Zhai. A general optimization framework for smoothing language models on graph structures. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'08), pages 611--618, 2008.
M. E. J. Newman. The structure of scientific collaboration networks. Working Papers 00-07-037, Santa Fe Institute, July 2000.
M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89(20):208701, October 2002.
Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level ranking: Bringing order to web objects. In Proceedings of the fourteenth International World Wide Web Conference (WWW'05), pages 567--574. ACM, May 2005.
J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR'97), page 731. IEEE Computer Society, 1997.
M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'04), pages 306--315, 2004.
Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. RankClus: Integrating clustering with ranking for heterogeneous information network analysis. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT'09), 2009.
Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD'08), pages 567--580, 2008.
U. von Luxburg. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, 2006.
S. White and P. Smyth. A spectral clustering approach to finding communities in graph. In Proceedings of the Fifth SIAM International Conference on Data Mining (SDM'05), 2005.
X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. Scan: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'07), pages 824--833, 2007.
C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179--214, 2004.
C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 743--748, 2004.
N. Wang, S. Parthasarathy, K.-L. Tan, and A. K. H. Tung. Csv: visualizing and mining cohesive subgraphs. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD'08), pages 445--458, 2008.

Index Terms

Information systems > Information systems applications > Data mining

Published in
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN:9781605584959
DOI:10.1145/1557019
General Chairs: John Elder (Elder Research, Inc., USA), Françoise Soulié Fogelman (KXEN, France)
Program Chairs: Peter Flach (University of Bristol, UK), Mohammed Zaki (RPI, USA)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
heterogeneous information network
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%