DOI: 10.1145/1557019.1557107
research-article

Ranking-based clustering of heterogeneous information networks with star network schema

Published: 28 June 2009

ABSTRACT

A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to a better understanding of both the hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied for decades, clustering on heterogeneous networks has not been addressed until recently.

A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.
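To make the idea of ranking-based clustering on a star network schema more concrete, the following is a minimal sketch under loud assumptions. It is not the NetClus algorithm, whose details the abstract does not give. It assumes a DBLP-like star schema in which papers are the center (target) objects linked to attribute objects of type author, venue, and term; it uses a plain frequency count as a stand-in ranking function; and it reassigns each paper to the cluster whose rankings score its linked objects highest. All names (`PAPERS`, `simple_ranking`, `log_score`, `rank_and_cluster`), the toy data, and the smoothing constant are hypothetical.

```python
import math
from collections import defaultdict

# Toy star-schema network (made-up data): each target object (paper)
# links to attribute objects of several types.
PAPERS = {
    "p1": {"author": ["alice"], "venue": ["KDD"], "term": ["mining", "clustering"]},
    "p2": {"author": ["alice", "bob"], "venue": ["KDD"], "term": ["mining", "network"]},
    "p3": {"author": ["carol"], "venue": ["SIGIR"], "term": ["retrieval", "query"]},
    "p4": {"author": ["carol", "dave"], "venue": ["SIGIR"], "term": ["retrieval", "ranking"]},
}


def simple_ranking(papers, members):
    """Frequency-based ranking of attribute objects within one cluster.

    Assumption: link counts stand in for a real ranking function.
    Returns {attribute_type: {object: normalized score}}.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for pid in members:
        for attr_type, objs in papers[pid].items():
            for obj in objs:
                counts[attr_type][obj] += 1.0
    ranking = {}
    for attr_type, c in counts.items():
        total = sum(c.values())
        ranking[attr_type] = {obj: n / total for obj, n in c.items()}
    return ranking


def log_score(paper, ranking, smoothing=0.1):
    """Sum of log smoothed rank scores of the paper's linked objects."""
    score = 0.0
    for attr_type, objs in paper.items():
        dist = ranking.get(attr_type, {})
        for obj in objs:
            score += math.log(dist.get(obj, 0.0) + smoothing)
    return score


def rank_and_cluster(papers, init_assignment, k, max_iter=20):
    """Alternate between ranking within clusters and reassigning papers."""
    assignment = dict(init_assignment)
    for _ in range(max_iter):
        clusters = [[p for p, c in assignment.items() if c == i] for i in range(k)]
        rankings = [simple_ranking(papers, members) for members in clusters]
        new_assignment = {
            pid: max(range(k), key=lambda i: log_score(papers[pid], rankings[i]))
            for pid in papers
        }
        if new_assignment == assignment:
            break  # no paper changed cluster
        assignment = new_assignment
    return assignment, rankings


# Start from a partition that deliberately misplaces p4.
initial = {"p1": 0, "p2": 0, "p3": 1, "p4": 0}
assignment, rankings = rank_and_cluster(PAPERS, initial, k=2)
print(assignment)  # {'p1': 0, 'p2': 0, 'p3': 1, 'p4': 1}
```

In this toy run the misplaced paper `p4` moves to the cluster that already holds `p3`, because the objects it links to (`carol`, `SIGIR`, `retrieval`) rank highly there. Each returned cluster carries its own per-type ranking, which loosely mirrors the abstract's point that a net-cluster presents ranking information for each attribute object.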


Supplemental Material

p797-sun.mp4 (mp4, 145.4 MB)

References

  1. A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In Proceedings of the 7th SIAM International Conference on Data Mining (SIAM'07), 2007.
  2. R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pages 41--48, 2005.
  3. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1-7):107--117, 1998.
  4. C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01), pages 107--114. IEEE Computer Society, 2001.
  5. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM '99: Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, pages 251--262, 1999.
  6. T. Hofmann. Probabilistic latent semantic analysis. In Proc. of Uncertainty in Artificial Intelligence (UAI'99), pages 289--296, 1999.
  7. G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD conference (KDD'02), pages 538--543. ACM, 2002.
  8. J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46(5):604--632, 1999.
  9. B. Long, Z. M. Zhang, X. Wú, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML '06: Proceedings of the 23rd international conference on Machine learning, pages 585--592, 2006.
  10. Q. Mei, D. Zhang, and C. Zhai. A general optimization framework for smoothing language models on graph structures. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'08), pages 611--618, 2008.
  11. M. E. J. Newman. The structure of scientific collaboration networks. Working Papers 00-07-037, Santa Fe Institute, July 2000.
  12. M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89(20):208701, October 2002.
  13. Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level ranking: Bringing order to web objects. In Proceedings of the fourteenth International World Wide Web Conference (WWW'05), pages 567--574. ACM, May 2005.
  14. J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR'97), page 731. IEEE Computer Society, 1997.
  15. M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'04), pages 306--315, 2004.
  16. Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. RankClus: Integrating clustering with ranking for heterogeneous information network analysis. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT'09), 2009.
  17. Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD'08), pages 567--580, 2008.
  18. U. von Luxburg. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, 2006.
  19. S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In Proceedings of the Fifth SIAM International Conference on Data Mining (SDM'05), 2005.
  20. X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. SCAN: a structural clustering algorithm for networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD'07), pages 824--833, 2007.
  21. C. Zhai and J. D. Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst., 22(2):179--214, 2004.
  22. C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 743--748, 2004.
  23. N. Wang, S. Parthasarathy, K.-L. Tan, and A. K. H. Tung. CSV: visualizing and mining cohesive subgraphs. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (SIGMOD'08), pages 445--458, 2008.

    • Published in

      KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
      June 2009
      1426 pages
      ISBN:9781605584959
      DOI:10.1145/1557019

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
