DOI: 10.1145/1081870.1081910
Article

Cross-relational clustering with user's guidance

Published: 21 August 2005

ABSTRACT

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information is often spread across multiple relations. To ensure effective and efficient high-dimensional, cross-relational clustering, we propose a new approach, called CrossClus, which performs cross-relational clustering with user's guidance. We believe that user's guidance, even in very simple forms, could be essential for effective high-dimensional clustering, since a user knows the application requirements and data semantics well. CrossClus is carried out as follows: a user specifies a clustering task and selects one or a small set of features pertinent to the task. CrossClus extracts the set of highly relevant features in multiple relations connected via linkages defined in the database schema, evaluates their effectiveness based on user's guidance, and identifies interesting clusters that fit the user's needs. This method takes care of both quality in feature extraction and efficiency in clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of this approach.
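
The idea of using a user-selected feature to judge features found in linked relations can be illustrated with a toy sketch. This is not the CrossClus algorithm or its similarity measure, which the abstract does not specify. It assumes a hypothetical two-relation schema (`students` referencing `advisors` through a foreign key), categorical features, and a simple Jaccard overlap between the sets of tuple pairs that two features group together. All names, the data, and the threshold are illustrative assumptions.

```python
from itertools import combinations

def same_value_pairs(values):
    """Set of index pairs (i, j), i < j, whose feature values are equal."""
    return {(i, j) for i, j in combinations(range(len(values)), 2)
            if values[i] == values[j]}

def feature_similarity(f, g):
    """Jaccard overlap of the tuple pairs grouped together by f and by g."""
    a, b = same_value_pairs(f), same_value_pairs(g)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def join_feature(target_rows, fk, other, attr):
    """Follow a many-to-one foreign key from each target tuple to another relation."""
    return [other[row[fk]][attr] for row in target_rows]

# Hypothetical target relation; "group" plays the role of the user's guidance.
students = [
    {"name": "s1", "advisor": "a1", "group": "db"},
    {"name": "s2", "advisor": "a1", "group": "db"},
    {"name": "s3", "advisor": "a2", "group": "ai"},
    {"name": "s4", "advisor": "a3", "group": "ai"},
]
# Hypothetical linked relation, keyed by the foreign-key value.
advisors = {
    "a1": {"area": "databases", "floor": 2},
    "a2": {"area": "learning", "floor": 2},
    "a3": {"area": "learning", "floor": 3},
}

guidance = [row["group"] for row in students]

# Score each candidate feature reachable through the link against the guidance.
scores = {
    attr: feature_similarity(guidance, join_feature(students, "advisor", advisors, attr))
    for attr in ("area", "floor")
}

THRESHOLD = 0.5  # illustrative cut-off, not taken from the paper
selected = [attr for attr, s in scores.items() if s >= THRESHOLD]
print(scores, selected)
```

In this toy data, `area` groups the students exactly as the guidance feature does, while `floor` mostly does not, so only `area` would be passed on to a clustering step.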


Published in

      KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
      August 2005
      844 pages
      ISBN: 159593135X
      DOI: 10.1145/1081870

      Copyright © 2005 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
