
Distance Based Approaches to Relational Learning and Clustering

  • Chapter
Relational Data Mining

Abstract

Within data analysis, distance-based methods have always been very popular. Such methods assume that it is possible to compute for each pair of objects in a domain their mutual distance (or similarity). In a distance-based setting, many of the tasks usually considered in data mining can be carried out in a surprisingly simple yet powerful way. In this chapter, we give a tutorial introduction to the use of distance-based methods for relational representations, concentrating in particular on predictive learning and clustering. We describe in detail one relational distance measure that has proven very successful in applications, and introduce three systems that actually carry out relational distance-based learning and clustering: Ribl2, Rdbc and Forc. We also present a detailed case study of how these three systems were applied to a domain from molecular biology.
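To illustrate the basic idea that a pairwise distance is all a distance-based learner needs, here is a minimal sketch of k-nearest-neighbour prediction. It is not the relational distance measure described in the chapter, nor the implementation of Ribl2, Rdbc or Forc. The function names `jaccard_distance` and `knn_predict`, the set-of-facts encoding, and the toy examples and labels are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Tuple

Facts = FrozenSet[str]  # hypothetical encoding: one object = a set of ground facts


def jaccard_distance(a: Facts, b: Facts) -> float:
    """Toy stand-in distance: 1 - |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(
    query: Facts,
    examples: List[Tuple[Facts, str]],
    distance: Callable[[Facts, Facts], float],
    k: int = 3,
) -> str:
    """Return the majority label among the k examples closest to the query."""
    # The learner only ever touches the objects through `distance`.
    nearest = sorted(examples, key=lambda ex: distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labelled objects, purely illustrative.
examples = [
    (frozenset({"atom(c)", "atom(o)", "bond(c,o)"}), "class_a"),
    (frozenset({"atom(c)", "atom(n)", "bond(c,n)"}), "class_a"),
    (frozenset({"atom(c)", "atom(h)", "bond(c,h)"}), "class_b"),
    (frozenset({"atom(c)", "bond(c,c)"}), "class_b"),
]
query = frozenset({"atom(c)", "atom(o)", "bond(c,o)", "atom(h)"})

prediction = knn_predict(query, examples, jaccard_distance, k=1)
```

Because `knn_predict` takes the distance as a parameter, swapping in a different measure leaves the learner unchanged. This is the sense in which a distance-based setting keeps the learning procedure simple.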




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Kirsten, M., Wrobel, S., Horváth, T. (2001). Distance Based Approaches to Relational Learning and Clustering. In: Džeroski, S., Lavrač, N. (eds) Relational Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04599-2_9


  • DOI: https://doi.org/10.1007/978-3-662-04599-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-07604-6

  • Online ISBN: 978-3-662-04599-2

  • eBook Packages: Springer Book Archive
