Distance Based Approaches to Relational Learning and Clustering

Kirsten, Mathias; Wrobel, Stefan; Horváth, Tamás

doi:10.1007/978-3-662-04599-2_9



Abstract

Within data analysis, distance-based methods have long been popular. Such methods assume that, for each pair of objects in a domain, it is possible to compute their mutual distance (or similarity). In a distance-based setting, many of the tasks usually considered in data mining can be carried out in a surprisingly simple yet powerful way. In this chapter, we give a tutorial introduction to the use of distance-based methods for relational representations, concentrating in particular on predictive learning and clustering. We describe in detail one relational distance measure that has proven very successful in applications, and introduce three systems that carry out relational distance-based learning and clustering: Ribl2, Rdbc and Forc. We also present a detailed case study of how these three systems were applied to a domain from molecular biology.
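The abstract's central point is that once a pairwise distance is available, prediction and clustering reduce to simple operations over that distance. The minimal Python sketch below illustrates this under stated assumptions. It is not Ribl2, Rdbc or Forc, and it does not implement the chapter's relational distance measure. String edit distance (`levenshtein`) stands in for a relational distance, `knn_predict` is plain k-nearest-neighbour voting, and `k_medoids` is a basic alternating k-medoids loop with naive initialisation. All function names and the toy data are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two strings.

    Hypothetical stand-in for a relational distance measure; any function
    dist(x, y) -> number could be plugged in instead.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def knn_predict(query, examples, labels, dist, k=3):
    """Predict the majority label among the k examples closest to query."""
    ranked = sorted(range(len(examples)), key=lambda i: dist(query, examples[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def assign(objects, medoids, dist):
    """Map each medoid index to the indices of objects nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, obj in enumerate(objects):
        nearest = min(medoids, key=lambda m: dist(obj, objects[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(objects, dist, k, max_iter=20):
    """Alternating k-medoids: cluster centres are always actual objects.

    Naive initialisation (assumption): the first k objects are the medoids.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        clusters = assign(objects, medoids, dist)
        # Update step: the new medoid minimises total distance to its cluster;
        # an empty cluster keeps its old medoid.
        new_medoids = [
            min(members, key=lambda c: sum(dist(objects[c], objects[j]) for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(objects, medoids, dist)


if __name__ == "__main__":
    objs = ["cat", "dolphin", "cart", "dolphins", "card", "golphin"]
    labels = ["g1", "g2", "g1", "g2", "g1", "g2"]
    print(knn_predict("carts", objs, labels, levenshtein, k=3))
    print(k_medoids(objs, levenshtein, k=2))
```

Both routines only ever call `dist`, so substituting a distance defined over relational objects would leave them unchanged. Medoids are used here instead of means because an average of structured objects is generally not defined.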



Author information

Authors and Affiliations

German National Research Center for Information Technology, GMD — AiS.KD, Schloß Birlinghoven, D-53754, Sankt Augustin, Germany
Mathias Kirsten & Tamás Horváth
School of Computer Science, IWS, University of Magdeburg, Universitätsplatz 2, D-39016, Magdeburg, Germany
Stefan Wrobel


Editor information

Editors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Sašo Džeroski & Nada Lavrač


About this chapter

Cite this chapter

Kirsten, M., Wrobel, S., Horváth, T. (2001). Distance Based Approaches to Relational Learning and Clustering. In: Džeroski, S., Lavrač, N. (eds) Relational Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04599-2_9


DOI: https://doi.org/10.1007/978-3-662-04599-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07604-6
Online ISBN: 978-3-662-04599-2
eBook Packages: Springer Book Archive
