Abstract
Distance-based methods have long been popular in data analysis. Such methods assume that, for each pair of objects in a domain, it is possible to compute their mutual distance (or similarity). In a distance-based setting, many of the tasks usually considered in data mining can be carried out in a surprisingly simple yet powerful way. In this chapter, we give a tutorial introduction to the use of distance-based methods for relational representations, concentrating in particular on predictive learning and clustering. We describe in detail one relational distance measure that has proven very successful in applications, and introduce three systems that carry out relational distance-based learning and clustering: Ribl2, Rdbc and Forc. We also present a detailed case study of how these three systems were applied to a domain from molecular biology.
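The claim that a pairwise distance is enough to support both prediction and clustering can be illustrated with a minimal sketch. It uses generic k-nearest-neighbour prediction and a simple alternating k-medoids procedure, written in Python. It is not the chapter's relational distance measure, and it is not the Ribl2, Rdbc or Forc algorithms. Two assumptions apply. First, objects are modelled as sets of ground facts. Second, `jaccard_distance` is a hypothetical stand-in distance. Both learning functions accept any distance function, so a proper relational measure could be substituted without changing them.

```python
from collections import Counter
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Fact = Tuple[str, ...]   # a ground fact, e.g. ("bond", "c", "o")
Obj = FrozenSet[Fact]    # assumption: an object is described by a set of facts


def jaccard_distance(a: Obj, b: Obj) -> float:
    """Hypothetical stand-in distance: 1 - |a & b| / |a | b| over fact sets.
    NOT the relational measure described in the chapter."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(train: Sequence[Tuple[T, Hashable]], query: T,
                dist: Callable[[T, T], float], k: int = 1) -> Hashable:
    """Return the majority label among the k training objects closest to query."""
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def k_medoids(objects: Sequence[T], dist: Callable[[T, T], float],
              k: int, max_iter: int = 100) -> List[int]:
    """Simple alternating k-medoids; returns sorted medoid indices.
    Seeds with the first k objects (an arbitrary, illustrative choice)."""
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: attach each object to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, obj in enumerate(objects):
            nearest = min(medoids, key=lambda m: dist(objects[m], obj))
            clusters[nearest].append(i)
        # Update step: in each cluster, choose the member with the smallest
        # summed distance to the other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:          # can happen only with duplicate objects
                new_medoids.append(m)
                continue
            best = min(members, key=lambda c: sum(dist(objects[c], objects[j])
                                                  for j in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Toy objects: hypothetical "molecules" described by ground facts.
m1 = frozenset({("atom", "c"), ("atom", "o"), ("bond", "c", "o")})
m2 = m1 | {("atom", "h")}
m3 = frozenset({("atom", "n"), ("atom", "h"), ("bond", "n", "h")})
m4 = m3 | {("atom", "c")}

train = [(m1, "A"), (m3, "B")]
print(knn_predict(train, m2, jaccard_distance, k=1))            # A
print(knn_predict(train, m4, jaccard_distance, k=1))            # B
print(k_medoids([m1, m2, m3, m4], jaccard_distance, k=2))       # [0, 2]
```

Neither function inspects the internal structure of the objects. Only `dist` does. This separation is what allows a distance-based learner to be applied to relational data once a suitable distance is available.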
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kirsten, M., Wrobel, S., Horváth, T. (2001). Distance Based Approaches to Relational Learning and Clustering. In: Džeroski, S., Lavrač, N. (eds) Relational Data Mining. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04599-2_9
DOI: https://doi.org/10.1007/978-3-662-04599-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07604-6
Online ISBN: 978-3-662-04599-2
eBook Packages: Springer Book Archive