Abstract
Finding a good similarity assessment algorithm for use in ontologies is central to the functioning of techniques such as retrieval, matchmaking, clustering, data mining, ontology translation, automatic database schema matching, and simple object comparison. This paper assembles a catalogue of ontology-based similarity measures, which are experimentally compared with a “similarity gold standard” obtained by surveying 50 human subjects. Results show that human and algorithmic similarity predictions varied substantially, but could be grouped into cohesive clusters. Addressing this variance, we present a personalized similarity assessment procedure, which uses a machine learning component to predict a subject’s cluster membership, providing an excellent prediction of the gold standard. We conclude by hypothesizing ontology-dependent similarity measures.
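To make the notion of an ontology-based similarity measure concrete, the following is a minimal sketch of one widely used taxonomy-based measure, Wu and Palmer's, which scores two concepts by the depth of their least common subsumer. The abstract does not say which measures the catalogue contains, so this is an illustration only; the toy taxonomy and the names `PARENT`, `path_to_root`, `depth`, and `wu_palmer` are assumptions made for this example.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the paper.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to the root, inclusive."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (a convention chosen here)."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that also subsumes a is the least common subsumer.
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
        print(pair, round(wu_palmer(*pair), 3))
```

In this sketch, siblings under a deep common parent score higher than concepts that only share the root, and a concept compared with itself scores 1. Comparing such scores against human ratings is the kind of evaluation the abstract describes.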
Copyright information
© 2005 Physica-Verlag Heidelberg
About this paper
Cite this paper
Bernstein, A., Kaufmann, E., Bürki, C., Klein, M. (2005). How Similar Is It? Towards Personalized Similarity Measures in Ontologies. In: Ferstl, O.K., Sinz, E.J., Eckert, S., Isselhorst, T. (eds) Wirtschaftsinformatik 2005. Physica, Heidelberg. https://doi.org/10.1007/3-7908-1624-8_71
DOI: https://doi.org/10.1007/3-7908-1624-8_71
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-1574-0
Online ISBN: 978-3-7908-1624-2
eBook Packages: Business and Economics (German Language)