Abstract
The selection of appropriate proximity measures is a crucial success factor in content-based visual information retrieval. In this area of research, proximity measures estimate the similarity of media objects from the distance between their feature vectors. The focus of this work is the identification of proximity measures that perform better than the usual choices (e.g., Minkowski metrics). We evaluate a catalogue of 37 measures selected from various areas (psychology, sociology, economics, etc.). The evaluation is based on content-based MPEG-7 descriptions of carefully selected media collections. Some proximity measures, including most psychological measures, are defined only on predicates. One major contribution of this paper is a model that allows such measures to be applied to continuous feature data. The evaluation results identify proximity measures that perform better than others on content-based features, and some predicate-based measures clearly outperform the frequently used distance norms. Finally, the discussion of the evaluation leads to a catalogue of mathematical terms of successful retrieval and browsing measures.
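To make the contrast between distance norms and predicate-based measures concrete, the following minimal sketch compares a Minkowski distance with a Jaccard-style similarity on continuous feature vectors. The abstract does not specify the paper's model for applying predicate-based measures to continuous data, so the min/max generalisation used here is an assumption chosen for illustration, not the author's method. The function names `minkowski` and `soft_jaccard` are hypothetical, and feature values are assumed to be scaled to [0, 1].

```python
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p (p=1: city block, p=2: Euclidean)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def soft_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Jaccard-style similarity for feature values scaled to [0, 1].

    Assumption (not taken from the paper): min(a, b) stands in for
    "property present in both objects" and max(a, b) for "property
    present in at least one object". On binary vectors this reduces
    to the classic Jaccard coefficient.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    both = sum(min(a, b) for a, b in zip(x, y))
    either = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors share every (absent) property: treat as identical.
    return both / either if either > 0 else 1.0


if __name__ == "__main__":
    query = [0.9, 0.1, 0.4]
    candidate = [0.8, 0.2, 0.4]
    print("Euclidean distance:", minkowski(query, candidate, p=2))
    print("City block distance:", minkowski(query, candidate, p=1))
    print("Soft Jaccard similarity:", soft_jaccard(query, candidate))
```

Note that the first function returns a distance (smaller means more similar), while the second returns a similarity (larger means more similar), so rankings from the two must be sorted in opposite directions before they can be compared.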
Cite this article
Eidenberger, H. Evaluation and analysis of similarity measures for content-based visual information retrieval. Multimedia Systems 12, 71–87 (2006). https://doi.org/10.1007/s00530-006-0043-z