
Unified framework for fast exact and approximate search in dissimilarity spaces

Published: 01 November 2007

Abstract

In multimedia systems we usually need to retrieve database (DB) objects based on their similarity to a query object, where the similarity assessment is provided by a measure that defines a (dis)similarity score for every pair of DB objects. In most existing applications, the similarity measure is required to be a metric, so that the triangle inequality can be used to speed up the search for relevant objects by means of metric access methods (MAMs), for example, the M-tree. Recent research has shown, however, that nonmetric measures are more appropriate for similarity modeling because of their robustness and the ease with which they model a made-to-measure similarity. Unfortunately, because they lack the triangle inequality, nonmetric measures cannot be directly utilized by MAMs. From another point of view, some sophisticated similarity measures may be available only in a black-box nonanalytic form (e.g., as an algorithm or even a hardware device), where no information about their topological properties is provided, so we have to treat them as nonmetric measures as well. From yet another point of view, the concept of similarity measuring itself is inherently imprecise, and we often prefer fast but approximate retrieval over exact but slower retrieval.
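To illustrate why the triangle inequality speeds up search, here is a minimal sketch of pivot-based filtering for a range query. It is not the M-tree or any specific MAM; the single fixed pivot, the 2D Euclidean data, and the function names (`euclidean`, `range_query_with_pivot`) are assumptions made for the example. For a metric, |d(q,p) - d(o,p)| <= d(q,o), so any object whose precomputed distance to the pivot differs from d(q,p) by more than the query radius can be discarded without computing d(q,o).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance, a metric (satisfies the triangle inequality)."""
    return math.dist(a, b)


def range_query_with_pivot(db, pivot_dists, pivot, query, radius, dist):
    """Range query that uses one pivot to skip distance computations.

    pivot_dists[i] must hold dist(db[i], pivot), computed once at index time.
    Returns (matching objects, number of distance computations at query time).
    """
    d_qp = dist(query, pivot)
    computed = 1  # the query-to-pivot distance
    matches = []
    for obj, d_op in zip(db, pivot_dists):
        # Lower bound on dist(query, obj) implied by the triangle inequality.
        if abs(d_qp - d_op) > radius:
            continue  # obj cannot lie within the radius; no computation needed
        computed += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computed


random.seed(0)
db = [(random.random(), random.random()) for _ in range(1000)]
pivot = (0.0, 0.0)  # hypothetical pivot choice
pivot_dists = [euclidean(o, pivot) for o in db]

query, radius = (0.5, 0.5), 0.1
matches, computed = range_query_with_pivot(
    db, pivot_dists, pivot, query, radius, euclidean
)
brute_force = [o for o in db if euclidean(query, o) <= radius]
print(len(matches), "matches using", computed, "distance computations")
```

The filter is only safe because the measure is a metric. With a nonmetric measure the lower bound no longer holds, and the same pruning step could discard relevant objects.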

To date, these aspects of similarity retrieval have been addressed separately, that is, exact versus approximate search or metric versus nonmetric search. In this article we introduce a similarity retrieval framework that incorporates both aspects into a single unified model. Based on the framework, we show that for any dissimilarity measure (either metric or nonmetric) we are able to change the “amount” of triangle inequality, and so obtain an approximate or full metric that can be used for MAM-based retrieval. By varying the “amount” of triangle inequality, the measure is modified in a way suitable for either exact but slower or approximate but faster retrieval. Additionally, we introduce the TriGen algorithm, which aims to construct the desired modification of any black-box distance automatically, using just a small fraction of the database.
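The idea of changing the “amount” of triangle inequality can be sketched on a small sample. The code below is not the TriGen algorithm, and the abstract does not say which modifications it uses. As assumptions for illustration, it takes squared Euclidean distance as a stand-in nonmetric measure, applies a concave power transform d ↦ d^p with 0 < p ≤ 1 as the modifier, and measures the fraction of sampled triplets that violate the triangle inequality. All function names are hypothetical.

```python
import itertools
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance: a nonmetric (it breaks the triangle inequality)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def violation_rate(sample, dist, modifier=lambda d: d):
    """Fraction of triplets in the sample whose modified distances
    violate the triangle inequality."""
    violations = 0
    total = 0
    for a, b, c in itertools.combinations(sample, 3):
        x, y, z = sorted(
            modifier(dist(p, q)) for p, q in ((a, b), (b, c), (a, c))
        )
        total += 1
        # The two shorter sides must add up to at least the longest side;
        # the small tolerance absorbs floating-point rounding.
        if x + y < z - 1e-12:
            violations += 1
    return violations / total


def power_modifier(exponent):
    """Increasing, concave transform for 0 < exponent <= 1 (illustrative choice)."""
    return lambda d: d ** exponent


random.seed(1)
sample = [(random.random(), random.random()) for _ in range(40)]

rates = {
    p: violation_rate(sample, squared_euclidean, power_modifier(p))
    for p in (1.0, 0.75, 0.5)
}
for p, rate in rates.items():
    print(f"exponent {p}: violation rate {rate:.3f}")
```

Because the transform is increasing, it preserves the ordering of objects by distance to a query. With exponent 0.5 the modified measure is ordinary Euclidean distance, a full metric, so no triplet violates the inequality; larger exponents leave some violations in place, which corresponds to the approximate end of the trade-off described above.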

Published in

ACM Transactions on Database Systems, Volume 32, Issue 4
November 2007
364 pages
ISSN: 0362-5915
EISSN: 1557-4644
DOI: 10.1145/1292609

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
