skip to main content
article

Searching in metric spaces

Authors Info & Claims
Published:01 September 2001Publication History
Skip Abstract Section

Abstract

The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of "intrinsic dimensionality." We also present a unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework. Most approaches turn out to be variations on a few different concepts. We organize those works in a taxonomy that allows us to devise new algorithms from combinations of concepts not noticed before because of the lack of communication between different communities. We present experiments validating our results and comparing the existing approaches. We finish with recommendations for practitioners and open questions for future development.

References

  1. APERS, P., BLANKEN, H., AND HOUTSMA, M. 1997. Multimedia Databases in Perspective. Springer- Verlag, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. ARYA, S., MOUNT, D., NETANYAHU, N., SILVERMAN, R., AND WU, A. 1994. An optimal algorithm for approximate nearest neighbor searching in fixed dimension. In Proceedings of the Fifth ACM-SIAM Symposium on Discrete Algorithms (SODA'94), 573-583.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. AURENHAMMER, F. 1991. Voronoi diagrams-A survey of a fundamental geometric data structure. ACM Comput. Surv. 23,3.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. BAEZA-YATES, R. 1997. Searching: An algorithmic tour. In Encyclopedia of Computer Science and Technology, vol. 37, A. Kent and J. Williams, Eds., Marcel Dekker, New York, 331-359.]]Google ScholarGoogle Scholar
  5. BAEZA-YATES,R.AND NAVARRO, G. 1998. Fast approximate string matching in a dictionary. In Proceedings of the Fifth South American Symposium on String Processing and Information Retrieval (SPIRE'98), IEEE Computer Science Press, Los Alamitos, Calif., 14-22.]]Google ScholarGoogle ScholarCross RefCross Ref
  6. BAEZA-YATES,R.AND RIBEIRO-NETO, B. 1999. Modern Information Retrieval. Addison-Wesley, Reading, Mass.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. BAEZA-YATES, R., CUNTO, W., MANBER,U.,AND WU,S. 1994. Proximity matching using fixed-queries trees. In Proceedings of the Fifth Combinatorial Pattern Matching (CPM'94), Lecture Notes in Computer Science, vol. 807, 198-212.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. BENTLEY, J. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9, 509-517.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. BENTLEY, J. 1979. Multidimensional binary search trees in database applications. IEEE Trans. Softw. Eng. 5, 4, 333-340.]]Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. BENTLEY, J., WEIDE,B.,AND YAO, A. 1980. Optimal expected-time algorithms for closest point problems. ACM Trans. Math. Softw. 6, 4, 563-580.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. BERCHTOLD, S., KEIM,D.,AND KRIEGEL, H. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd Conference on Very Large Databases (VLDB'96), 28-39.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. BHANU, B., PENG,J.,AND QING, S. 1998. Learning feature relevance and similarity metrics in image databases. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (Santa Barbara, Calif.), IEEE Computer Society, Los Alamitos, Calif., 14-18.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. BIMBO,A.D.AND VICARIO, E. 1998. Using weighted spatial relationships in retrieval by visual contents. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (Santa Barbara, Calif.), IEEE Computer Society, Los Alamitos, Calif., 35-39.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B~HM, C., BERCHTOLD,S.,AND KEIM, A. 2002. Searching in high-dimensional spaces-index structures for improving the performance of multimedia databases. ACM Comput. Surv. To appear.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. BOZKAYA,T.AND OZSOYOGLU, M. 1997. Distancebased indexing for high-dimensional metric spaces. In Proceedings of ACM SIGMOD International Conference on Management of Data, SIGMOD Rec. 26, 2, 357-368.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. BRIN, S. 1995. Near neighbor search in large metric spaces. In Proceedings of the 21st Conference on Very Large Databases (VLDB'95), 574-584.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. BRITO, M., CH~VEZ, E., QUIROZ, A., AND YUKICH,J. 1996. Connectivity of the mutual k-nearest neighbor graph in clustering and outlier detection. Stat. Probab. Lett. 35, 33-42.]]Google ScholarGoogle ScholarCross RefCross Ref
  18. BUGNION, E., FHEI, S., ROOS, T., WIDMAYER,P., AND WIDMER, F. 1993. A spatial index for approximate multiple string matching. In Proceedings of the First South American Workshop on String Processing (WSP'93), R. Baeza-Yates and N. Ziviani, Eds., 43-53.]]Google ScholarGoogle Scholar
  19. BURKHARD,W.AND KELLER, R. 1973. Some approaches to best-match file searching. Commun. ACM 16, 4, 230-236.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. CASCIA, M. L., SETHI,S.,AND SCLAROFF, S. 1998. Combining textual and visual cues for content-based image retrieval on the world wide web. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries (Santa Barbara, Calif.), IEEE Computer Society, Los Alamitos, Calif., 24-28.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. CH~VEZ, E. 1999. Optimal discretization for pivot based algorithms. Manuscript. ftp:// garota.fismat.umich.mx/pub/users/elchavez/ minimax.ps.gz.]]Google ScholarGoogle Scholar
  22. CH~VEZ,E.AND MARROQUYN, J. 1997. Proximity queries in metric spaces. In Proceedings of the Fourth South American Workshop on String Processing ( WSP'97), R. Baeza-Yates, Ed. Carleton University Press, Ottawa, 21-36.]]Google ScholarGoogle Scholar
  23. CH~VEZ,E.AND NAVARRO, G. 2000. An effective clustering algorithm to index high dimensional metric spaces. In Proceedings of the Seventh String Processing and Informational Retrieval (SPIRE'00), IEEE CS Press, 75-86.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. CH~VEZ,E.AND NAVARRO, G. 2001a. Towards measuring the searching complexity of metric spaces. In Proceedings of the Mexican Computing Meeting, vol. II, 969-972.]]Google ScholarGoogle Scholar
  25. CH~VEZ,E.AND NAVARRO, G. 2001b. A probabilistic spell for the curse of dimensionality. In Proceedings of the Third Workshop on Algorithm Engineering and Experimentation (ALENEX'01), 147-160. Lecture Notes in Computer Science v. 2153.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. CH~VEZ,E.AND NAVARRO, G. 2002. A metric index for approximate string matching. In Proceedings of the Fifth Latin American Symposium on Theoretical Informatics (LATIN'02), To appear, Lecture Notes in Computer Science.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. CH~VEZ, E., MARROQU~N,J.,AND BAEZA-YATES,R. 1999. Spaghettis: An array based algorithm for similarity queries in metric spaces. In Proceedings of String Processing and Information Retrieval (SPIRE'99), IEEE Computer Science Press, Los Alamitos, Calif., 38-46.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. CH~VEZ, E., MARROQUIN,J.L.,AND NAVARRO, G. 2001. Fixed queries array: a fast and economical data structure for proximity searching. Multimedia Tools and Applications 14 (2), 113-135. Kluwer.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. CHAZELLE, B. 1994. Computational geometry: A retrospective. In Proceedings of the 26th ACM Symposium on the Theory of Computing (STOC'94), 75-94.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. CHIUEH, T. 1994. Content-based image indexing. In Proceedings of the Twentieth Conference on Very Large Databases ( VLDB'94), 582-593.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. CIACCIA, P., PATELLA, M., AND ZEZULA, P. 1997. M- tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd Conference on Very Large Databases ( VLDB'97), 426-435.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. CIACCIA, P., PATELLA, M., AND ZEZULA, P. 1998a. A cost model for similarity queries in metric spaces. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'98).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. CIACCIA, P., PATELLA, M., AND ZEZULA, P. 1998b. Processing complex similarity queries with distance-based access methods. In Proceedings of the Sixth International Conference on Extending Database Technology (EDBT'98).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. CLARKSON, K. 1999. Nearest neighbor queries in metric spaces. Discrete Comput. Geom. 22,1, 63-93.]]Google ScholarGoogle ScholarCross RefCross Ref
  35. COX,T.AND COX, M. 1994. Multidimensional Scaling. Chapman and Hall, London.]]Google ScholarGoogle Scholar
  36. DEHNE,F.AND NOLTEMEIER, H. 1987. Voronoi trees and clustering problems. Inf. Syst. 12,2, 171-175.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. DEVROYE, L. 1987. A Course in Density Estimation. Birkhauser, Boston.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. DUDA,R.AND HART, P. 1973. Pattern Classification and Scene Analysis. Wiley, New York.]]Google ScholarGoogle Scholar
  39. FALOUTSOS,C.AND KAMEL, I. 1994. Beyond uniformity and independence: Analysis of R-trees using the concept of fractal dimension. In Proceedings of the Thirteenth ACM Symposium on Principles of Database Principles (PODS'94), 4-13.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. FALOUTSOS,C.AND LIN, K. 1995. Fastmap: A fast algorithm for indexing, data mining and visualization of traditional and multimedia datasets. ACM SIGMOD Rec. 24, 2, 163-174.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. FALOUTSOS, C., EQUITZ, W., FLICKNER, M., NIBLACK,W., PETKOVIC,D.,AND BARBER, R. 1994. Efficient and effective querying by image content. J. Intell. Inf. Syst. 3, 3/4, 231-262.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. FARAG~, A., LINDER,T.,AND LUGOSI, G. 1993. Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans. Patt. Anal. Mach. Intell. 15,9, 957-962.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. FRAKES,W.AND BAEZA-YATES, R., EDS. 1992. Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs, N.J.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. GAEDE,V.AND GUNTHER, O. 1998. Multidimensional access methods. ACMComput. Surv. 30,2, 170-231.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 47-57.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. HAIR, J., ANDERSON, R., TATHAM, R., AND BLACK,W. 1995. Multivariate Data Analysis with Readings, 4th ed. Prentice-Hall, Englewood Cliffs, N.J.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. HJALTASON,G.AND SAMET, H. 1995. Ranking in spatial databases. In Proceedings of Fourth International Symposium on Large Spatial Databases, 83-95.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. JAIN,A.AND DUBES, R. 1988. Algorithms for Clustering Data. Prentice-Hall, Englewood Cliffs, N.J.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. KALANTARI,I.AND MCDONALD, G. 1983. A data structure and an algorithm for the nearest point problem. IEEE Trans. Softw. Eng. 9,5.]]Google ScholarGoogle Scholar
  50. MEHLHORN, K. 1984. Data Structures and Algorithms, Volume III "Multidimensional Searching and Computational Geometry. Springer-Verlag, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. MICO, L., ONCINA,J.,AND CARRASCO, R. 1996. A fast branch and bound nearest neighbour classifier in metric spaces. Patt. Recogn. Lett. 17, 731-739.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. MICO, L., ONCINA,J.,AND VIDAL, E. 1994. A new version of the nearest-neighbor approximating and eliminating search (AESA) with linear preprocessing-time and memory requirements. Patt. Recog. Lett. 15, 9-17.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. NAVARRO, G. 1999. Searching in metric spaces by spatial approximation. In Proceedings of String Processing and Information Retrieval (SPIRE'99), IEEE Computer Science Press, Los Alamitos, Calif., 141-148.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. NAVARRO,G.AND REYES, N. 2001. Dynamic spatial approximation trees. In Proceedings of the Eleventh Conference of the Chilean Computer Science Society (SCCC'01), IEEE CS Press, 213-222.]]Google ScholarGoogle ScholarCross RefCross Ref
  55. NENE,S.AND NAYAR, S. 1997. A simple algorithm for nearest neighbor search in high dimensions. IEEE Trans. Patt. Anal. Mach. Intell. 19,9, 989-1003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. NIEVERGELT,J.AND HINTERBERGER, H. 1984. The grid file: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9,1, 38-71.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. NOLTEMEIER, H. 1989. Voronoi trees and applications. In Proceedings of the International Workshop on Discrete Algorithms and Complex-ity (Fukuoka, Japan), 69-74.]]Google ScholarGoogle Scholar
  58. NOLTEMEIER, H., VERBARG, K., AND ZIRKELBACH,C. 1992. Monotonous bisector" treesA tool for efficient partitioning of complex schenes of geometric objects. In Data Structures and Efficient Algorithms, Lecture Notes in Computer Science, vol. 594, Springer-Verlag, New York, 186-203.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. PRABHAKAR, S., AGRAWAL,D.,AND ABBADI, A. E. 1998. Efficient disk allocation for fast similarity searching. In Proceedings of ACM SPAA'98 (Puerto Vallarta, Mexico).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. ROUSSOPOULOS, N., KELLEY,S.,AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the ACM SIGMOD'95, 71-79.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. SALTON,G.AND MCGILL, M. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. SAMET, H. 1984. The quadtree and related hierarchical data structures. ACM Comput. Surv. 16, 2, 187-260.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. SANKOFF,D.AND KRUSKAL, J., EDS. 1983. Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, Mass.]]Google ScholarGoogle Scholar
  64. SASHA,D.AND WANG, T. 1990. New techniques for best-match retrieval. ACM Trans. Inf. Syst. 8,2, 140-158.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. SHAPIRO, M. 1977. The choice of reference points in best-match file searching. Commun. ACM 20, 5, 339-343.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. SUTTON,R.AND BARTO, A. 1998. Reinforcement Learning : An Introduction. MIT Press, Cambridge, Mass.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. UHLMANN, J. 1991a. Implementing metric trees to satisfy general proximity/similarity queries. Manuscript.]]Google ScholarGoogle Scholar
  68. UHLMANN, J. 1991b. Satisfying general proximity/ similarity queries with metric trees. Inf. Proc. Lett. 40, 175-179.]]Google ScholarGoogle ScholarCross RefCross Ref
  69. VERBARG, K. 1995. The C-Ttree: A dynamically balanced spatial index. Comput. Suppl. 10, 323-340.]]Google ScholarGoogle ScholarCross RefCross Ref
  70. VIDAL, E. 1986. An algorithm for finding nearest neighbors in (approximately) constant average time. Patt. Recogn. Lett. 4, 145-157.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. WATERMAN, M. 1995. Introduction to Computational Biology. Chapman and Hall, London.]]Google ScholarGoogle Scholar
  72. WEBER,R.SCHEK, H.-J., AND BLOTT, S. 1998. A quantitative analysis and performance study for similarity-search methods in highdimensional spaces. In Proceedings of the International Conference on Very Large Databases (VLDB'98).]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. WHITE,D.AND JAIN, R. 1996. Algorithms and strategies for similarity retrieval. Tech. Rep. VCL-96-101 (July), Visual Computing Laboratory, University of California, La Jolla.]]Google ScholarGoogle Scholar
  74. YAO, A. 1990. Computational Geometry,J.Van Leeuwen, Ed. Elsevier Science, New York, 345-380.]]Google ScholarGoogle Scholar
  75. YIANILOS, P. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the Fourth ACM-SIAM Symposium on Discrete Algorithms (SODA'93), 311-321.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. YIANILOS, P. 1999. Excluded middle vantage point forests for nearest neighbor search. In DIMACS Implementation Challenge, ALENEX'99 (Baltimore, Md).]]Google ScholarGoogle Scholar
  77. YIANILOS, P. 2000. Locally lifting the curse of dimensionality for nearest neighbor search. In Proceedings of the Eleventh ACM-SIAM Symposium on Discrete Algorithms (SODA'00), 361-370.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. YOSHITAKA,A.AND ICHIKAWA, T. 1999. A survey on content-based retrieval for multimedia databases. IEEE Trans. Knowl. Data Eng. 11,1, 81-93.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Searching in metric spaces

                        Recommendations

                        Comments

                        Login options

                        Check if you have access through your login credentials or your institution to get full access on this article.

                        Sign in

                        Full Access

                        • Published in

                          cover image ACM Computing Surveys
                          ACM Computing Surveys  Volume 33, Issue 3
                          September 2001
                          153 pages
                          ISSN:0360-0300
                          EISSN:1557-7341
                          DOI:10.1145/502807
                          Issue’s Table of Contents

                          Copyright © 2001 ACM

                          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                          Publisher

                          Association for Computing Machinery

                          New York, NY, United States

                          Publication History

                          • Published: 1 September 2001
                          Published in csur Volume 33, Issue 3

                          Permissions

                          Request permissions about this article.

                          Request Permissions

                          Check for updates

                          Qualifiers

                          • article

                        PDF Format

                        View or Download as a PDF file.

                        PDF

                        eReader

                        View online with eReader.

                        eReader