Abstract
In multimedia systems we usually need to retrieve database (DB) objects based on their similarity to a query object, where the similarity is assessed by a measure that assigns a (dis)similarity score to every pair of DB objects. In most existing applications, the similarity measure is required to be a metric, so that the triangle inequality can be used to speed up the search for relevant objects by means of metric access methods (MAMs), for example, the M-tree. Recent research has shown, however, that nonmetric measures are more appropriate for similarity modeling because of their robustness and the ease with which they can model a made-to-measure similarity. Unfortunately, because they lack the triangle inequality, nonmetric measures cannot be used directly by MAMs. From another point of view, some sophisticated similarity measures may be available only in a black-box, nonanalytic form (e.g., as an algorithm or even a hardware device), where no information about their topological properties is provided, so we have to treat them as nonmetric measures as well. From yet another point of view, the concept of similarity measuring is itself inherently imprecise, and we often prefer fast but approximate retrieval over exact but slower retrieval.
To date, the mentioned aspects of similarity retrieval have been addressed separately, that is, exact versus approximate search or metric versus nonmetric search. In this article we introduce a similarity retrieval framework that incorporates both aspects into a single unified model. Based on the framework, we show that for any dissimilarity measure (either metric or nonmetric) we are able to change the “amount” of triangle inequality, and so obtain an approximate or full metric that can be used for MAM-based retrieval. By varying the “amount” of triangle inequality, the measure is modified in a way suitable for either exact but slower retrieval or approximate but faster retrieval. Additionally, we introduce the TriGen algorithm, which constructs the desired modification of any black-box distance automatically, using just a small fraction of the database.
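To make the idea of changing the “amount” of triangle inequality concrete, the following is a minimal sketch and not the TriGen algorithm itself. It assumes a concave power modifier f(x) = x^(1/(1+w)) applied to a black-box distance, and it estimates on a small sample the fraction of object triplets that still violate the triangle inequality. The squared Euclidean distance used as the nonmetric example, the candidate weights, and all function names are illustrative assumptions.

```python
import itertools


def squared_l2(x, y):
    """Squared Euclidean distance: a simple measure that breaks the triangle inequality."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def power_modifier(w):
    """Return the increasing concave function f(x) = x ** (1 / (1 + w)) for w >= 0."""
    exponent = 1.0 / (1.0 + w)
    return lambda x: x ** exponent


def triangle_error(sample, dist, modifier=lambda x: x, eps=1e-12):
    """Fraction of sampled triplets whose modified distances violate the triangle inequality."""
    total = 0
    violations = 0
    for a, b, c in itertools.combinations(sample, 3):
        # Only the longest side can exceed the sum of the other two.
        sides = sorted(modifier(dist(p, q)) for p, q in ((a, b), (b, c), (a, c)))
        total += 1
        if sides[0] + sides[1] < sides[2] - eps:
            violations += 1
    return violations / total if total else 0.0


def smallest_weight(sample, dist, tolerance, weights):
    """Pick the first weight (in the given order) whose error is within the tolerance."""
    for w in weights:
        if triangle_error(sample, dist, power_modifier(w)) <= tolerance:
            return w
    return None  # no candidate weight was concave enough


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (2, 0), (0, 1)]  # stands in for a small fraction of the DB
    print(triangle_error(sample, squared_l2))
    print(smallest_weight(sample, squared_l2, tolerance=0.0, weights=[0.0, 0.25, 0.5, 1.0]))
```

In this sketch a tolerance of zero asks for a full metric on the sample (the exact but slower case), while a larger tolerance accepts a less concave modifier and leaves some triplets violating the inequality (the approximate but faster case). Because the modifier is increasing, it preserves the similarity ordering of objects with respect to any query.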
Index Terms
- Unified framework for fast exact and approximate search in dissimilarity spaces