research-article

SRS: solving c-approximate nearest neighbor queries in high dimensional Euclidean space with a tiny index

Published: 01 September 2014

Abstract

Nearest neighbor search in high-dimensional space has many important applications in domains such as data mining and multimedia databases. The problem is challenging due to the phenomenon known as the "curse of dimensionality". An alternative is to consider algorithms that return a c-approximate nearest neighbor (c-ANN) with guaranteed probability. Locality Sensitive Hashing (LSH) is among the most widely adopted methods, and it achieves high efficiency both in theory and in practice. However, it is known to require an extremely large amount of space for indexing, which limits its scalability.

In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees while requiring only a single tiny index. Our methods are highly flexible and support a variety of functionalities, such as finding the exact nearest neighbor with any given probability. In our experiments, our methods outperform state-of-the-art LSH-based methods and scale well to 1 billion high-dimensional points on a single commodity PC.
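
The c-ANN guarantee named in the abstract can be made concrete with a small check. In the usual definition, a returned point is a c-approximate nearest neighbor of a query if its distance to the query is at most c times the distance of the true nearest neighbor. The sketch below is only an illustration of that definition, not the paper's method: the function names `dist` and `is_c_ann` and the toy data are hypothetical, and the exact nearest neighbor is found by brute force.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_c_ann(candidate, query, points, c):
    """Return True if `candidate` is a c-approximate nearest neighbor of `query`.

    The exact nearest-neighbor distance is computed by a linear scan, so this
    is a correctness check for small data, not an efficient search method.
    """
    exact = min(dist(p, query) for p in points)
    return dist(candidate, query) <= c * exact

# Toy data (hypothetical): distances to the query are 3, 4, and 10.
points = [(0.0, 3.0), (4.0, 0.0), (0.0, 10.0)]
query = (0.0, 0.0)

print(is_c_ann((4.0, 0.0), query, points, c=2.0))   # 4 <= 2 * 3
print(is_c_ann((0.0, 10.0), query, points, c=2.0))  # 10 > 2 * 3
```

A randomized method with a probabilistic guarantee only needs its answer to pass this check with at least the stated success probability, which is what lets it trade accuracy for index size and query time.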



Published in

Proceedings of the VLDB Endowment, Volume 8, Issue 1
September 2014
100 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
