Abstract
Nearest neighbor search in high-dimensional space has many important applications in domains such as data mining and multimedia databases. The problem is challenging due to the phenomenon known as the "curse of dimensionality". An alternative is to consider algorithms that return a c-approximate nearest neighbor (c-ANN) with guaranteed probability. Locality Sensitive Hashing (LSH) is among the most widely adopted methods, and it achieves high efficiency both in theory and in practice. However, it is known to require an extremely large amount of space for indexing, which limits its scalability.
In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees while requiring only a single tiny index. Our methods are highly flexible and support a variety of functionalities, such as finding the exact nearest neighbor with any given probability. In our experiments, our methods demonstrate superior performance against state-of-the-art LSH-based methods and scale well to 1 billion high-dimensional points on a single commodity PC.
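To make the c-ANN notion in the abstract concrete, the following is a minimal sketch of the standard definition only, not of the paper's methods: a point counts as a c-approximate nearest neighbor of a query if its distance to the query is at most c times the distance of the exact nearest neighbor. The function names `exact_nn` and `is_c_ann`, the synthetic Gaussian data, and the linear-scan baseline are all assumptions made for illustration.

```python
import math
import random


def exact_nn(points, q):
    """Return the exact nearest neighbor of q by a linear scan (baseline only)."""
    return min(points, key=lambda p: math.dist(p, q))


def is_c_ann(candidate, points, q, c):
    """Check the c-ANN condition: dist(candidate, q) <= c * dist(exact NN, q)."""
    best = math.dist(exact_nn(points, q), q)
    return math.dist(candidate, q) <= c * best


# Hypothetical data: 200 random points and one query in 32 dimensions.
random.seed(0)
points = [tuple(random.gauss(0, 1) for _ in range(32)) for _ in range(200)]
q = tuple(random.gauss(0, 1) for _ in range(32))

nn = exact_nn(points, q)
farthest = max(points, key=lambda p: math.dist(p, q))

print(is_c_ann(nn, points, q, c=1.0))        # the exact NN satisfies any c >= 1
print(is_c_ann(farthest, points, q, c=1.0))  # the farthest point is not an exact NN
```

With c = 1 the condition reduces to exact nearest neighbor search; larger values of c relax the distance requirement, which is what allows approximate methods to trade accuracy for speed and space.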
Index Terms
- SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index