
SRS: solving c-approximate nearest neighbor queries in high dimensional euclidean space with a tiny index

Authors:
Yifang Sun, University of New South Wales, Australia
Wei Wang, University of New South Wales, Australia
Jianbin Qin, University of New South Wales, Australia
Ying Zhang, University of Technology Sydney, Australia
Xuemin Lin, University of New South Wales, Australia

Proceedings of the VLDB Endowment, Volume 8, Issue 1, pp 1–12. https://doi.org/10.14778/2735461.2735462

Published: 01 September 2014


Abstract

Nearest neighbor search in high-dimensional space has many important applications in domains such as data mining and multimedia databases. The problem is challenging due to the phenomenon known as the "curse of dimensionality". An alternative solution is to consider algorithms that return a c-approximate nearest neighbor (c-ANN) with guaranteed probability. Locality Sensitive Hashing (LSH) is among the most widely adopted methods, and it achieves high efficiency both in theory and in practice. However, it is known to require an extremely large amount of space for indexing, which limits its scalability.

In this paper, we propose several surprisingly simple methods to answer c-ANN queries with theoretical guarantees while requiring only a single tiny index. Our methods are highly flexible and support a variety of functionalities, such as finding the exact nearest neighbor with any given probability. In our experiments, the methods demonstrate superior performance over state-of-the-art LSH-based methods and scale well to 1 billion high-dimensional points on a single commodity PC.
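To make the c-ANN notion in the abstract concrete, the following minimal sketch checks the standard definition: a point is a c-approximate nearest neighbor of a query if its Euclidean distance to the query is at most c times the distance of the exact nearest neighbor. A c-ANN algorithm with a probabilistic guarantee returns such a point with at least a stated probability. This illustrates the problem definition only, not the paper's method; the function names (`euclidean`, `exact_nn`, `is_c_approximate_nn`) and the toy data are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_nn(points, query):
    """Exact nearest neighbor by linear scan (the baseline c-ANN avoids)."""
    return min(points, key=lambda p: euclidean(p, query))


def is_c_approximate_nn(candidate, points, query, c):
    """True if dist(candidate, query) <= c * dist(exact NN, query)."""
    best = euclidean(exact_nn(points, query), query)
    return euclidean(candidate, query) <= c * best


# Toy data (hypothetical): three points on a line, query at x = 1.
points = [(0, 0), (3, 0), (10, 0)]
query = (1, 0)

print(exact_nn(points, query))                         # (0, 0)
print(is_c_approximate_nn((3, 0), points, query, 2))   # True: 2 <= 2 * 1
print(is_c_approximate_nn((10, 0), points, query, 2))  # False: 9 > 2 * 1
```

With c = 1 the check reduces to exact nearest neighbor search; larger values of c admit more candidates, which is what lets approximate methods trade accuracy for speed and index size.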



Published in
Proceedings of the VLDB Endowment, Volume 8, Issue 1, September 2014 (100 pages)
ISSN: 2150-8097
Editors: Chen Li (University of California, Irvine), Volker Markl (TU Berlin)
Publisher: VLDB Endowment