Abstract
Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many multimedia retrieval applications. Exact tree-based approaches are known to suffer from the curse of dimensionality on high-dimensional data. Approximate search techniques trade some accuracy for faster performance while still returning good-enough results. Locality Sensitive Hashing (LSH) is a popular technique for finding approximate nearest neighbors. LSH techniques offer two main benefits: they provide theoretical guarantees on the query results, and they are highly scalable. The dominant costs for existing external memory-based LSH techniques are the algorithm time and the index I/Os required to find candidate points. Existing works do not compare both of these costs in their evaluations. In this experimental survey paper, we show the impact of both costs on overall performance. We compare three state-of-the-art techniques on six real-world datasets and show the importance of comparing both costs to achieve a fairer comparison.
Notes
- 1. These implementations will be made public.
- 2. We refer the reader to a recent survey [14] for an in-depth discussion of these categories.
- 3. Supported by NSF Award #1337884.
References
Arora, A., et al.: HD-index: pushing the scalability-accuracy boundary for approximate kNN search. In: VLDB (2018)
Babenko, A., et al.: Efficient indexing of billion-scale datasets of deep descriptors. In: CVPR (2016)
Bawa, M., et al.: LSH forest: self-tuning indexes for similarity search. In: WWW (2005)
Chávez, E., et al.: Searching in metric spaces. CSUR 33, 273–321 (2001)
Danziger, S.A., et al.: Predicting positive P53 cancer rescue regions using most informative positive (MIP) active learning. PLoS Comput. Biol. 5, e1000498 (2009)
Datar, M., et al.: Locality-sensitive hashing scheme based on p-stable distributions. In: SOCG (2004)
Gan, J., et al.: Locality-sensitive hashing scheme based on dynamic collision counting. In: SIGMOD (2012)
Gionis, A., et al.: Similarity search in high dimensions via hashing. In: VLDB (1999)
Huang, Q., et al.: Query-aware locality-sensitive hashing for approximate nearest neighbor search. VLDB 9, 1–12 (2015)
Jegou, H., et al.: Product quantization for nearest neighbor search. TPAMI 33, 117–128 (2010)
Kim, A., et al.: Optimally leveraging density and locality for exploratory browsing and sampling. In: HILDA (2018)
Leis, V., et al.: Query optimization through the looking glass, and what we found running the join order benchmark. VLDB 27, 643–668 (2018)
Li, M., et al.: I/O efficient approximate nearest neighbour search based on learned functions. In: ICDE (2020)
Li, W., et al.: Approximate nearest neighbor search on high dimensional data - experiments, analyses, and improvement. TKDE (2019)
Liu, W., et al.: I-LSH: I/O efficient c-approximate nearest neighbor search in high-dimensional space. In: ICDE (2019)
Liu, Y., et al.: SK-LSH: an efficient index structure for approximate nearest neighbor search. VLDB 7, 745–756 (2014)
Loosli, G., et al.: Training invariant support vector machines using selective sampling. Large Scale Kernel Mach. (2007)
Lv, Q., et al.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB (2007)
Russell, B.C., et al.: LabelMe: a database and web-based tool for image annotation. IJCV 77, 157–173 (2008)
Seagate Barracuda 120 SSD Manual. https://www.seagate.com/www-content/datasheets/pdfs/barracuda-120-sata-DS2022-1-1909US-en_US.pdf
Seagate ST2000DM001 Manual. https://www.seagate.com/files/staticfiles/docs/pdf/datasheet/disc/barracuda-ds1737-1-1111us.pdf
Sun, Y., et al.: SRS: solving c-approximate nearest neighbor queries in high dimensional Euclidean space with a tiny index. VLDB (2014)
Tao, Y., et al.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. TODS 35, 1–46 (2010)
Torralba, A., et al.: 80 million tiny images: a large data set for nonparametric object and scene recognition. TPAMI 30, 1958–1970 (2008)
Zheng, B., et al.: PM-LSH: a fast and accurate LSH framework for high-dimensional approximate NN search. VLDB 13, 643–655 (2020)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Jafari, O., Nagarkar, P. (2021). Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches. In: Qiao, M., Vossen, G., Wang, S., Li, L. (eds) Databases Theory and Applications. ADC 2021. Lecture Notes in Computer Science, vol 12610. Springer, Cham. https://doi.org/10.1007/978-3-030-69377-0_6
DOI: https://doi.org/10.1007/978-3-030-69377-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69376-3
Online ISBN: 978-3-030-69377-0
eBook Packages: Computer Science, Computer Science (R0)