DOI: 10.1145/3127479.3127485
research-article

DLSH: a distribution-aware LSH scheme for approximate nearest neighbor query in cloud computing

Published: 24 September 2017

ABSTRACT

Cloud computing needs to process and analyze massive high-dimensional data in real time. Approximate queries in cloud computing systems can return timely results with acceptable accuracy, thus avoiding the consumption of a large amount of resources. Locality Sensitive Hashing (LSH) preserves data locality and supports approximate queries. However, because it chooses hash functions at random, LSH must use a large number of functions to guarantee query accuracy. The extra computation and storage overheads degrade the practical performance of LSH. To reduce these overheads and deliver high performance, we propose a distribution-aware scheme, called DLSH, that offers a cost-effective approximate nearest neighbor query service for cloud computing. The idea of DLSH is to use the principal components of the data distribution as the projection vectors of the hash functions in LSH, and then to quantify the weight of each hash function and adjust the interval value in each hash table. We then refine the queried result set based on hit frequency to significantly decrease the time overhead of distance computation. Extensive experiments in a large-scale cloud computing testbed demonstrate significant improvements in terms of multiple system performance metrics. We have released the source code of DLSH for public use.
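
The idea in the abstract can be illustrated with a small sketch. It is not the released DLSH code. It assumes the standard p-stable hash form h(v) = floor((a · v + b) / w), takes a from the leading principal direction of the data rather than a random vector, and keeps only candidates that collide with the query in at least `min_hits` tables before computing any distance. The names `top_component`, `build_tables`, `query`, `width` and `min_hits` are hypothetical, and the per-function weighting and interval adjustment described in the abstract are omitted because the abstract does not specify them.

```python
import math
import random
from collections import Counter


def top_component(points, iters=100):
    """Leading principal direction via power iteration (pure Python).

    Assumption: the uniform start vector is not orthogonal to that direction.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # Apply X^T X to v without forming the covariance matrix.
        proj = [sum(c[i] * v[i] for i in range(dim)) for c in centered]
        w = [sum(proj[k] * centered[k][i] for k in range(n)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def make_hash(a, b, width):
    """p-stable style hash: bucket index of the shifted projection onto a."""
    return lambda p: math.floor((sum(x * y for x, y in zip(a, p)) + b) / width)


def build_tables(points, a, width, num_tables, seed=0):
    """Build tables that share direction a but use different random offsets b.

    Assumption: varying only the offset is a simplification for this sketch.
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(num_tables):
        h = make_hash(a, rng.uniform(0.0, width), width)
        buckets = {}
        for idx, p in enumerate(points):
            buckets.setdefault(h(p), []).append(idx)
        tables.append((h, buckets))
    return tables


def query(points, tables, q, min_hits):
    """Return the index of the closest candidate hit in >= min_hits tables."""
    hits = Counter()
    for h, buckets in tables:
        hits.update(buckets.get(h(q), []))
    # Hit-frequency refinement: distances are computed only for frequent hits.
    candidates = [i for i, count in hits.items() if count >= min_hits]
    return min(candidates, key=lambda i: math.dist(points[i], q), default=None)


# Toy data: two clusters spread along the diagonal.
data = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (5.2, 5.1)]
direction = top_component(data)
lsh_tables = build_tables(data, direction, width=1.0, num_tables=4)
print(query(data, lsh_tables, (5.1, 5.02), min_hits=2))
```

Raising `min_hits` shrinks the candidate set and so the number of distance computations, at the risk of discarding a true neighbor that happens to fall across a bucket boundary in some tables.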


          • Published in

            SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
            September 2017
            672 pages
            ISBN: 9781450350280
            DOI: 10.1145/3127479

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 169 of 722 submissions, 23%
