ABSTRACT
Locality-sensitive hashing (LSH) is a popular framework for generating compact representations of multimedia data, which can be used for content-based search. However, the performance of LSH is limited by its unsupervised nature and by the scale of the underlying feature. In this work, we propose to improve LSH by incorporating two elements: supervised hash bit selection and multi-scale feature representation. First, a feature vector is represented at multiple scales. At each scale, the feature vector is divided into segments. The segment size is decreased gradually from one scale to the next, so that the representation corresponds to a coarse-to-fine view of the feature. Then each segment is hashed to generate more bits than the target hash length. Finally, the best bits are selected from the hash bit pool according to the notion of bit reliability, which is estimated by bit-level hypothesis testing.
Extensive experiments have been performed to validate the proposal in two applications: near-duplicate image detection and approximate feature distance estimation. We first demonstrate that the feature scale, an often neglected factor, can influence performance. Then we show that the proposed supervision method is effective. In particular, the performance increases with the size of the hash bit pool. Finally, the two elements are combined, and the integrated scheme improves performance further.
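The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: each scale halves the segment size, each segment is hashed by the sign of random Gaussian projections, and bit reliability is approximated by the difference between a bit's agreement rate on similar pairs and on dissimilar pairs, which stands in for the bit-level hypothesis test that the abstract does not specify. All function names (`split_multiscale`, `make_projections`, `hash_pool`, `select_bits`) are hypothetical.

```python
import random


def split_multiscale(x, num_scales):
    """Coarse-to-fine view: scale s cuts x into 2**s equal segments.

    Assumption: len(x) is divisible by 2**(num_scales - 1).
    """
    segments = []
    for s in range(num_scales):
        parts = 2 ** s
        size = len(x) // parts
        segments.extend(x[p * size:(p + 1) * size] for p in range(parts))
    return segments


def make_projections(dim, num_scales, bits_per_segment, seed=0):
    """Draw random Gaussian directions for every segment (assumed hash family)."""
    rng = random.Random(seed)
    template = split_multiscale([0.0] * dim, num_scales)
    return [
        [[rng.gauss(0.0, 1.0) for _ in seg] for _ in range(bits_per_segment)]
        for seg in template
    ]


def hash_pool(x, projections, num_scales):
    """Hash every segment to bits; the pool is larger than the target length."""
    bits = []
    for seg, directions in zip(split_multiscale(x, num_scales), projections):
        for w in directions:
            dot = sum(wi * xi for wi, xi in zip(w, seg))
            bits.append(1 if dot >= 0 else 0)
    return bits


def agreement_rate(pairs, i):
    """Fraction of pairs whose hash pools agree at bit position i."""
    return sum(a[i] == b[i] for a, b in pairs) / len(pairs)


def select_bits(similar_pairs, dissimilar_pairs, k):
    """Keep the k bit positions with the highest reliability score.

    Assumed score: agreement on similar pairs minus agreement on
    dissimilar pairs (a stand-in for the paper's hypothesis test).
    """
    pool_size = len(similar_pairs[0][0])
    score = [
        agreement_rate(similar_pairs, i) - agreement_rate(dissimilar_pairs, i)
        for i in range(pool_size)
    ]
    best = sorted(range(pool_size), key=lambda i: score[i], reverse=True)[:k]
    return sorted(best)
```

In use, `hash_pool` would be applied to labelled training pairs, `select_bits` would pick the target number of positions once, and every later hash would keep only those positions of its pool.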
Index Terms
- Supervised Multi-scale Locality Sensitive Hashing