BEVA: An Efficient Query Processing Algorithm for Error-Tolerant Autocompletion

Published: 18 March 2016

Abstract

Query autocompletion has become a standard feature in many search applications, especially search engines. A recent trend is to support error-tolerant autocompletion, which significantly improves usability by matching prefixes of database strings while allowing a small number of errors.
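To make the matching condition concrete, here is a minimal sketch in Python. It is a brute-force illustration and not the method proposed in the article. It assumes that a database string matches when some prefix of it is within edit distance tau of the query typed so far. The names prefix_edit_distance, autocomplete, and tau are hypothetical and chosen only for this example.

```python
def prefix_edit_distance(query: str, data: str) -> int:
    """Smallest edit distance between `query` and any prefix of `data`."""
    # prev[j] holds the edit distance between the query characters
    # consumed so far and data[:j].
    prev = list(range(len(data) + 1))
    for i, qc in enumerate(query, start=1):
        curr = [i]
        for j, dc in enumerate(data, start=1):
            cost = 0 if qc == dc else 1
            curr.append(min(prev[j] + 1,           # drop qc from the query
                            curr[j - 1] + 1,       # add dc to the query
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    # The last row covers the whole query against every prefix of data.
    return min(prev)


def autocomplete(query, strings, tau):
    """Return the strings with a prefix within `tau` edits of `query`."""
    return [s for s in strings if prefix_edit_distance(query, s) <= tau]


print(autocomplete("aple", ["apple pie", "maple", "grape"], 1))
```

This scans every string for every keystroke, so it only serves to define the result set. Index-based methods aim to return the same answers without the full scan.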

In this article, we systematically study the query processing problem for error-tolerant autocompletion with a given edit distance threshold. We propose a general framework that encompasses existing methods and characterizes different classes of algorithms and the minimum amount of information they need to maintain under different constraints. We then propose a novel evaluation strategy that achieves the minimum active node size by eliminating ancestor-descendant relationships among active nodes entirely. In addition, we characterize the essence of edit distance computation by a novel data structure named edit vector automaton (EVA). It enables us to compute new active nodes and their associated states efficiently by table lookups. To support large distance thresholds, we devise a partitioning scheme that reduces the size and construction cost of the automaton, yielding the universal partitioned EVA (UPEVA), which handles arbitrarily large thresholds. Our extensive evaluation demonstrates that the proposed method outperforms existing approaches in both space and time efficiency.
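As background for the trie-based setting the abstract refers to, the following Python sketch shows a common baseline: walk a trie of the database strings and carry one row of the edit distance table per node. It is not BEVA. It does not build an edit vector automaton, and it recomputes everything for each query rather than maintaining active nodes across keystrokes. All names (TrieNode, build_trie, fuzzy_complete, tau) are hypothetical. The sketch stops descending at the first node on a path that matches, so no reported node is an ancestor of another. This is only loosely related to the ancestor-descendant elimination mentioned above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False


def build_trie(strings):
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def collect(node, prefix, out):
    """Add every complete string stored at or below `node` to `out`."""
    if node.is_end:
        out.add(prefix)
    for ch, child in node.children.items():
        collect(child, prefix + ch, out)


def fuzzy_complete(root, query, tau):
    results = set()

    def visit(node, prefix, row):
        # row[i] = edit distance between query[:i] and this node's prefix.
        if row[-1] <= tau:
            # This prefix matches the whole query; report the subtree once.
            collect(node, prefix, results)
            return
        if min(row) > tau:
            # Row minima never decrease on the way down, so prune here.
            return
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for i, qc in enumerate(query, start=1):
                cost = 0 if qc == ch else 1
                new_row.append(min(row[i] + 1,
                                   new_row[i - 1] + 1,
                                   row[i - 1] + cost))
            visit(child, prefix + ch, new_row)

    visit(root, "", list(range(len(query) + 1)))
    return sorted(results)


trie = build_trie(["apple pie", "maple", "grape"])
print(fuzzy_complete(trie, "aple", 1))
```

Each row here is recomputed character by character. The abstract describes replacing this kind of per-node computation with table lookups in a precomputed automaton, which this sketch does not attempt.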


        • Published in

          ACM Transactions on Database Systems, Volume 41, Issue 1
          Invited Paper from ICDT 2015, SIGMOD 2014, EDBT 2014 and Regular Papers
          April 2016
          287 pages
          ISSN: 0362-5915
          EISSN: 1557-4644
          DOI: 10.1145/2897141

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 March 2016
          • Accepted: 1 October 2015
          • Revised: 1 August 2015
          • Received: 1 October 2014

          Qualifiers

          • research-article
          • Research
          • Refereed
