DOI: 10.1145/1458082.1458171
research-article

Supporting sub-document updates and queries in an inverted index

Published: 26 October 2008

ABSTRACT

Inverted indexes have become the standard indexing method for supporting search queries in a variety of content-based applications. Examples of such applications include enterprise document management, e-mail, web search, and social networks. One shortcoming in current inverted index designs is that they support only document-level updates, forcing a full document to be reindexed even if just part of it changes. This paper describes a new inverted index design that enables applications to break a document into semantically meaningful sub-documents or "sections". Each section of a document can be updated separately, but search queries can still work seamlessly across sections. Our index design is motivated by applications where the metadata associated with each document tends to be smaller and more frequently updated than the document's content, yet it is desirable to search the metadata and content with the same index structure. A novel self-optimizing query execution algorithm is described to efficiently join the sections of a document in the inverted index. Experimental results on TREC and patent data are provided, showing that sections can dramatically improve overall system throughput on a mixed workload of updates and queries.
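To make the idea of sections concrete, the following is a minimal in-memory sketch, not the paper's index design or its self-optimizing join algorithm, neither of which is detailed in the abstract. The class name `SectionedIndex`, the methods `update_section` and `search`, the section names, and the whitespace tokenization are all assumptions made for illustration. It shows only the two properties the abstract states: one section can be reindexed without touching the others, and a query still matches terms spread across sections of the same document.

```python
from collections import defaultdict


class SectionedIndex:
    """Toy inverted index with per-section updates (hypothetical, illustrative only)."""

    def __init__(self):
        # term -> section name -> set of doc ids containing the term in that section
        self.postings = defaultdict(lambda: defaultdict(set))
        # (doc_id, section) -> terms currently indexed for that section
        self.section_terms = {}

    def update_section(self, doc_id, section, text):
        """Reindex one section of a document, leaving its other sections untouched."""
        old_terms = self.section_terms.get((doc_id, section), set())
        new_terms = set(text.lower().split())  # assumption: naive whitespace tokens
        for term in old_terms - new_terms:
            self.postings[term][section].discard(doc_id)
        for term in new_terms - old_terms:
            self.postings[term][section].add(doc_id)
        self.section_terms[(doc_id, section)] = new_terms

    def search(self, query):
        """Return ids of documents in which every query term occurs in some section."""
        result = None
        for term in query.lower().split():
            by_section = self.postings.get(term, {})
            # Union the term's postings over all sections, then intersect across terms.
            docs = set().union(*by_section.values())
            result = docs if result is None else result & docs
        return sorted(result or [])


index = SectionedIndex()
index.update_section(1, "content", "inverted index design")
index.update_section(1, "metadata", "status draft")
index.update_section(2, "content", "inverted index survey")
index.update_section(2, "metadata", "status final")

before = index.search("inverted draft")      # [1]: terms matched across two sections
index.update_section(1, "metadata", "status final")  # only the metadata is reindexed
after = index.search("inverted final")       # [1, 2]
stale = index.search("inverted draft")       # []
```

In this sketch the cross-section join is a plain set union followed by an intersection; a real index would operate on sorted, compressed posting lists, and the query execution strategy is where the paper's contribution lies.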


Published in: CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, October 2008, 1562 pages

ISBN: 9781595939913

DOI: 10.1145/1458082

Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
