Structured Index Organizations for High-Throughput Text Querying

  • Conference paper
String Processing and Information Retrieval (SPIRE 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4209)

Abstract

Inverted indexes are the preferred mechanism for supporting content-based queries in text retrieval systems, with the various data items usually stored in compressed form. Different query modalities, however, require that different information be held in the index. For example, phrase querying requires that word offsets be held as well as document numbers. In this study we describe an inverted index organization that provides efficient support for conjunctive Boolean queries, ranked queries, and phrase queries alike. Experimental results on a 426 GB document collection show that the methods we describe provide fast evaluation of all three querying modes.
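
The point about word offsets can be illustrated with a minimal sketch. It is not the index organization described in the paper: it is a plain, uncompressed positional inverted index held in memory, and the names `build_index`, `conjunctive_query`, and `phrase_query` are hypothetical, chosen only for this illustration. A conjunctive Boolean query needs nothing more than document numbers, whereas a phrase query must also consult the word offsets. Ranked querying is not covered here.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {document number: [word offsets]} (uncompressed)."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for offset, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(offset)
    return index


def conjunctive_query(index, terms):
    """Documents containing every term; only document numbers are used."""
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


def phrase_query(index, terms):
    """Documents containing the terms consecutively; word offsets are used."""
    hits = []
    for doc_id in conjunctive_query(index, terms):
        # Candidate start offsets: positions of the first term.
        starts = set(index[terms[0]][doc_id])
        # The k-th term of the phrase must occur exactly k words later.
        for k, term in enumerate(terms[1:], start=1):
            starts &= {pos - k for pos in index[term][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


# Toy collection, invented for this example.
docs = ["fast text retrieval", "text retrieval is fast", "retrieval of text"]
index = build_index(docs)

print(conjunctive_query(index, ["text", "retrieval"]))  # [0, 1, 2]
print(phrase_query(index, ["text", "retrieval"]))       # [0, 1]
```

All three toy documents contain both words, so the conjunctive query returns every document, but only the first two contain them adjacently and in order, which the phrase query can determine only because offsets are stored.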


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Anh, V.N., Moffat, A. (2006). Structured Index Organizations for High-Throughput Text Querying. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_25

  • DOI: https://doi.org/10.1007/11880561_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45774-9

  • Online ISBN: 978-3-540-45775-6

  • eBook Packages: Computer Science, Computer Science (R0)
