DOI: 10.1145/1277741.1277774
Article

Efficient document retrieval in main memory

Published: 23 July 2007

ABSTRACT

Disk access performance is a major bottleneck in traditional information retrieval systems. Compared to system memory, disk bandwidth is poor, and seek times are worse.

We circumvent this problem by considering query evaluation strategies in main memory. We show how new accumulator trimming techniques combined with inverted list skipping can produce extremely high performance retrieval systems without resorting to methods that may harm effectiveness.
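The two ideas named above can be illustrated with a minimal sketch. It is not the paper's algorithm or Galago's code: it assumes a hypothetical in-memory index that maps each term to a list of (doc_id, partial_score) pairs sorted by doc_id, with non-negative scores. The names `skip_to` and `top_k` are invented for this example, and a binary search stands in for the skip pointers of a real inverted list.

```python
import heapq
from bisect import bisect_left


def skip_to(postings, start, doc_id):
    """Return the first index >= start whose doc id is >= doc_id.

    Binary search is an assumed stand-in for real skip pointers.
    """
    return bisect_left(postings, (doc_id,), start)


def top_k(index, query_terms, k):
    """Term-at-a-time top-k scoring with accumulator trimming (sketch).

    index: hypothetical dict, term -> [(doc_id, score), ...] sorted by doc_id,
    with every score >= 0.
    """
    lists = [index[t] for t in query_terms if t in index]
    lists.sort(key=len)  # shortest lists first keeps the candidate set small
    # Largest score each list can contribute to any single document.
    max_scores = [max(s for _, s in pl) for pl in lists]
    accumulators = {}

    for i, postings in enumerate(lists):
        # Most any document can still gain from this list and all later ones.
        remaining = sum(max_scores[i:])
        # The k-th best partial score is a lower bound on the final k-th score.
        if len(accumulators) >= k:
            threshold = heapq.nlargest(k, accumulators.values())[-1]
        else:
            threshold = 0.0

        # Trimming: drop candidates that can no longer reach the threshold.
        accumulators = {
            d: s for d, s in accumulators.items() if s + remaining >= threshold
        }

        if remaining >= threshold:
            # Unseen documents could still make the top k: scan the whole list.
            for doc_id, score in postings:
                accumulators[doc_id] = accumulators.get(doc_id, 0.0) + score
        else:
            # No new document can qualify: visit only the surviving
            # candidates and skip over every other posting.
            pos = 0
            for doc_id in sorted(accumulators):
                pos = skip_to(postings, pos, doc_id)
                if pos == len(postings):
                    break
                if postings[pos][0] == doc_id:
                    accumulators[doc_id] += postings[pos][1]

    ranked = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]
```

Because a candidate is dropped only when it provably cannot reach the current k-th best score, the sketch returns the same top-k documents as exhaustive scoring (up to ties), which is the sense in which this kind of pruning does not harm effectiveness.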

We evaluate our techniques using Galago, a new retrieval system designed for efficient query processing. Our system achieves a 69% improvement in query throughput over previous methods.


Published in

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Copyright © 2007 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 792 of 3,983 submissions (20%)
