ABSTRACT
Disk access performance is a major bottleneck in traditional information retrieval systems. Compared to system memory, disk bandwidth is poor, and seek times are worse.
We circumvent this problem by considering query evaluation strategies in main memory. We show how new accumulator trimming techniques, combined with inverted list skipping, can produce extremely high-performance retrieval systems without resorting to methods that may harm effectiveness.
We evaluate our techniques using Galago, a new retrieval system designed for efficient query processing. Our system achieves a 69% improvement in query throughput over previous methods.
Index Terms
- Efficient document retrieval in main memory