Information Processing & Management

Volume 30, Issue 6, November–December 1994, Pages 733-744
Memory efficient ranking

https://doi.org/10.1016/0306-4573(94)90002-7

Abstract

Fast and effective ranking of a collection of documents with respect to a query requires several structures, including a vocabulary, inverted file entries, arrays of term weights and document lengths, a set of partial similarity accumulators, and address tables for inverted file entries and documents. Of all of these structures, the array of document lengths and the set of accumulators are the components accessed most frequently in a ranked query, and it is crucial to acceptable performance that they be held in main memory. Here we describe an approximate ranking process that makes use of a compact array of in-memory, low-precision approximations for the lengths. Combined with another simple rule for reducing the memory required by the partial similarity accumulators, the approximation heuristic allows the ranking of large document collections using less than one byte of memory per document, an eight-fold reduction compared with conventional techniques. Moreover, in our experiments retrieval effectiveness was largely unaffected by the use of these heuristics.
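
The two heuristics named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the encoding or the accumulator rule, so the logarithmic one-byte length code, the "stop admitting new documents once a cap is reached" rule, and every name below (`build_length_codes`, `approx_length`, `rank`, `max_accumulators`) are assumptions made for illustration only.

```python
import math

LEVELS = 255  # assumption: one-byte codes, values 0..255


def build_length_codes(lengths):
    """Quantize positive document lengths to one-byte codes on a log scale.

    Assumed scheme: code c stands for the length lo * base**c, where
    base is chosen so that the largest length maps near code 255.
    """
    lo, hi = min(lengths), max(lengths)
    base = (hi / lo) ** (1.0 / LEVELS) if hi > lo else 1.0
    codes = bytearray()
    for w in lengths:
        c = int(math.log(w / lo) / math.log(base)) if base > 1.0 else 0
        codes.append(min(LEVELS, max(0, c)))
    return codes, lo, base


def approx_length(code, lo, base):
    """Recover the low-precision length represented by a code."""
    return lo * base ** code


def rank(query_terms, postings, codes, lo, base, max_accumulators, k):
    """Rank documents using a capped accumulator set and approximate lengths.

    postings maps a term to a list of (doc_id, partial_similarity) pairs.
    Assumed rule: once max_accumulators documents hold an accumulator,
    later postings may update existing accumulators but not create new ones.
    """
    acc = {}
    for term in query_terms:
        for doc_id, weight in postings.get(term, []):
            if doc_id in acc:
                acc[doc_id] += weight
            elif len(acc) < max_accumulators:
                acc[doc_id] = weight
    # Normalize each accumulated score by the approximate document length.
    scored = [(s / approx_length(codes[d], lo, base), d) for d, s in acc.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in scored[:k]]


if __name__ == "__main__":
    lengths = [1.0, 2.0, 4.0]  # hypothetical document lengths
    postings = {"a": [(0, 1.0), (1, 2.0)], "b": [(1, 1.0), (2, 5.0)]}
    codes, lo, base = build_length_codes(lengths)
    print(rank(["a", "b"], postings, codes, lo, base, max_accumulators=2, k=10))
```

In this sketch each approximate length is at most a factor of `base` below the true length, so the error introduced into a normalized score is bounded by that factor. The sketch spends one byte per document on lengths alone and does not attempt to reproduce the memory figures reported in the abstract.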

