Optimum polynomial retrieval functions based on the probability ranking principle

Published: 01 July 1989

Abstract

We show that any approach to developing optimum retrieval functions is based on two kinds of assumptions: first, a certain form of representation for documents and requests, and second, additional simplifying assumptions that predefine the type of the retrieval function. Then we describe an approach for the development of optimum polynomial retrieval functions: request-document pairs (f_l, d_m) are mapped onto description vectors x(f_l, d_m), and a polynomial function e(x) is developed such that it yields estimates of the probability of relevance P(R | x(f_l, d_m)) with minimum square errors. We give experimental results for the application of this approach to documents with weighted indexing as well as to documents with complex representations. In contrast to other probabilistic models, our approach yields estimates of the actual probabilities, it can handle very complex representations of documents and requests, and it can be easily applied to multivalued relevance scales. On the other hand, this approach is not suited to log-linear probabilistic models, and it needs large samples of relevance feedback data for its application.
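The least-squares idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the complete quadratic polynomial structure, the function names (`quadratic_features`, `fit_retrieval_function`, `estimate_relevance`), and the toy data are assumptions made here for illustration. Each description vector x is expanded into polynomial terms, and the coefficients are chosen to minimize the squared error against relevance judgements y, so that e(x) serves as an estimate of P(R | x).

```python
import numpy as np

def quadratic_features(x):
    """Expand description vectors into an (assumed) complete quadratic structure:
    a constant, the linear terms x_i, and all products x_i * x_j with i <= j."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n = x.shape[1]
    cols = [np.ones(len(x))]
    cols += [x[:, i] for i in range(n)]
    cols += [x[:, i] * x[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack(cols)

def fit_retrieval_function(x, y):
    """Return coefficients a minimizing sum_k (y_k - a . v(x_k))^2."""
    v = quadratic_features(x)
    a, *_ = np.linalg.lstsq(v, np.asarray(y, dtype=float), rcond=None)
    return a

def estimate_relevance(a, x):
    """Evaluate the polynomial e(x) = a . v(x) for each description vector."""
    return quadratic_features(x) @ a

# Hypothetical feedback sample: two-component description vectors with
# binary relevance judgements (1 = relevant, 0 = not relevant).
x_train = np.array([[0.9, 0.8], [0.7, 0.1], [0.2, 0.6], [0.1, 0.1],
                    [0.8, 0.9], [0.3, 0.2], [0.6, 0.7], [0.2, 0.1]])
y_train = np.array([1, 0, 0, 0, 1, 0, 1, 0])

a = fit_retrieval_function(x_train, y_train)
scores = estimate_relevance(a, x_train)
ranking = np.argsort(-scores)  # rank pairs by decreasing estimated probability
```

Ranking by decreasing e(x) follows the probability ranking principle named in the title. Note that a least-squares polynomial is not constrained to the interval [0, 1], so individual estimates may fall slightly outside it.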

            Reviews

            Kathleen H. V. Booth

            This excellent paper describes an application of the least squares polynomial method, previously used in automatic indexing, to the classification of request-document pairs in information retrieval. The retrieval functions developed provide both a probabilistic ranking of relevant documents and estimates of the probability of relevance. This application also allows the use of multivalued (rather than simple 1/0) relevance scales. An interesting refinement is that the indexing procedure identifies phrases as well as single words and assigns a significance factor to them based on their location in the document. A disadvantage of this approach is that it requires a large number of independent assessments of the relevance of documents to requests, because these assessments are needed to define the so-called description vectors used in the process. Fuhr describes the results of extensive experiments and makes suggestions for further work in the field.

            Published in

              ACM Transactions on Information Systems, Volume 7, Issue 3
              July 1989
              134 pages
              ISSN: 1046-8188
              EISSN: 1558-2868
              DOI: 10.1145/65943

              Copyright © 1989 ACM

              Publisher

              Association for Computing Machinery
              New York, NY, United States
