Abstract
Even though a Boolean query can express an information need precisely enough to select relevant documents, it is not easy to construct an appropriate Boolean query that covers all relevant documents. To use a Boolean query effectively, a mechanism for retrieving as many relevant documents as possible is therefore required. To meet this requirement, we propose a method for modifying a given Boolean query using information from a set of relevant documents. This reformulation, however, may degrade the retrieval results if it removes some important query terms. A further mechanism is therefore needed to exploit query terms that help find more relevant documents but are not strictly required to appear in them. For this purpose, we propose a new method that combines the probabilistic and Boolean IR models. We also introduce a new IR system, ABRIR (appropriate Boolean query reformulation for information retrieval), which is based on these two methods and the Okapi system. ABRIR uses both a word index and a phrase index built from pairs of adjacent nouns. The effectiveness of both methods was confirmed on the NTCIR-4 Web test collection.
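The abstract does not give the formulas used to combine the two models, so the Python sketch below shows only one hypothetical way to do it, not the authors' implementation. Documents that satisfy the Boolean query (written as an AND of OR-clauses) are ranked first, and each group is ordered by Okapi BM25 computed over both the Boolean terms and extra optional terms that are useful but not required. Phrase terms are built from adjacent noun pairs, following the abstract's description of ABRIR's phrase index. The function names, the clause-list query format, the hand-supplied part-of-speech tags (the abstract does not specify the tagger), the BM25 idf variant, and the parameters `k1=1.2`, `b=0.75` are all assumptions.

```python
import math
from collections import Counter


def index_terms(tagged_doc):
    """Word terms plus phrase terms built from each pair of adjacent nouns."""
    words = [tok for tok, _ in tagged_doc]
    phrases = [
        f"{a}_{b}"
        for (a, pos_a), (b, pos_b) in zip(tagged_doc, tagged_doc[1:])
        if pos_a == "NOUN" and pos_b == "NOUN"
    ]
    return words + phrases


def satisfies(terms, cnf):
    """True if a term set satisfies an AND-of-ORs Boolean query (list of clauses)."""
    return all(any(t in terms for t in clause) for clause in cnf)


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for a bag of query terms (common idf variant)."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(s)
    return scores


def hybrid_rank(docs, cnf, optional_terms=()):
    """Hypothetical combination: Boolean matches first, each group ordered by BM25."""
    query_terms = {t for clause in cnf for t in clause} | set(optional_terms)
    scores = bm25_scores(docs, query_terms)
    return sorted(
        range(len(docs)),
        key=lambda i: (satisfies(set(docs[i]), cnf), scores[i]),
        reverse=True,
    )


# Toy corpus with hand-assigned part-of-speech tags (illustrative only).
raw = [
    [("web", "NOUN"), ("search", "NOUN"), ("engine", "NOUN")],
    [("search", "NOUN"), ("the", "DET"), ("library", "NOUN")],
    [("web", "NOUN"), ("page", "NOUN"), ("ranking", "NOUN")],
]
docs = [index_terms(d) for d in raw]
print(docs[0])  # ['web', 'search', 'engine', 'web_search', 'search_engine']
print(hybrid_rank(docs, cnf=[["web"], ["search"]], optional_terms=["ranking"]))  # [0, 2, 1]
```

In this toy run only document 0 satisfies `web AND search`, so it comes first. Document 2 fails the Boolean query but contains the optional term `ranking`, so BM25 places it above document 1. This mirrors the abstract's goal of still exploiting terms that are useful without being strictly required.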
Index Terms
- On a combination of probabilistic and boolean ir models for WWW document retrieval