Revisiting Document Length Hypotheses: A Comparative Study of Japanese Newspaper and Patent Retrieval

Author:
Sumio Fujita

Yahoo Japan Corporation, Mori-tower, Roppongi 6-10-1, Minato-ku, Tokyo 106-6182, Japan

ACM Transactions on Asian Language Information Processing, Volume 4, Issue 2, pp. 207–235. https://doi.org/10.1145/1105696.1105853

Published: 01 June 2005

Published in

ACM Transactions on Asian Language Information Processing Volume 4, Issue 2
June 2005
179 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1105696
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States