Subtopic structuring for full-length document access

SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, July 1993, Pages 59–68. https://doi.org/10.1145/160688.160695

Published: 01 July 1993

ABSTRACT

We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.
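
The idea of partitioning a text into multi-paragraph units, and of separating main topics from subtopics by their extent, can be illustrated with a small sketch. This is not the method described in the paper: it assumes a simple heuristic in which a new unit starts wherever the word overlap (cosine similarity) between adjacent paragraphs falls below a threshold, and in which a term counts as a main-topic term if it occurs in every unit. The function names, the stopword list, the threshold value, and the sample paragraphs are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "and", "from", "in", "by", "a", "of"}


def bag_of_words(paragraph):
    """Lower-cased word counts for one paragraph, ignoring stopwords."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(paragraphs, threshold=0.3):
    """Group consecutive paragraph indices into units.

    A new unit starts when a paragraph's similarity to the previous
    paragraph drops below `threshold` (an assumed heuristic).
    """
    if not paragraphs:
        return []
    vectors = [bag_of_words(p) for p in paragraphs]
    units = [[0]]
    for i in range(1, len(paragraphs)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            units.append([i])
        else:
            units[-1].append(i)
    return units


def split_terms(paragraphs, units):
    """Return (main_terms, subtopic_terms).

    Main terms occur in every unit; subtopic terms occur in only some.
    """
    vocab_per_unit = []
    for unit in units:
        vocab = set()
        for i in unit:
            vocab.update(bag_of_words(paragraphs[i]))
        vocab_per_unit.append(vocab)
    if not vocab_per_unit:
        return set(), set()
    all_terms = set().union(*vocab_per_unit)
    main_terms = set.intersection(*vocab_per_unit)
    return main_terms, all_terms - main_terms


# Made-up sample text: "island" runs throughout, the rest is local.
PARAGRAPHS = [
    "The island volcano erupted and lava flowed from the volcano crater.",
    "Lava from the volcano crater buried the island road.",
    "Island farmers planted rice and harvested rice in autumn.",
    "The rice harvested by farmers fed the island village.",
]

UNITS = segment(PARAGRAPHS, threshold=0.3)
MAIN_TERMS, SUBTOPIC_TERMS = split_terms(PARAGRAPHS, UNITS)
```

In this toy input the term that runs through the whole text ends up in `MAIN_TERMS`, while terms confined to one stretch of paragraphs end up in `SUBTOPIC_TERMS`. A real system would need a more robust similarity measure and boundary criterion than this fixed threshold.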

Index Terms

Subtopic structuring for full-length document access
1. Applied computing
  1. Document management and text processing
    1. Document preparation
      1. Format and notation
2. Information systems
  1. Information retrieval
    1. Information retrieval query processing
    2. Retrieval models and ranking

Published in
SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
July 1993
361 pages
ISBN:0897916050
DOI:10.1145/160688
Editors:
Robert Korfhage,
Edie Rasmussen,
Peter Willett
Copyright © 1993 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%