ABSTRACT
We argue that the advent of large volumes of full-length text, as opposed to short texts such as abstracts and newswire, should be accompanied by correspondingly new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents: that is, partitioning the text into coherent multi-paragraph units that represent the pattern of subtopics that make up the text. This structure lets us distinguish the main topics, which occur throughout the text, from the subtopics, which are of only limited extent. We discuss why recognizing subtopic structure is important and how, to some degree of accuracy, it can be found. We then describe a new way of specifying queries on full-length documents, and an experiment in which using recognized local structure achieves better results on a typical information retrieval task than a standard IR measure does.
Index Terms
- Subtopic structuring for full-length document access