DOI: 10.1145/160688.160695

Subtopic structuring for full-length document access

Published: 01 July 1993

ABSTRACT

We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information access. Toward this end, we discuss the merits of imposing structure on full-length text documents; that is, a partition of the text into coherent multi-paragraph units that represent the pattern of subtopics that comprise the text. Using this structure, we can make a distinction between the main topics, which occur throughout the length of the text, and the subtopics, which are of only limited extent. We discuss why recognition of subtopic structure is important and how, to some degree of accuracy, it can be found. We describe a new way of specifying queries on full-length documents and then describe an experiment in which making use of the recognition of local structure achieves better results on a typical information retrieval task than does a standard IR measure.

        • Published in

          SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
          July 1993
          361 pages
          ISBN:0897916050
          DOI:10.1145/160688

          Copyright © 1993 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate: 792 of 3,983 submissions, 20%
