DOI: 10.3115/1119467.1119470

Multi-document summarization using off the shelf compression software

Published: 31 May 2003

ABSTRACT

This study examines the usefulness of common off-the-shelf compression software such as gzip in enhancing existing summaries and in producing summaries from scratch. Since the gzip algorithm compresses a file by removing repetitive data from it, we should be able to determine which sentences contain the least repetitive data by comparing the gzipped size of the summary with a given sentence to its gzipped size without that sentence. We hypothesized that picking the sentence that increases the gzipped size of the summary the most adds the sentence with the most new information. In this study, the hypothesis held in many cases, to varying degrees.
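
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`gzipped_size`, `size_increase`, `pick_next_sentence`), the use of Python's standard `gzip` module in place of the gzip command-line tool, and the choice to join sentences with a single space are all assumptions made for illustration.

```python
import gzip
from typing import List


def gzipped_size(text: str) -> int:
    """Return the size in bytes of `text` after gzip compression."""
    return len(gzip.compress(text.encode("utf-8")))


def size_increase(summary: List[str], sentence: str) -> int:
    """Gzipped size of the summary with `sentence` minus its size without it.

    Sentences are joined with a single space (an assumption of this sketch).
    """
    without = gzipped_size(" ".join(summary))
    with_sentence = gzipped_size(" ".join(summary + [sentence]))
    return with_sentence - without


def pick_next_sentence(summary: List[str], candidates: List[str]) -> str:
    """Pick the candidate whose addition grows the gzipped summary the most."""
    return max(candidates, key=lambda s: size_increase(summary, s))


if __name__ == "__main__":
    # Hypothetical toy data: one sentence repeats the summary, one is new.
    summary = ["The committee approved the new budget on Monday."]
    candidates = [
        "The committee approved the new budget on Monday.",
        "Heavy rainfall flooded several coastal villages overnight.",
    ]
    print(pick_next_sentence(summary, candidates))
```

A sentence that repeats the summary compresses to little more than a back-reference, so it adds only a few bytes, while a sentence with new wording adds many more. The raw byte difference also tends to favor longer sentences; the abstract does not say whether the study corrects for sentence length.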

Published in

HLT-NAACL-DUC '03: Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5
May 2003
81 pages

Publisher: Association for Computational Linguistics, United States

Acceptance Rates

Overall Acceptance Rate: 240 of 768 submissions, 31%
