ABSTRACT
This study examines the usefulness of common off-the-shelf compression software, such as gzip, both for enhancing existing summaries and for producing summaries from scratch. Because the gzip algorithm compresses a file by eliminating repetitive data, we can estimate which sentences in a summary contribute the least redundant information by comparing the gzipped size of the summary with a candidate sentence to the gzipped size of the summary without it. We hypothesized that picking the sentence that increases the compressed size the most adds the sentence with the most new information to the summary. This study found the hypothesis to hold in many cases, and to varying degrees.
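The selection step described above can be sketched in a few lines of Python using the standard-library gzip module. This is a minimal illustration of the heuristic, not the authors' implementation; the function names and the candidate-sentence interface are our own assumptions.

```python
import gzip

def gzipped_size(text: str) -> int:
    # Length in bytes of the gzip-compressed text.
    return len(gzip.compress(text.encode("utf-8")))

def pick_next_sentence(summary: str, candidates: list[str]) -> str:
    # The "gain" of a candidate is how much it inflates the compressed
    # summary: repetitive sentences compress away against the existing
    # text, while novel sentences add many incompressible bytes.
    def gain(sentence: str) -> int:
        return gzipped_size(summary + " " + sentence) - gzipped_size(summary)
    # Pick the candidate carrying the most new information.
    return max(candidates, key=gain)
```

For example, given a summary that already says "the cat sat on the mat", a verbatim repeat of that sentence yields a small gain, while an unrelated sentence yields a large one and is selected.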