Multi-document summarization using off the shelf compression software

Authors: Amardeep Grewal, Timothy Allison, Stanko Dimitrov, Dragomir Radev (all University of Michigan)

HLT-NAACL-DUC '03: Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5, May 2003, Pages 17–24. https://doi.org/10.3115/1119467.1119470

Published: 31 May 2003

ABSTRACT

This study examines whether common off-the-shelf compression software such as gzip can be used both to enhance existing summaries and to produce summaries from scratch. Because gzip compresses a file by removing repetitive data, the sentences that contribute the least repetitive data to a summary can be identified by comparing the gzipped size of the summary with a given sentence to its gzipped size without that sentence. We hypothesized that adding the sentence that increases the compressed size of the summary the most adds the sentence with the most new information. In this study, the hypothesis held in many cases, though to varying degrees.
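
The selection criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Python's standard `zlib` module as a stand-in for the gzip command-line tool (both use DEFLATE, so size differences should behave similarly), and the function names `compressed_size`, `pick_next_sentence`, and `build_summary` are hypothetical names chosen for illustration.

```python
import zlib


def compressed_size(text):
    """Return the size in bytes of the DEFLATE-compressed UTF-8 text.

    Assumption: zlib stands in for the gzip tool named in the abstract.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def pick_next_sentence(summary, candidates):
    """Pick the candidate whose addition grows the compressed summary most.

    `summary` is a list of sentences already chosen; `candidates` is a
    non-empty list of sentences to choose from.
    """
    base = compressed_size(" ".join(summary))

    def size_increase(sentence):
        # Compressed size with the sentence minus compressed size without it.
        return compressed_size(" ".join(summary + [sentence])) - base

    return max(candidates, key=size_increase)


def build_summary(sentences, max_sentences):
    """Greedily build a summary from scratch, one sentence at a time."""
    summary = []
    remaining = list(sentences)
    while remaining and len(summary) < max_sentences:
        best = pick_next_sentence(summary, remaining)
        summary.append(best)
        remaining.remove(best)
    return summary
```

Calling `pick_next_sentence` with a sentence that repeats existing summary text and a sentence with unrelated wording should favor the unrelated one, because the repeated text compresses to a short back-reference. The greedy loop in `build_summary` is one possible way to apply the criterion when no initial summary exists; the abstract does not specify how sentences are ordered or how ties are broken.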


Published in

HLT-NAACL-DUC '03: Proceedings of the HLT-NAACL 03 on Text summarization workshop - Volume 5
May 2003, 81 pages
Publisher: Association for Computational Linguistics, United States
Overall acceptance rate: 240 of 768 submissions, 31%