research-article


Reducing redundancy in multi-document summarization using lexical semantic similarity

Authors:
Iris Hendrickx (University of Antwerp, Antwerpen, Belgium)
Walter Daelemans (University of Antwerp, Antwerpen, Belgium)
Erwin Marsi (Tilburg University, Tilburg, The Netherlands)
Emiel Krahmer (Tilburg University, Tilburg, The Netherlands)

UCNLG+Sum '09: Proceedings of the 2009 Workshop on Language Generation and Summarisation, August 2009, pages 63–66

Published: 6 August 2009

ABSTRACT

We present an automatic multi-document summarization system for Dutch based on the MEAD system. We focus on redundancy detection, an essential ingredient of multi-document summarization. We introduce a semantic overlap detection tool, which goes beyond simple string matching. Our results so far do not confirm our expectation that this tool would outperform the other tested methods.
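The redundancy problem described above can be illustrated with a minimal sketch. It assumes a greedy, MEAD-style filter that walks sentences in ranked order and skips any sentence whose similarity to an already selected sentence reaches a threshold, and it contrasts a surface bag-of-words cosine measure with a toy lexical-semantic measure. The function names, the threshold of 0.75, the sentence limit, the 0.8 synonym score and the `SYNONYMS` lexicon are all hypothetical (the lexicon only mimics what a Dutch resource such as Cornetto might supply, and the sentences are English toy examples). The greedy best-match word pairing is a simplification, not an optimal one-to-one alignment, and none of this is the authors' actual tool.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real lexical resource
# (e.g. a wordnet such as Cornetto); pairs listed here count as near-synonyms.
SYNONYMS = {frozenset({"car", "automobile"}), frozenset({"buy", "purchase"})}


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: surface string overlap only."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_sim(w1: str, w2: str) -> float:
    """Identical words score 1.0, listed near-synonyms 0.8 (arbitrary), else 0.0."""
    if w1 == w2:
        return 1.0
    return 0.8 if frozenset({w1, w2}) in SYNONYMS else 0.0


def semantic_overlap(a: str, b: str) -> float:
    """Symmetric average of each word's best match in the other sentence.

    A greedy simplification: a word may be matched more than once, unlike a
    one-to-one alignment (e.g. one computed with the Hungarian method).
    """
    wa, wb = a.lower().split(), b.lower().split()
    if not wa or not wb:
        return 0.0

    def directed(xs, ys):
        return sum(max(word_sim(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(wa, wb) + directed(wb, wa)) / 2


def select_non_redundant(ranked, similarity, threshold=0.75, max_sentences=3):
    """Keep sentences in ranked order, skipping any whose similarity to an
    already selected sentence is at or above the threshold."""
    summary = []
    for sentence in ranked:
        if len(summary) >= max_sentences:
            break
        if all(similarity(sentence, kept) < threshold for kept in summary):
            summary.append(sentence)
    return summary


ranked = [
    "the minister will buy a new car",
    "the minister will purchase a new automobile",
    "parliament debated the budget today",
]
print(select_non_redundant(ranked, cosine))            # all three sentences kept
print(select_non_redundant(ranked, semantic_overlap))  # paraphrase dropped
```

On this toy input the two paraphrases share five of seven words, so their cosine is 5/7 (about 0.71) and the surface measure keeps both, while the semantic measure scores them about 0.94 and drops the second. This only illustrates the mechanism: the abstract reports that the authors' semantic overlap tool did not outperform the other methods they tested.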

References

Regina Barzilay and Kathleen R. McKeown. 2005. Sentence fusion for multidocument news summarization. Computational Linguistics, 31(3):297–328.
Gosse Bouma, Gertjan van Noord, and Robert Malouf. 2001. Alpino: Wide-coverage computational analysis of Dutch. In Computational Linguistics in the Netherlands 2000, pages 45–59. Rodopi, Amsterdam, New York.
Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of SIGIR 1998, pages 335–336, New York, NY, USA. ACM.
H. T. Dang. 2006. Overview of DUC 2006. In Proceedings of the Document Understanding Workshop, pages 1–10, Brooklyn, USA.
Harold W. Kuhn. 1955. The Hungarian Method for the assignment problem. Naval Research Logistics Quarterly, 2:83–97.
C.-Y. Lin and E. H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL, pages 71–78, Edmonton, Canada.
D. Lin. 1998. An information-theoretic definition of similarity. In Proceedings of ICML, pages 296–304.
Erwin Marsi and Emiel Krahmer. 2007. Annotating a parallel monolingual treebank with semantic similarity relations. In Proceedings of the 6th International Workshop on Treebanks and Linguistic Theories, pages 85–96, Bergen, Norway.
Dragomir Radev et al. 2004. MEAD: a platform for multidocument multilingual text summarization. In Proceedings of LREC 2004, Lisbon, Portugal.
Karen Spärck Jones. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1):11–21.
P. Vossen, I. Maks, R. Segers, and H. van der Vliet. 2008. Integrating lexical units, synsets and ontology in the Cornetto Database. In Proceedings of LREC 2008, Marrakech, Morocco.

Index Terms

1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
UCNLG+Sum '09: Proceedings of the 2009 Workshop on Language Generation and Summarisation
August 2009
108 pages
ISBN: 9781932432510
General Chairs: Anja Belz (University of Brighton, UK), Roger Evans (University of Brighton, UK), Sebastian Varges (University of Trento, Italy)
Publisher: Association for Computational Linguistics, United States