Abstract
In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g., sentence similarity) but neglect the differences among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In this way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm called DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.
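The abstract does not spell out DsR's edge weights or update rule, so the following is only a rough illustration of the general idea: a sentence graph whose edges are weighted differently depending on whether two sentences come from the same document, and which is then ranked by a PageRank/LexRank-style power iteration. Every specific choice here is a hypothetical assumption rather than the published formulation: bag-of-words cosine similarity, the `intra`/`inter` multipliers, "document importance" as similarity to the document-set centroid, and "document–sentence relation" as a sentence's similarity to its own document. The function name `rank_sentences` is likewise made up for this sketch.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(cnt * b[t] for t, cnt in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(docs, intra=0.5, inter=1.0, damping=0.85,
                   tol=1e-10, max_iter=500):
    """Hypothetical document-aware sentence ranking (not the published DsR).

    docs: list of documents, each a list of sentence strings.
    intra / inter: assumed multipliers for same-document vs cross-document edges.
    Returns (doc_index, sentence_index, score) triples in input order;
    the scores sum to 1.
    """
    # Flatten all sentences, remembering which document each came from.
    sents = [(d, k, Counter(s.lower().split()))
             for d, doc in enumerate(docs) for k, s in enumerate(doc)]
    n = len(sents)
    if n == 0:
        return []

    # Document vectors and a centroid for the whole document set.
    doc_vecs = [Counter() for _ in docs]
    for d, _, vec in sents:
        doc_vecs[d].update(vec)
    centroid = sum(doc_vecs, Counter())
    # Assumed document importance: similarity of each document to the set.
    doc_imp = [cosine(dv, centroid) for dv in doc_vecs]
    # Assumed document-sentence relation: sentence affinity to its own document.
    sent_aff = [cosine(vec, doc_vecs[d]) for d, _, vec in sents]

    # Weighted sentence graph: intra- and inter-document edges are scaled
    # differently, and each edge is boosted by the target's document signals.
    W = [[0.0] * n for _ in range(n)]
    for i, (di, _, vi) in enumerate(sents):
        for j, (dj, _, vj) in enumerate(sents):
            if i != j:
                scale = intra if di == dj else inter
                W[i][j] = cosine(vi, vj) * scale * doc_imp[dj] * sent_aff[j]
    row_sums = [sum(row) for row in W]

    # PageRank-style power iteration with uniform teleportation;
    # sentences with no outgoing weight spread their score uniformly.
    score = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if row_sums[i] > 0:
                for j in range(n):
                    new[j] += damping * score[i] * W[i][j] / row_sums[i]
            else:
                for j in range(n):
                    new[j] += damping * score[i] / n
        delta = sum(abs(a - b) for a, b in zip(new, score))
        score = new
        if delta < tol:
            break
    return [(d, k, s) for (d, k, _), s in zip(sents, score)]
```

For example, with `docs = [["the cat sat on the mat", "the cat ate fish"], ["a dog sat on the mat", "dogs chase cats"]]`, the last sentence shares no terms with any other sentence, receives no incoming edge weight, and therefore ends up with the lowest score. Setting `inter` above `intra` (an arbitrary illustrative choice) favours sentences that are corroborated across documents rather than only within one; a summary would then be built from the top-ranked sentences.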
Cite this article
Wei, F., Li, W., Lu, Q. et al. A document-sensitive graph model for multi-document summarization. Knowl Inf Syst 22, 245–259 (2010). https://doi.org/10.1007/s10115-009-0194-2