ABSTRACT
Information networks are widely used to characterize the relationships between data items such as text documents. Many important retrieval and mining tasks rely on ranking the data items by their centrality or prestige in the network. Beyond prestige, diversity has been recognized as a crucial objective in ranking: the top-ranked results should be non-redundant and cover as much of the available information as possible. Nevertheless, existing network-based ranking approaches either disregard diversity or handle it with non-optimized heuristics, usually based on greedy vertex selection.
We propose a novel ranking algorithm, DivRank, based on a reinforced random walk in an information network. This model automatically balances the prestige and the diversity of the top-ranked vertices in a principled way. DivRank not only has a clear optimization explanation but also connects well to classical models in mathematics and network science. We evaluate DivRank empirically on three different networks and on a text summarization task. DivRank outperforms existing network-based ranking methods in terms of enhancing diversity in prestige.
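The abstract does not spell out the update rule, so the following is only a minimal sketch of how a vertex-reinforced random walk can be turned into a ranking. It assumes a deterministic approximation in which each vertex's current score stands in for its visit count, so that moves toward already high-scoring neighbors are reinforced and nearby vertices lose mass to them. The function name `reinforced_walk_rank`, the parameters `lam` and `alpha`, the uniform prior, and the toy graph are all illustrative assumptions, not details taken from the paper.

```python
def reinforced_walk_rank(adj, lam=0.9, alpha=0.25, iters=200):
    """Score vertices with a vertex-reinforced random walk (illustrative sketch).

    adj:   square list-of-lists of non-negative edge weights.
    lam:   weight on following the walk versus jumping to the uniform prior.
    alpha: share of an unreinforced step that goes to neighbors; the rest
           stays on the current vertex as a self-loop (assumed setup).
    """
    n = len(adj)

    # Time-invariant "organic" transition matrix with self-loops.
    p0 = []
    for u in range(n):
        out = sum(adj[u][v] for v in range(n) if v != u)
        row = [alpha * adj[u][v] / out if (v != u and out > 0) else 0.0
               for v in range(n)]
        row[u] = 1.0 - sum(row)  # remaining mass stays on u
        p0.append(row)

    prior = [1.0 / n] * n
    score = prior[:]
    for _ in range(iters):
        nxt = [0.0] * n
        for u in range(n):
            # Reinforcement: reweight each move by the target's current score.
            weights = [p0[u][v] * score[v] for v in range(n)]
            norm = sum(weights)
            for v in range(n):
                step = weights[v] / norm if norm > 0 else prior[v]
                nxt[v] += score[u] * ((1.0 - lam) * prior[v] + lam * step)
        score = nxt
    return score


# Toy undirected graph (hypothetical): a 4-clique {0,1,2,3} joined to the
# pair {4,5} through the edge 3-4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

scores = reinforced_walk_rank(adj)
ranking = sorted(range(len(scores)), key=lambda v: -scores[v])
print(ranking)
print([round(s, 4) for s in scores])
```

Each iteration redistributes the current score mass, so the scores remain a probability distribution. Setting `lam=0.0` removes the walk entirely and returns the uniform prior, which is a convenient sanity check for the sketch.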
Index Terms
- DivRank: the interplay of prestige and diversity in information networks