Abstract
In many applications, the system needs to selectively present a small subset of answers to users. The set of all possible answers can be seen as an elevation surface over a domain, where the elevation measures the quality of each answer, and the dimensions of the domain correspond to attributes of the answers with which similarity between answers can be measured. This paper considers the problem of finding a diverse set of k high-quality representatives for such a surface. We show that existing methods for diversified top-k and weighted clustering problems are inadequate for this problem. We propose k-DHR as a better formulation for the problem. We show that k-DHR has a submodular and monotone objective function, and we develop efficient algorithms for solving k-DHR with provable guarantees. We conduct extensive experiments to demonstrate the usefulness of the results produced by k-DHR for applications in computational lead-finding and fact-checking, as well as the efficiency and effectiveness of our algorithms.
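The abstract does not define the k-DHR objective, so the following is only a minimal sketch of the general technique it alludes to: greedy selection for a monotone submodular objective under a cardinality constraint, which is classically known to achieve a (1 - 1/e) approximation. The objective used here is a facility-location-style stand-in (each answer is credited with the best quality-weighted similarity to any chosen representative); it is an assumption for illustration, not the paper's formulation. The names `coverage`, `greedy_representatives`, and `bandwidth` are hypothetical.

```python
import math


def coverage(selected, points, quality, bandwidth=1.0):
    """Stand-in objective (NOT the paper's k-DHR definition).

    Each point is credited with the best quality-weighted Gaussian
    similarity to any selected representative. With non-negative
    qualities this facility-location form is monotone and submodular.
    """
    total = 0.0
    for p in points:
        best = 0.0
        for r in selected:
            d2 = sum((a - b) ** 2 for a, b in zip(points[r], p))
            best = max(best, quality[r] * math.exp(-d2 / (2 * bandwidth ** 2)))
        total += best
    return total


def greedy_representatives(points, quality, k, bandwidth=1.0):
    """Pick up to k indices, each time adding the largest marginal gain."""
    selected = []
    current = 0.0
    for _ in range(min(k, len(points))):
        best_gain, best_idx = 0.0, None
        for i in range(len(points)):
            if i in selected:
                continue
            gain = coverage(selected + [i], points, quality, bandwidth) - current
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no candidate improves the objective
            break
        selected.append(best_idx)
        current += best_gain
    return selected


if __name__ == "__main__":
    # Two well-separated clusters of answers on a 1-D domain (made-up data).
    pts = [(0.0,), (0.1,), (5.0,), (5.1,)]
    qual = [1.0, 0.9, 0.8, 0.7]
    print(greedy_representatives(pts, qual, k=2))
```

On this toy input the greedy step picks the highest-quality answer in each cluster rather than the two highest-quality answers overall, which illustrates the trade-off between quality and diversity that the abstract describes. Recomputing the objective for every candidate is quadratic per step; this sketch favors clarity over efficiency.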