research-article

Finding diverse, high-value representatives on a surface of answers

Published: 01 March 2017

Abstract

In many applications, the system needs to selectively present a small subset of answers to users. The set of all possible answers can be seen as an elevation surface over a domain, where the elevation measures the quality of each answer, and the dimensions of the domain correspond to attributes of the answers with which similarity between answers can be measured. This paper considers the problem of finding a diverse set of k high-quality representatives for such a surface. We show that existing methods for diversified top-k and weighted clustering problems are inadequate for this problem. We propose k-DHR as a better formulation for the problem. We show that k-DHR has a submodular and monotone objective function, and we develop efficient algorithms for solving k-DHR with provable guarantees. We conduct extensive experiments to demonstrate the usefulness of the results produced by k-DHR for applications in computational lead-finding and fact-checking, as well as the efficiency and effectiveness of our algorithms.
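The abstract states that the k-DHR objective is monotone and submodular. For objectives with those two properties, the classical greedy heuristic (repeatedly add the element with the largest marginal gain) is known to achieve a (1 - 1/e) approximation under a size-k constraint. The sketch below illustrates that general idea only. The abstract does not give the k-DHR objective or the authors' algorithms, so `coverage` is a hypothetical facility-location-style stand-in, and the one-dimensional domain, the exponential similarity, and all function names are assumptions made for illustration.

```python
import math

def coverage(answers, chosen):
    """Hypothetical stand-in objective (NOT the paper's k-DHR objective).

    answers: list of (position, quality) pairs with quality >= 0.
    chosen:  list of indices into answers.
    Each answer is credited with the best quality-weighted similarity to any
    chosen representative; this facility-location form is monotone and
    submodular when all weights are nonnegative.
    """
    total = 0.0
    for x, _ in answers:
        best = 0.0
        for j in chosen:
            xj, qj = answers[j]
            # Assumed similarity: decays exponentially with distance.
            best = max(best, qj * math.exp(-abs(x - xj)))
        total += best
    return total

def greedy_representatives(answers, k):
    """Pick up to k indices by repeatedly taking the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(answers))):
        base = coverage(answers, chosen)
        best_j, best_gain = None, 0.0
        for j in range(len(answers)):
            if j in chosen:
                continue
            gain = coverage(answers, chosen + [j]) - base
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:  # no remaining candidate improves the objective
            break
        chosen.append(best_j)
    return chosen

if __name__ == "__main__":
    # Two clusters of answers; each tuple is (position, quality).
    answers = [(0.0, 1.0), (0.1, 0.9), (5.0, 0.8), (5.1, 0.7)]
    print(greedy_representatives(answers, 2))  # → [0, 2]
```

In this toy input the two highest-quality answers sit next to each other, so a plain top-2 by quality would return near-duplicates, while the greedy pass takes one representative from each cluster. This naive version re-evaluates the objective for every candidate in every round; it is meant to show the selection rule, not an efficient implementation.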


  • Published in

    Proceedings of the VLDB Endowment, Volume 10, Issue 7
    March 2017
    132 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment
