Finding diverse, high-value representatives on a surface of answers

Authors:
You Wu (Google Research and Duke University)
Junyang Gao (Duke University)
Pankaj K. Agarwal (Duke University)
Jun Yang (Duke University)

Proceedings of the VLDB Endowment, Volume 10, Issue 7, pp. 793–804
https://doi.org/10.14778/3067421.3067428

Published: 01 March 2017

Abstract

In many applications, the system needs to selectively present a small subset of answers to users. The set of all possible answers can be seen as an elevation surface over a domain, where the elevation measures the quality of each answer, and the dimensions of the domain correspond to attributes of the answers with which similarity between answers can be measured. This paper considers the problem of finding a diverse set of k high-quality representatives for such a surface. We show that existing methods for diversified top-k and weighted clustering problems are inadequate for this problem. We propose k-DHR as a better formulation for the problem. We show that k-DHR has a submodular and monotone objective function, and we develop efficient algorithms for solving k-DHR with provable guarantees. We conduct extensive experiments to demonstrate the usefulness of the results produced by k-DHR for applications in computational lead-finding and fact-checking, as well as the efficiency and effectiveness of our algorithms.
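
The abstract states that the k-DHR objective is submodular and monotone but does not define it here. As a minimal sketch of why that property matters, the following Python example applies the classic greedy strategy, which is known to give a (1 − 1/e) approximation for monotone submodular objectives under a cardinality constraint. Everything in it is an assumption made for illustration: the `similarity` function, the `radius` parameter, and the facility-location-style `coverage_value` objective are stand-ins, not the paper's k-DHR formulation or its algorithms.

```python
import math

def similarity(a, b, radius=1.0):
    """Hypothetical similarity in [0, 1]: 1 at distance 0, falling linearly to 0 at `radius`."""
    return max(0.0, 1.0 - math.dist(a, b) / radius)

def coverage_value(selected, points, values, radius=1.0):
    """Stand-in objective (an assumption, NOT the paper's k-DHR definition).

    Each answer contributes its value scaled by its similarity to the closest
    selected representative. With non-negative values this facility-location
    form is monotone and submodular.
    """
    if not selected:
        return 0.0
    return sum(
        v * max(similarity(p, points[s], radius) for s in selected)
        for p, v in zip(points, values)
    )

def greedy_representatives(points, values, k, radius=1.0):
    """Pick up to k indices, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(min(k, len(points))):
        base = coverage_value(selected, points, values, radius)
        best_gain, best_idx = 0.0, None
        for i in range(len(points)):
            if i in selected:
                continue
            gain = coverage_value(selected + [i], points, values, radius) - base
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining candidate improves the objective
            break
        selected.append(best_idx)
    return selected

# Toy one-dimensional "surface": two high-value answers close together,
# two lower-value answers far away from them.
points = [(0.0,), (0.1,), (5.0,), (5.2,)]
values = [10.0, 9.0, 4.0, 3.0]

print(greedy_representatives(points, values, k=2))  # prints [0, 2]
```

A plain top-2 by value would return indices 0 and 1, which are near-duplicates. Under this stand-in objective the second greedy pick moves to the distant cluster instead, because index 1 adds little once index 0 already represents it.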


Published in
Proceedings of the VLDB Endowment, Volume 10, Issue 7
March 2017, 132 pages
ISSN: 2150-8097
Editors: Peter Boncz (CWI), Ken Salem (University of Waterloo)
Publisher: VLDB Endowment
Published: 1 March 2017
Qualifier: research-article