research-article

Diverse and Proportional Size-l Object Summaries for Keyword Search

Authors:
Georgios Fakas, HKUST, Hong Kong, Hong Kong
Zhi Cai, Beijing University of Technology, Beijing, China
Nikos Mamoulis, The University of Hong Kong, Hong Kong, Hong Kong

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, Pages 363–375. https://doi.org/10.1145/2723372.2737783

Published: 27 May 2015

ABSTRACT

The abundance and ubiquity of graphs (e.g., Online Social Networks such as Google+ and Facebook; bibliographic graphs such as DBLP) necessitate effective and efficient search over them. Given a set of keywords that can identify a Data Subject (DS), a recently proposed relational keyword search paradigm produces, as a query result, a set of Object Summaries (OSs). An OS is a tree structure rooted at the DS node (i.e., a tuple containing the keywords) with surrounding nodes that summarize all data held on the graph about the DS. OS snippets, denoted as size-l OSs, have also been investigated. Size-l OSs are partial OSs containing l nodes such that the sum of their importance scores is the maximum possible. However, the set of nodes that maximizes the total importance score may yield an uninformative size-l OS, as very important nodes may be repeated in it, dominating other representative information. In view of this limitation, in this paper we investigate the effective and efficient generation of two novel types of OS snippets, i.e., diverse and proportional size-l OSs, denoted as DSize-l and PSize-l OSs. Namely, apart from the importance of each node, we also consider its frequency in the OS and its repetitions in the snippets. We conduct an extensive evaluation on two real graphs (DBLP and Google+). We verify effectiveness by collecting user feedback, e.g., by asking DBLP authors (i.e., the DSs themselves) to evaluate our results. In addition, we verify the efficiency of our algorithms and evaluate the quality of the snippets that they produce.
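
To make the limitation described in the abstract concrete, the following is a minimal Python sketch, not the algorithms of the paper. It assumes a toy OS tree (`toy_os`, hypothetical data), a simple greedy heuristic that grows a connected snippet from the DS root, and an assumed repetition penalty of `importance / (1 + repetitions)`. The function name `greedy_snippet` and the penalty formula are illustrative choices only; the greedy heuristic is not guaranteed to find the maximum-score snippet, and proportionality (PSize-l) is not modelled here.

```python
from collections import Counter

# Hypothetical toy OS: node id -> (parent id, label, importance score).
# Nodes "c1" and "c2" stand for the same co-author appearing under two papers.
toy_os = {
    "a":  (None, "author:A",   1.0),
    "p1": ("a",  "paper:P1",   0.9),
    "p2": ("a",  "paper:P2",   0.8),
    "p3": ("a",  "paper:P3",   0.3),
    "c1": ("p1", "coauthor:X", 0.95),
    "c2": ("p2", "coauthor:X", 0.95),
    "c3": ("p3", "coauthor:Y", 0.5),
    "v1": ("p1", "venue:V",    0.5),
}


def greedy_snippet(nodes, root, l, diverse=True):
    """Greedily pick up to l nodes forming a subtree rooted at `root`.

    With diverse=False the next node is the frontier node with the highest
    importance. With diverse=True the importance is divided by
    (1 + number of already selected nodes with the same label), which is
    an assumed penalty used only for illustration.
    """
    # Build a parent -> children index once.
    children = {}
    for node_id, (parent, _label, _importance) in nodes.items():
        children.setdefault(parent, []).append(node_id)

    selected = [root]
    seen = Counter([nodes[root][1]])          # label -> times selected
    frontier = list(children.get(root, []))   # nodes adjacent to the snippet

    def score(node_id):
        _parent, label, importance = nodes[node_id]
        return importance / (1 + seen[label]) if diverse else importance

    while len(selected) < l and frontier:
        best = max(frontier, key=score)
        frontier.remove(best)
        selected.append(best)
        seen[nodes[best][1]] += 1
        # Children of the chosen node become selectable, keeping the
        # snippet connected to the root.
        frontier.extend(children.get(best, []))
    return selected


importance_only = greedy_snippet(toy_os, "a", 5, diverse=False)
with_penalty = greedy_snippet(toy_os, "a", 5, diverse=True)
```

On this toy input the importance-only variant selects the repeated co-author twice, while the penalised variant replaces the second occurrence with the venue node, which is the kind of repetition the abstract identifies as uninformative.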

Index Terms

Diverse and Proportional Size-l Object Summaries for Keyword Search
1. General and reference
  1. Document types
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN: 9781450327589
DOI: 10.1145/2723372
General Chair: Timos Sellis (RMIT University, Australia)
Program Chairs: Susan B. Davidson (University of Pennsylvania, USA), Zack Ives (University of Pennsylvania, USA)
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
diversity
keyword search
proportionality
ranking
summaries
Qualifiers
- research-article
Acceptance Rates
SIGMOD '15 Paper Acceptance Rate: 106 of 415 submissions, 26%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%