ABSTRACT
Empowering users to access databases using simple keywords can relieve them of the steep learning curve of mastering a structured query language and understanding complex and possibly fast-evolving data schemas. In this tutorial, we give an overview of state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. We discuss various data models, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications built upon keyword search, such as keyword-based database selection, query generation, and analytical processing. Finally, we identify challenges and opportunities for future research to advance the field.