DOI: 10.1145/1559845.1559966
tutorial

Keyword search on structured and semi-structured data

Published: 29 June 2009

ABSTRACT

Empowering users to access databases with simple keywords can relieve them of the steep learning curve of mastering a structured query language and understanding complex, possibly fast-evolving data schemas. In this tutorial, we give an overview of state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. We discuss various data models, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications built upon keyword search, such as keyword-based database selection, query generation, and analytical processing. Finally, we identify the challenges and opportunities for future research to advance the field.
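To make "query result definition" concrete for XML data, the sketch below implements one widely studied result semantics, the smallest lowest common ancestor (SLCA): a result is a node whose subtree contains every query keyword and none of whose descendants also contains every keyword. It is a minimal brute-force sketch, not the tutorial's own method, and it assumes that nodes are identified by Dewey labels (tuples of child positions from the root) and that the matches for each keyword are already available as a list of such labels. The names `lca`, `is_proper_ancestor`, and `slca` are hypothetical. Enumerating every combination of matches is exponential in the number of keywords; practical systems rely on far more efficient algorithms.

```python
from itertools import product
from typing import List, Tuple

# Assumed node label: the path of child positions from the root (Dewey labeling).
Dewey = Tuple[int, ...]


def lca(a: Dewey, b: Dewey) -> Dewey:
    """Lowest common ancestor of two nodes = longest common prefix of their labels."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def is_proper_ancestor(a: Dewey, d: Dewey) -> bool:
    """True if node a lies strictly above node d in the tree."""
    return len(a) < len(d) and d[: len(a)] == a


def slca(match_lists: List[List[Dewey]]) -> List[Dewey]:
    """Brute-force SLCA: compute the LCA of every combination of one match per
    keyword, then keep only candidates that have no other candidate below them."""
    if not match_lists:
        return []
    candidates = set()
    for combo in product(*match_lists):  # yields nothing if any keyword has no match
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, d) for d in candidates)
    )


if __name__ == "__main__":
    keyword_matches = [(1, 1, 1), (1, 2, 1)]  # hypothetical nodes containing "keyword"
    chen_matches = [(1, 1, 2), (1, 3, 1)]     # hypothetical nodes containing "Chen"
    print(slca([keyword_matches, chen_matches]))  # [(1, 1)]
```

In this toy example the root `(1,)` also contains both keywords, but it is discarded because its descendant `(1, 1)` already does; that pruning is what distinguishes SLCA from returning every common ancestor.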

      Published in

        ACM Conferences
        SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
        June 2009
        1168 pages
        ISBN:9781605585512
        DOI:10.1145/1559845

        Copyright © 2009. Copyright is held by the owner/author(s).

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States




        Acceptance Rates

        Overall acceptance rate: 785 of 4,003 submissions, 20%
