DOI: 10.1145/2658840.2658842
research-article

A Paradigm for Learning Queries on Big Data

Published: 01 September 2014

ABSTRACT

Specifying a database query in a formal query language is typically challenging for non-expert users. In the context of big data, the problem becomes even harder because users must deal with database instances that are large and hence difficult to visualize. Such instances usually lack a schema to help users specify their queries, or have an incomplete schema because they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data that assumes no knowledge of the database schema. The paradigm can be applied to different database models, each with a class of queries adequate to that model. In particular, we present two instantiations that validate the proposed paradigm: learning relational join queries and learning path queries on graph databases. Finally, we discuss the challenges of employing the paradigm for further data models and for learning cross-model schema mappings.

Published in

Data4U '14: Proceedings of the First International Workshop on Bringing the Value of "Big Data" to Users (Data4U 2014)
September 2014
40 pages
ISBN: 9781450331869
DOI: 10.1145/2658840

Copyright © 2014 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

research-article, Research, Refereed limited

Acceptance Rates

Data4U '14 paper acceptance rate: 6 of 6 submissions (100%). Overall acceptance rate: 6 of 6 submissions (100%).
