ABSTRACT
Specifying a database query in a formal query language is typically a challenging task for non-expert users. In the context of big data, this problem becomes even harder because users must deal with very large database instances that are difficult to visualize. Such instances often lack a schema that would help users specify their queries, or have only an incomplete schema because they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data that assumes no knowledge of the database schema. The paradigm can be applied to different database models, each paired with a class of queries suited to that model. In particular, we present two instantiations that validate the proposed paradigm: learning relational join queries and learning path queries on graph databases. Finally, we discuss the challenges of applying the paradigm to further data models and to learning cross-model schema mappings.
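To make the interactive paradigm concrete for the relational instantiation, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. A join query over two relations is modeled as a conjunction of attribute equalities (a set of (R-attribute, S-attribute) pairs). The user is simulated by a hypothetical `oracle` function that labels a pair of tuples as positive or negative. The learner only asks about pairs whose label is not already implied by the answers given so far. The function names (`equalities`, `selects`, `is_informative`, `learn_join`), the toy relations, and the naive question order are illustrative assumptions.

```python
from itertools import product


def equalities(r, s):
    """Set of (R-attribute, S-attribute) pairs on which tuples r and s agree.

    This is the most specific equi-join predicate that selects the pair (r, s).
    """
    return frozenset((a, b) for a in r for b in s if r[a] == s[b])


def selects(theta, r, s):
    # A conjunction of equalities holds iff each of its pairs is satisfied by (r, s).
    return theta <= equalities(r, s)


def most_specific(universe, positives):
    # Intersect the equalities of all positive examples; with no positives yet,
    # every attribute pair is still a candidate conjunct.
    theta = universe
    for r, s in positives:
        theta &= equalities(r, s)
    return theta


def is_informative(t, universe, positives, negatives):
    """True iff the label of pair t is not implied by the labels given so far."""
    theta_p = most_specific(universe, positives)
    eq_t = equalities(*t)
    if theta_p <= eq_t:
        return False  # every predicate consistent with the labels selects t
    candidate = theta_p & eq_t  # largest predicate consistent with positives that selects t
    if any(candidate <= equalities(*n) for n in negatives):
        return False  # any predicate selecting t would also select a negative
    return True


def learn_join(R, S, oracle):
    """Ask the (hypothetical) oracle only about informative pairs of R x S.

    Assumes R and S are non-empty lists of dicts, each with one attribute set,
    and that the oracle answers according to some equi-join predicate.
    Returns the learned predicate and the number of questions asked.
    """
    universe = frozenset((a, b) for a in R[0] for b in S[0])
    positives, negatives, questions = [], [], 0
    # One pass suffices: once a label is implied, further answers keep it implied.
    for t in product(R, S):
        if is_informative(t, universe, positives, negatives):
            questions += 1
            (positives if oracle(*t) else negatives).append(t)
    return most_specific(universe, positives), questions


if __name__ == "__main__":
    flights = [{"From": "Paris", "To": "Lille"},
               {"From": "Lille", "To": "NYC"},
               {"From": "NYC", "To": "Paris"}]
    hotels = [{"City": "NYC", "Discount": "10%"},
              {"City": "Paris", "Discount": "0%"},
              {"City": "Lille", "Discount": "5%"}]

    def goal(r, s):
        # Simulated user whose hidden goal is the join flights.To = hotels.City.
        return r["To"] == s["City"]

    theta, questions = learn_join(flights, hotels, goal)
    print(theta, questions)  # frozenset({('To', 'City')}) 3
```

On this toy instance the learner asks about 3 of the 9 pairs; the labels of the other 6 follow from those answers, and the learned predicate selects exactly the pairs the simulated user wanted. The sketch scans pairs in a fixed order; choosing which informative pair to ask about next is a separate design decision that this example does not address.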
Index Terms
- A Paradigm for Learning Queries on Big Data