ABSTRACT
Modern data analysis increasingly relies on data-intensive flows to process very large volumes of data. As these flows grow more complex and operate in highly dynamic environments, we argue that automated cost-based optimization is needed, rather than reliance on efficient designs crafted by human experts. We further demonstrate that the current state of the art in flow optimization needs to be extended, and we propose a promising direction for optimizing flows at the logical level, specifically for deciding the sequence of flow tasks.
Index Terms
- Optimization of Data-intensive Flows: Is it Needed? Is it Solved?