ABSTRACT
The purpose of a data warehouse is to aid decision making. As the real-time enterprise evolves, synchronism between transactional data and data warehouses is redefined. To cope with real-time requirements, the data warehouses must be able to enable continuous data integration, in order to deal with the most recent business data. Traditional data warehouses are unable to support any dynamics in structure and content while they are available for OLAP. Their data is periodically updated because they are unprepared for continuous data integration. For real-time enterprises with needs in decision support while the transactions are occurring, (near) real-time data warehousing seem very promising. In this paper we present a survey on testing today's most used loading techniques and analyze which are the best data loading methods, presenting a methodology for efficiently supporting continuous data integration for data warehouses. To accomplish this, we use techniques such as table structure replication with minimum content and query predicate restrictions for selecting data, to enable loading data in the data warehouse continuously, with minimum impact in query execution time. We demonstrate the efficiency of the method using benchmark TPC-H and executing query workloads while simultaneously performing continuous data integration.
- D. J. Abadi, D. Carney, et al.: "Aurora: A New Model and Architecture for Data Stream Management", The VLDB Journal, 12(2), pp. 120--139, 2003. Google ScholarDigital Library
- S. Babu, and J. Widom: "Continuous Queries Over Data Streams", SIGMOD Record 30(3), pp. 109--120. Google ScholarDigital Library
- T. Binder: Gong User Manual, Tecco Software AG, 2003.Google Scholar
- M. Bouzeghoub, F. Fabret, and M. Matulovic: "Modeling Data Warehouse Refreshment Process as a Workflow Application", Int. Workshop on Design and Management of DW (DMDW), 1999.Google Scholar
- R. M. Bruckner, B. List, and J. Schiefer: "Striving Towards Near Real-Time Data Integration for Data Warehouses", Int. Conf. Data Warehousing and Knowledge Discovery (DAWAK), 2002. Google ScholarDigital Library
- R. M. Bruckner, and A. M. Tjoa: "Capturing Delays and Valid Times in Data Warehouses -- Towards Timely Consistent Analyses". Journal of Intelligent Inf. Systems (JIIS), 19:2, pp. 169--190, 2002. Google ScholarDigital Library
- S. Chaudhuri, and U. Dayal: "An Overview of Data Warehousing and OLAP Technology", SIGMOD Record, Volume 26, Number 1, pp. 65--74, 1997. Google ScholarDigital Library
- W. H. Inmon, R. H. Terdeman, J. Norris-Montanari, and D. Meers: Data Warehousing for E-Business, J. Wiley&Sons, 2001. Google ScholarDigital Library
- C. Italiano, and J. E. Ferreira: "Synchronization Options for Data Warehouse Designs", IEEE Computer Magazine, 2006. Google ScholarDigital Library
- Karakasidis, P. Vassiliadis, and E. Pitoura: "ETL Queues for Active Data Warehousing", IQIS'05, 2005. Google ScholarDigital Library
- R. Kimball, L. Reeves, M. Ross, and W. Thornthwaite: The Data Warehouse Lifecycle Toolkit -- Expert Methods for Designing, Developing and Deploying Data Warehouses, Wiley Computer Pub, 1998. Google ScholarDigital Library
- R. Kimball, and J. Caserta: The Data Warehouse ETL Toolkit, Wiley Computer Publishing, 2004. Google ScholarDigital Library
- E. Kuhn: "The Zero-Delay Data Warehouse: Mobilizing Heterogeneous Databases", Int. Conf. on Very Large Data Bases (VLDB), 2003. Google ScholarDigital Library
- W. Labio, J. Yang, Y. Cui, H. Garcia-Molina, and J. Widom: "Performance Issues in Incremental Warehouse Maintenance", Int. Conf. on Very Large Data Bases (VLDB), 2000. Google ScholarDigital Library
- D. Lomet, and J. Gehrke: Special Issue on Data Stream Processing, IEEE Data Eng. Bulletin, 26(1), 2003.Google Scholar
- Oracle Corporation, 2005. www.oracle.comGoogle Scholar
- T. B. Pedersen: "How is BI Used in Industry?", Int. Conf. on Data Warehousing and Knowledge Discovery (DAWAK), 2004.Google ScholarCross Ref
- J. F. Roddick, and M. Schrefl: "Towards an Accommodation of Delay in Temporal Active Databases", 11<sup>th</sup> Australasian Database Conference (ADC), 2000. Google ScholarDigital Library
- Simitsis, P. Vassiliadis and T. Sellis: "Optimizing ETL Processes in Data Warehouses", Int. Conf. on Data Engineering (ICDE), 2005. Google ScholarDigital Library
- U. Srivastava, and J. Widom: "Flexible Time Management in Data Stream Systems", Int. Conf. on Principles of Database Systems (PODS), 2004. Google ScholarDigital Library
- D. Theodoratus, and M. Bouzeghoub: "Data Currency Quality Factors in Data Warehouse Design", Int. Workshop on Design and Management of Data Warehouses (DMDW), 1999.Google Scholar
- TPC-H decision support benchmark, Transaction Processing Council, www.tpc.com.Google Scholar
- P. Vassiliadis, Z. Vagena, S. Skiadopoulos, N. Karayannidis, and T. Sellis: "ARKTOS: Towards the Modelling, Design, Control and Execution of ETL Processes", Inf. Systems, Vol. 26(8), 2001. Google ScholarDigital Library
- White: "Intelligent Business Strategies: Real-Time Data Warehousing Heats Up", DM Preview, www.dmreview.com/article_sub_cfm?articleld=5570. 2002.Google Scholar
- J. Yang: "Temporal Data Warehousing", Ph.D. Thesis, Dp. Computer Science, Stanford Univ, 2001. Google ScholarDigital Library
- J. Yang, and J. Widom: "Incremental Computation and Maintenance of Temporal Aggregates", 17<sup>th</sup> Intern. Conference on Data Engineering (ICDE), 2001. Google ScholarDigital Library
- J. Yang, and J. Widom: "Temporal View Self-Maintenance", 7<sup>th</sup> Int. Conf. Extending Database Technology (EDBT), 2001. Google ScholarDigital Library
- T. Zurek, and K. Kreplin: "SAP Business Information Warehouse -- From Data Warehousing to an E-Business Platform", 17 Int. Conf. on Data Engineering (ICDE), 2001. Google ScholarDigital Library
- D. Burleson: Oracle data load (import, SQL*Loader) speed tips, Burleson Consulting, http://www.dbaoracle.com/oracle_tips_load_speed.htm. 2006.Google Scholar
- Oracle Corporation: http://www.oracle.com/technology/products/database/utilities/index. html. 2008.Google Scholar
- N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, S. B. Zdonik: "Towards a Streaming SQL Standard", Int. Conf. Very Large Data Bases (VLDB), 1(2): 1379--1390, 2008. Google ScholarDigital Library
- N. Polyzotis, S. Skiadopoulos, P. Vassiliadis, A. Simitsis, N. Frantzell: "Meshing Streaming Updates with Persistent Data in an Active Data Warehouse", IEEE Transactions on Knowl. Data Eng., 20(7): 976--991, 2008. Google ScholarDigital Library
- H. Agrawal, G. Chafle, S. Goyal, S. Mittal, S. Mukherjea: "An Enhanced Extract-Transform-Load System for Migrating Data in Telecom Billing", Int. Conference on Data Engineering (ICDE), pp. 1277--1286, 2008. Google ScholarDigital Library
- A. Simitsis, P. Vassiliadis: "A method for the mapping of conceptual designs to logical blueprints for ETL processes", Decision Support Systems (45) 22--40, 2008. Google ScholarDigital Library
- P. Vassiliadis, A. Simitsis, P. Georgantas, M. Terrovitis, S. Skiadopoulos: "A generic and customizable framework for the design of ETL scenarios", Inform. Systems, 30(7): 492--525, 2005.Google ScholarDigital Library
- IBM, IBM Data warehouse manager, available at http://www-3.ibm.com/software/data/db2/data warehouse/.Google Scholar
- Informatica, Power Center, available at http://www.informatica.com/products/data+integration/powercenter/default. htm.Google Scholar
- Microsoft, Data transformation services, available at http://www.microsoft.com.Google Scholar
- Oracle Corporation, "Oracle warehouse builder product page", available at http://otn.oracle.com/products/warehouse/content.htmlGoogle Scholar
Index Terms
- Optimizing data warehouse loading procedures for enabling useful-time data warehousing
Recommendations
Real-time data warehouse loading methodology
IDEAS '08: Proceedings of the 2008 international symposium on Database engineering & applicationsA data warehouse provides information for analytical processing, decision making and data mining tools. As the concept of real-time enterprise evolves, the synchronism between transactional data and data warehouses, statically implemented, has been ...
A data warehouse architecture for clinical data warehousing
ACSW '07: Proceedings of the fifth Australasian symposium on ACSW frontiers - Volume 68Data warehousing methodologies share a common set of tasks, including business requirements analysis, data design, architectural design, implementation and deployment. Clinical data warehouses are complex and time consuming to review a series of patient ...
Optimizing ETL by a Two-Level Data Staging Method
In data warehousing, the data from source systems are populated into a central data warehouse DW through extraction, transformation and loading ETL. The standard ETL approach usually uses sequential jobs to process the data with dependencies, such as ...
Comments