DOI: 10.1145/1620432.1620464
research-article

Optimizing data warehouse loading procedures for enabling useful-time data warehousing

Published: 16 September 2009

ABSTRACT

The purpose of a data warehouse is to aid decision making. As the real-time enterprise evolves, the synchronization between transactional data and data warehouses is being redefined. To cope with real-time requirements, data warehouses must support continuous data integration so that they can deal with the most recent business data. Traditional data warehouses cannot accommodate changes in structure or content while they are available for OLAP; because they are not prepared for continuous data integration, their data is updated only periodically. For real-time enterprises that need decision support while transactions are occurring, (near) real-time data warehousing seems very promising. In this paper we survey and test today's most widely used data loading techniques, analyze which of them perform best, and present a methodology for efficiently supporting continuous data integration in data warehouses. To accomplish this, we use techniques such as table structure replication with minimal content and query predicate restrictions for selecting data, which allow data to be loaded into the data warehouse continuously with minimal impact on query execution time. We demonstrate the efficiency of the method using the TPC-H benchmark, executing query workloads while simultaneously performing continuous data integration.


Published in

ACM Conferences
          IDEAS '09: Proceedings of the 2009 International Database Engineering & Applications Symposium
          September 2009
          347 pages
          ISBN:9781605584027
          DOI:10.1145/1620432

          Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

Overall Acceptance Rate: 74 of 210 submissions, 35%
