DOI: 10.1145/1807167.1807224
research-article

The DataPath system: a data-centric analytic processing engine for large data warehouses

Published: 06 June 2010

ABSTRACT

Since the 1970s, database systems have been "compute-centric": when a computation needs data, it requests the data, and the data are pulled through the system. We believe that this is problematic for two reasons. First, requests for data naturally incur high latency as the data are pulled through the memory hierarchy. Second, the pull model makes it difficult or impossible for multiple queries or operations that are interested in the same data to amortize the bandwidth and latency costs associated with their data access.

In this paper, we describe a purely push-based research prototype database system called DataPath. DataPath is "data-centric": queries do not request data. Instead, data are automatically pushed onto processors, where they are then processed by any interested computation. We show experimentally on a multi-terabyte benchmark that this basic design principle makes for a very lean and fast database system.
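
The contrast between the two models can be illustrated with a toy sketch. It is not DataPath's implementation: the names `CountingSource`, `SumWhere`, `run_pull`, and `run_push`, and the sample data, are all hypothetical and chosen only for illustration. The sketch counts how many chunk reads occur when each query pulls its own scan, compared with a single scan whose chunks are pushed to every interested query.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, int]
Chunk = List[Row]

# Tiny in-memory stand-in for a table stored as chunks (illustrative data only).
CHUNKS: List[Chunk] = [
    [{"k": 1, "v": 10}, {"k": 2, "v": 20}],
    [{"k": 1, "v": 30}, {"k": 3, "v": 40}],
]


class CountingSource:
    """Hypothetical data source that counts how many chunks are read from it."""

    def __init__(self, chunks: List[Chunk]) -> None:
        self.chunks = chunks
        self.reads = 0

    def scan(self) -> Iterator[Chunk]:
        for chunk in self.chunks:
            self.reads += 1  # each yielded chunk models one trip through the memory hierarchy
            yield chunk


class SumWhere:
    """Hypothetical query: sum column "v" over rows that satisfy a predicate."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate
        self.total = 0

    def consume(self, chunk: Chunk) -> None:
        for row in chunk:
            if self.predicate(row):
                self.total += row["v"]


def make_queries() -> List[SumWhere]:
    return [SumWhere(lambda r: r["k"] == 1), SumWhere(lambda r: r["v"] >= 20)]


def run_pull(source: CountingSource, queries: List[SumWhere]) -> None:
    """Compute-centric: every query requests its own pass over the data."""
    for query in queries:
        for chunk in source.scan():
            query.consume(chunk)


def run_push(source: CountingSource, queries: List[SumWhere]) -> None:
    """Data-centric: each chunk is read once and handed to every interested query."""
    for chunk in source.scan():
        for query in queries:
            query.consume(chunk)


if __name__ == "__main__":
    for name, runner in (("pull", run_pull), ("push", run_push)):
        source, queries = CountingSource(CHUNKS), make_queries()
        runner(source, queries)
        print(name, "chunk reads:", source.reads, "totals:", [q.total for q in queries])
```

In this sketch both strategies compute the same answers, but the pull variant reads every chunk once per query while the push variant reads every chunk once in total, which is the amortization the abstract refers to.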

Published in

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167

Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 785 of 4,003 submissions, 20%
