ABSTRACT
Since the 1970s, database systems have been "compute-centric": when a computation needs data, it requests the data, and the data are pulled through the system. We believe that this is problematic for two reasons. First, requests for data naturally incur high latency as the data are pulled through the memory hierarchy. Second, the pull-based design makes it difficult or impossible for multiple queries or operations that are interested in the same data to amortize the bandwidth and latency costs associated with their data access.
In this paper, we describe DataPath, a purely push-based research prototype database system. DataPath is "data-centric": queries do not request data. Instead, data are automatically pushed onto processors, where they are processed by any interested computation. We show experimentally on a multi-terabyte benchmark that this basic design principle makes for a very lean and fast database system.
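The push-based idea can be illustrated with a small sketch. This is not DataPath's implementation; it is a toy model, assuming a single scan that reads each chunk once and hands it to every registered query. The names `PushScan`, `SumQuery`, and `demo` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Chunk = List[Row]


class PushScan:
    """Hypothetical shared scan: reads each chunk once and pushes it
    to every registered consumer, so the read cost is shared."""

    def __init__(self, chunks: Iterable[Chunk]) -> None:
        self._chunks = chunks
        self._consumers: Dict[str, Callable[[Chunk], None]] = {}
        self.chunks_read = 0  # counts reads, to show they are not repeated per query

    def register(self, name: str, consumer: Callable[[Chunk], None]) -> None:
        self._consumers[name] = consumer

    def run(self) -> None:
        for chunk in self._chunks:
            self.chunks_read += 1
            # Data are pushed to the computations; no query asks for data.
            for consumer in self._consumers.values():
                consumer(chunk)


class SumQuery:
    """Hypothetical query: sums one column over rows matching a predicate."""

    def __init__(self, column: str, predicate: Callable[[Row], bool]) -> None:
        self.column = column
        self.predicate = predicate
        self.total = 0

    def __call__(self, chunk: Chunk) -> None:
        for row in chunk:
            if self.predicate(row):
                self.total += row[self.column]


def demo():
    chunks = [
        [{"price": 10, "qty": 1}, {"price": 20, "qty": 5}],
        [{"price": 30, "qty": 2}, {"price": 40, "qty": 7}],
    ]
    scan = PushScan(chunks)
    q1 = SumQuery("price", lambda r: r["qty"] > 1)
    q2 = SumQuery("qty", lambda r: r["price"] >= 30)
    scan.register("q1", q1)
    scan.register("q2", q2)
    scan.run()
    return q1.total, q2.total, scan.chunks_read


if __name__ == "__main__":
    print(demo())
```

In this sketch both queries are answered from a single pass over the chunks, which is the amortization that a pull-based design, with one independent scan per query, makes hard to achieve.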
Index Terms
- The DataPath system: a data-centric analytic processing engine for large data warehouses