ABSTRACT
The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries.
To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
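For readers unfamiliar with the operation, range partitioning can be sketched in software as follows. This is a minimal illustration and not the HARP design or the software baseline measured in the paper. The function name `range_partition`, the sample keys, and the convention that a key equal to a splitter falls into the upper partition are all assumptions made for the example.

```python
from bisect import bisect_right


def range_partition(keys, splitters):
    """Distribute keys into len(splitters) + 1 partitions.

    `splitters` must be sorted in ascending order. Partition i receives the
    keys k with splitters[i-1] <= k < splitters[i] (assumed convention; the
    first and last partitions are open-ended).
    """
    partitions = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # Binary search over the splitters yields the partition index.
        partitions[bisect_right(splitters, key)].append(key)
    return partitions


# Hypothetical input: six keys and three splitters give four partitions.
parts = range_partition([5, 42, 17, 99, 23, 8], [10, 20, 50])
```

Each key costs one binary search over the splitters plus a write to its destination partition, and this per-record work is what makes partitioning a candidate for a streaming accelerator.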
REFERENCES
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB, 1999.
- S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, 2011.
- Bluespec, Inc. Bluespec Core Technology. http://www.bluespec.com.
- H. Boral and D. J. DeWitt. Database machines: an idea whose time has passed? In IWDM, 1983.
- R. D. Cameron and D. Lin. Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. In ASPLOS, 2009.
- Centrum Wiskunde and Informatica. http://www.monetdb.org.
- S. Chakraborty and L. Thiele. A new task model for streaming applications and its schedulability analysis. In DATE, 2005.
- D. Chatziantoniou and K. A. Ross. Partitioned optimization of complex queries. Information Systems (IS), 32(2):248--282, 2007.
- J. Cieslewicz and K. A. Ross. Data partitioning on chip multiprocessors. In DaMoN, 2008.
- S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris, M. Schuette, and A. Saidi. The reconfigurable streaming vector processor (RSVP™). In MICRO, 2003.
- B. F. Cooper and K. Schwan. Distributed stream management using utility-driven self-adaptive middleware. In CAC, 2005.
- Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. Memscale: active low-power modes for main memory. In ASPLOS, 2011.
- M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. In Middleware, 2011.
- E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel application memory scheduling. In MICRO, 2011.
- B. Flachs et al. A streaming processing unit for a CELL processor. In ISSCC, 2005.
- S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer. PipeRench: a co/processor for streaming multimedia acceleration. In ISCA, 1999.
- M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.
- N. K. Govindaraju and D. Manocha. Efficient relational database management using graphics processors. In DaMoN, 2005.
- V. Govindaraju, C.-H. Ho, and K. Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA, 2011.
- G. Graefe and P.-A. Larson. B-tree indexes and CPU caches. In ICDE, 2001.
- N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4), 2011.
- HP Labs. http://www.hpl.hp.com/research/cacti/.
- IBM. DB2 Partitioning Features. http://www.ibm.com/developerworks/data/library/techarticle/dm-0608mcinerney.
- IBM. IBM What is big data? Bringing big data to enterprise. http://www-01.ibm.com/software/data/bigdata/.
- Intel Corporation. Intel® Xeon® Processor E5620. http://ark.intel.com/products/47925.
- E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.
- N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani. Design, implementation, and evaluation of the linear road benchmark on the stream processing core. In SIGMOD, 2006.
- N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA, 1990.
- C. Kim, E. Sedlar, J. Chhugani, T. Kaldewey, A. D. Nguyen, A. D. Blas, V. W. Lee, N. Satish, and P. Dubey. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PVLDB, 2(2):1378--1389, 2009.
- C. Kozyrakis, A. Kansal, S. Sankar, and K. Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4), July/August 2010.
- J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. PVLDB, 5(1):61--72, Sept. 2011.
- D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In HPCA, 2012.
- K. T. Malladi, F. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, and M. Horowitz. Towards energy-proportional datacenter memory with mobile dram. In ISCA, 2012.
- Microsoft. Microsoft SQL Server 2012. http://technet.microsoft.com/en-us/sqlserver/ff898410.
- C. Mohan. Impact of recent hardware and software trends on high performance transaction processing and analytics. In TPCTC, 2011.
- R. Müller and J. Teubner. FPGAs: a new point in the database design space. In EDBT, 2010.
- MySQL. Date and time datatype representation. http://dev.mysql.com/doc/internals/en/date-and-time-data-type-representation.html.
- C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multi-processor server environment. In WMPI, 2004.
- L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, 2010.
- Oracle. Oracle Database 11g: Partitioning. http://www.oracle.com/technetwork/database/options/partitioning/index.html.
- N. Rafique, W.-T. Lim, and M. Thottethodi. Effective Management of DRAM Bandwidth in Multicore Processors. In PACT, 2007.
- S. Rixner. Memory controller optimizations for web servers. In MICRO, 2004.
- K. A. Ross and J. Cieslewicz. Optimal splitters for database partitioning with size bounds. In ICDT, pages 98--110, 2009.
- P. Saab. Scaling memcached at Facebook, Dec 2008. https://www.facebook.com/note.php?note_id=39391378919.
- V. Salapura, T. Karkhanis, P. Nagpurkar, and J. Moreira. Accelerating business analytics applications. In HPCA, 2012.
- B. Schlegel, R. Gemulla, and W. Lehner. k-ary search on modern processors. In DaMoN, 2009.
- J. Shao and B. Davis. A burst scheduling access reordering mechanism. In HPCA, 2007.
- H. Subramoni, F. Petrini, V. Agarwal, and D. Pasetto. Intra-socket and inter-socket communication in multi-core systems. IEEE Computer Architecture Letters, 9:13--16, January 2010.
- Synopsys, Inc. 32/28nm Generic Library for IC Design, Design Compiler, IC Compiler. http://www.synopsys.com.
- L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. The impact of memory subsystem resource sharing on datacenter applications. In ISCA, 2011.
- Transaction Processing Performance Council. http://www.tpc.org/tpch/default.asp.
- M. A. Watkins and D. H. Albonesi. ReMAP: A reconfigurable heterogeneous multicore architecture. In MICRO, 2010.
- L. Woods, J. Teubner, and G. Alonso. Complex event detection at wire speed with FPGAs. PVLDB, 3(1):660--669, 2010.
- Y. Ye, K. A. Ross, and N. Vesdapunt. Scalable aggregation on multicore processors. In DaMoN, 2011.
- J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, 2002.
Navigating big data with high-throughput, energy-efficient data partitioning (ISCA '13)