research-article

Navigating big data with high-throughput, energy-efficient data partitioning

Authors:
Lisa Wu, Columbia University, New York
Raymond J. Barker, Columbia University, New York
Martha A. Kim, Columbia University, New York
Kenneth A. Ross, Columbia University, New York

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture, June 2013, Pages 249–260. https://doi.org/10.1145/2485922.2485944

Published: 23 June 2013

ABSTRACT

The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries.

To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
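
As a point of reference, the operation being accelerated can be sketched in software. The following is a minimal Python illustration of range partitioning, not the HARP design or the paper's software baseline. The function name `range_partition`, the splitter values, and the convention that a key equal to a splitter falls into the lower partition are all assumptions made for this example.

```python
from bisect import bisect_left


def range_partition(records, splitters, key=lambda r: r):
    """Split records into len(splitters) + 1 partitions by key range.

    `splitters` must be sorted in ascending order. All names here are
    hypothetical and chosen for illustration only.
    """
    partitions = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        # bisect_left counts the splitters strictly less than the key, so a
        # key equal to a splitter lands in the partition on its lower side.
        index = bisect_left(splitters, key(record))
        partitions[index].append(record)
    return partitions


if __name__ == "__main__":
    keys = [7, 42, 13, 99, 20, 5, 61]
    for i, part in enumerate(range_partition(keys, splitters=[10, 50])):
        print(i, part)
```

Each record costs one binary search over the splitters plus one write to a partition chosen by the data. The abstract does not describe how HARP performs this step, so the sketch should be read only as a statement of what range partitioning computes.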

References

[1] A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB, 1999.
[2] S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, 2011.
[3] Bluespec, Inc. Bluespec Core Technology. http://www.bluespec.com.
[4] H. Boral and D. J. DeWitt. Database machines: an idea whose time has passed? In IWDM, 1983.
[5] R. D. Cameron and D. Lin. Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. In ASPLOS, 2009.
[6] Centrum Wiskunde and Informatica. http://www.monetdb.org.
[7] S. Chakraborty and L. Thiele. A new task model for streaming applications and its schedulability analysis. In DATE, 2005.
[8] D. Chatziantoniou and K. A. Ross. Partitioned optimization of complex queries. Information Systems (IS), 32(2):248--282, 2007.
[9] J. Cieslewicz and K. A. Ross. Data partitioning on chip multiprocessors. In DaMoN, 2008.
[10] S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris, M. Schuette, and A. Saidi. The reconfigurable streaming vector processor (RSVP™). In MICRO, 2003.
[11] B. F. Cooper and K. Schwan. Distributed stream management using utility-driven self-adaptive middleware. In CAC, 2005.
[12] Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. MemScale: active low-power modes for main memory. In ASPLOS, 2011.
[13] M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. In Middleware, 2011.
[14] E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel application memory scheduling. In MICRO, 2011.
[15] B. Flachs et al. A streaming processing unit for a CELL processor. In ISSCC, 2005.
[16] S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer. PipeRench: a co/processor for streaming multimedia acceleration. In ISCA, 1999.
[17] M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.
[18] N. K. Govindaraju and D. Manocha. Efficient relational database management using graphics processors. In DaMoN, 2005.
[19] V. Govindaraju, C.-H. Ho, and K. Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA, 2011.
[20] G. Graefe and P.-A. Larson. B-tree indexes and CPU caches. In ICDE, 2001.
[21] N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4), 2011.
[22] HP Labs. http://www.hpl.hp.com/research/cacti/.
[23] IBM. DB2 Partitioning Features. http://www.ibm.com/developerworks/data/library/techarticle/dm-0608mcinerney.
[24] IBM. IBM What is big data? Bringing big data to enterprise. http://www-01.ibm.com/software/data/bigdata/.
[25] Intel Corporation. Intel® Xeon® Processor E5620. http://ark.intel.com/products/47925.
[26] E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.
[27] N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani. Design, implementation, and evaluation of the linear road benchmark on the stream processing core. In SIGMOD, 2006.
[28] N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA, 1990.
[29] C. Kim, E. Sedlar, J. Chhugani, T. Kaldewey, A. D. Nguyen, A. D. Blas, V. W. Lee, N. Satish, and P. Dubey. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PVLDB, 2(2):1378--1389, 2009.
[30] C. Kozyrakis, A. Kansal, S. Sankar, and K. Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4), July/August 2010.
[31] J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. PVLDB, 5(1):61--72, Sept. 2011.
[32] D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In HPCA, 2012.
[33] K. T. Malladi, F. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, and M. Horowitz. Towards energy-proportional datacenter memory with mobile DRAM. In ISCA, 2012.
[34] Microsoft. Microsoft SQL Server 2012. http://technet.microsoft.com/en-us/sqlserver/ff898410.
[35] C. Mohan. Impact of recent hardware and software trends on high performance transaction processing and analytics. In TPCTC, 2011.
[36] R. Müller and J. Teubner. FPGAs: a new point in the database design space. In EDBT, 2010.
[37] MySQL. Date and time datatype representation. http://dev.mysql.com/doc/internals/en/date-and-time-data-type-representation.html.
[38] C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multi-processor server environment. In WMPI, 2004.
[39] L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, 2010.
[40] Oracle. Oracle Database 11g: Partitioning. http://www.oracle.com/technetwork/database/options/partitioning/index.html.
[41] N. Rafique, W.-T. Lim, and M. Thottethodi. Effective Management of DRAM Bandwidth in Multicore Processors. In PACT, 2007.
[42] S. Rixner. Memory controller optimizations for web servers. In MICRO, 2004.
[43] K. A. Ross and J. Cieslewicz. Optimal splitters for database partitioning with size bounds. In ICDT, pages 98--110, 2009.
[44] P. Saab. Scaling memcached at Facebook, Dec 2008. https://www.facebook.com/note.php?note_id=39391378919.
[45] V. Salapura, T. Karkhanis, P. Nagpurkar, and J. Moreira. Accelerating business analytics applications. In HPCA, 2012.
[46] B. Schlegel, R. Gemulla, and W. Lehner. k-ary search on modern processors. In DaMoN, 2009.
[47] J. Shao and B. Davis. A burst scheduling access reordering mechanism. In HPCA, 2007.
[48] H. Subramoni, F. Petrini, V. Agarwal, and D. Pasetto. Intra-socket and inter-socket communication in multi-core systems. IEEE Computer Architecture Letters, 9:13--16, January 2010.
[49] Synopsys, Inc. 32/28nm Generic Library for IC Design, Design Compiler, IC Compiler. http://www.synopsys.com.
[50] L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. The impact of memory subsystem resource sharing on datacenter applications. In ISCA, 2011.
[51] Transaction Processing Performance Council. http://www.tpc.org/tpch/default.asp.
[52] M. A. Watkins and D. H. Albonesi. ReMAP: A reconfigurable heterogeneous multicore architecture. In MICRO, 2010.
[53] L. Woods, J. Teubner, and G. Alonso. Complex event detection at wire speed with FPGAs. PVLDB, 3(1):660--669, 2010.
[54] Y. Ye, K. A. Ross, and N. Vesdapunt. Scalable aggregation on multicore processors. In DaMoN, 2011.
[55] J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, 2002.

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Published in
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013, 686 pages
ISBN: 9781450320795
DOI: 10.1145/2485922
General Chair: Avi Mendelson, Technion

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3 (ISCA '13)
June 2013, 666 pages
ISSN: 0163-5964
DOI: 10.1145/2508148
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
accelerator
data partitioning
microarchitecture
specialized functional unit
streaming data
Acceptance Rates
ISCA '13 paper acceptance rate: 56 of 288 submissions (19%)
Overall acceptance rate: 543 of 3,203 submissions (17%)