DOI: 10.1145/2485922.2485944
research-article

Navigating big data with high-throughput, energy-efficient data partitioning

Published: 23 June 2013

ABSTRACT

The global pool of data is growing by 2.5 quintillion bytes per day, and 90% of it was produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores the targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. Data partitioning, in particular, is a critical operation for manipulating large data sets: it is often the limiting factor in database performance and accounts for a significant fraction of the overall runtime of large data queries.

To accelerate partitioning, this paper describes HARP, a hardware accelerator for range partitioning, together with a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order-of-magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
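To make the operation concrete, here is a minimal software sketch of range partitioning: given a sorted list of splitter keys, each input key is routed to the partition whose key range contains it, located here by binary search. The function name `range_partition`, the integer keys, and the boundary convention (a key equal to a splitter goes to the lower partition) are illustrative assumptions; this is neither HARP's hardware design nor the paper's software baseline.

```python
from bisect import bisect_left
from typing import List, Sequence


def range_partition(keys: Sequence[int], splitters: Sequence[int]) -> List[List[int]]:
    """Split keys into len(splitters) + 1 range partitions (hypothetical helper).

    Partition 0 holds keys <= splitters[0], partition i holds keys in
    (splitters[i-1], splitters[i]], and the last partition holds keys
    greater than splitters[-1]. This boundary convention is an assumption.
    """
    # Splitters must be strictly increasing for the binary search to be valid.
    if any(a >= b for a, b in zip(splitters, splitters[1:])):
        raise ValueError("splitters must be strictly increasing")

    partitions: List[List[int]] = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # bisect_left returns the number of splitters strictly less than key,
        # which is the index of the partition whose range contains key.
        partitions[bisect_left(splitters, key)].append(key)
    return partitions


if __name__ == "__main__":
    # Hypothetical input: six keys and three splitters give four partitions.
    print(range_partition([7, 42, 3, 15, 99, 20], [10, 20, 50]))
    # prints [[7, 3], [15, 20], [42], [99]]
```

In this sketch each key costs O(log S) comparisons for S splitters plus a write to a data-dependent output partition; that per-record comparison and routing work is, presumably, what a dedicated range-partitioning accelerator takes off the general-purpose core.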

References

  1. A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood. DBMSs on a modern processor: Where does time go? In VLDB, 1999.
  2. S. Blanas, Y. Li, and J. M. Patel. Design and evaluation of main memory hash join algorithms for multi-core CPUs. In SIGMOD, 2011.
  3. Bluespec, Inc. Bluespec Core Technology. http://www.bluespec.com.
  4. H. Boral and D. J. DeWitt. Database machines: an idea whose time has passed? In IWDM, 1983.
  5. R. D. Cameron and D. Lin. Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle. In ASPLOS, 2009.
  6. Centrum Wiskunde and Informatica. http://www.monetdb.org.
  7. S. Chakraborty and L. Thiele. A new task model for streaming applications and its schedulability analysis. In DATE, 2005.
  8. D. Chatziantoniou and K. A. Ross. Partitioned optimization of complex queries. Information Systems (IS), 32(2):248--282, 2007.
  9. J. Cieslewicz and K. A. Ross. Data partitioning on chip multiprocessors. In DaMoN, 2008.
  10. S. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Norris, M. Schuette, and A. Saidi. The reconfigurable streaming vector processor (RSVP™). In MICRO, 2003.
  11. B. F. Cooper and K. Schwan. Distributed stream management using utility-driven self-adaptive middleware. In CAC, 2005.
  12. Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. MemScale: active low-power modes for main memory. In ASPLOS, 2011.
  13. M. Duller, J. S. Rellermeyer, G. Alonso, and N. Tatbul. Virtualizing stream processing. In Middleware, 2011.
  14. E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C. J. Lee, J. A. Joao, O. Mutlu, and Y. N. Patt. Parallel application memory scheduling. In MICRO, 2011.
  15. B. Flachs et al. A streaming processing unit for a CELL processor. In ISSCC, 2005.
  16. S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer. PipeRench: a co/processor for streaming multimedia acceleration. In ISCA, 1999.
  17. M. I. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS, 2006.
  18. N. K. Govindaraju and D. Manocha. Efficient relational database management using graphics processors. In DaMoN, 2005.
  19. V. Govindaraju, C.-H. Ho, and K. Sankaralingam. Dynamically specialized datapaths for energy efficient computing. In HPCA, 2011.
  20. G. Graefe and P.-A. Larson. B-tree indexes and CPU caches. In ICDE, 2001.
  21. N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4), 2011.
  22. HP Labs. http://www.hpl.hp.com/research/cacti/.
  23. IBM. DB2 Partitioning Features. http://www.ibm.com/developerworks/data/library/techarticle/dm-0608mcinerney.
  24. IBM. What is big data? Bringing big data to the enterprise. http://www-01.ibm.com/software/data/bigdata/.
  25. Intel Corporation. Intel® Xeon® Processor E5620. http://ark.intel.com/products/47925.
  26. E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana. Self-optimizing memory controllers: A reinforcement learning approach. In ISCA, 2008.
  27. N. Jain, L. Amini, H. Andrade, R. King, Y. Park, P. Selo, and C. Venkatramani. Design, implementation, and evaluation of the linear road benchmark on the stream processing core. In SIGMOD, 2006.
  28. N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In ISCA, 1990.
  29. C. Kim, E. Sedlar, J. Chhugani, T. Kaldewey, A. D. Nguyen, A. D. Blas, V. W. Lee, N. Satish, and P. Dubey. Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs. PVLDB, 2(2):1378--1389, 2009.
  30. C. Kozyrakis, A. Kansal, S. Sankar, and K. Vaid. Server engineering insights for large-scale online services. IEEE Micro, 30(4), July/August 2010.
  31. J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. PVLDB, 5(1):61--72, Sept. 2011.
  32. D. Lin, N. Medforth, K. S. Herdy, A. Shriraman, and R. Cameron. Parabix: Boosting the efficiency of text processing on commodity processors. In HPCA, 2012.
  33. K. T. Malladi, F. Nothaft, K. Periyathambi, B. C. Lee, C. Kozyrakis, and M. Horowitz. Towards energy-proportional datacenter memory with mobile DRAM. In ISCA, 2012.
  34. Microsoft. Microsoft SQL Server 2012. http://technet.microsoft.com/en-us/sqlserver/ff898410.
  35. C. Mohan. Impact of recent hardware and software trends on high performance transaction processing and analytics. In TPCTC, 2011.
  36. R. Müller and J. Teubner. FPGAs: a new point in the database design space. In EDBT, 2010.
  37. MySQL. Date and time datatype representation. http://dev.mysql.com/doc/internals/en/date-and-time-data-type-representation.html.
  38. C. Natarajan, B. Christenson, and F. Briggs. A study of performance impact of memory controller features in multi-processor server environment. In WMPI, 2004.
  39. L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In ICDMW, 2010.
  40. Oracle. Oracle Database 11g: Partitioning. http://www.oracle.com/technetwork/database/options/partitioning/index.html.
  41. N. Rafique, W.-T. Lim, and M. Thottethodi. Effective management of DRAM bandwidth in multicore processors. In PACT, 2007.
  42. S. Rixner. Memory controller optimizations for web servers. In MICRO, 2004.
  43. K. A. Ross and J. Cieslewicz. Optimal splitters for database partitioning with size bounds. In ICDT, pages 98--110, 2009.
  44. P. Saab. Scaling memcached at Facebook, Dec 2008. https://www.facebook.com/note.php?note_id=39391378919.
  45. V. Salapura, T. Karkhanis, P. Nagpurkar, and J. Moreira. Accelerating business analytics applications. In HPCA, 2012.
  46. B. Schlegel, R. Gemulla, and W. Lehner. k-ary search on modern processors. In DaMoN, 2009.
  47. J. Shao and B. Davis. A burst scheduling access reordering mechanism. In HPCA, 2007.
  48. H. Subramoni, F. Petrini, V. Agarwal, and D. Pasetto. Intra-socket and inter-socket communication in multi-core systems. IEEE Computer Architecture Letters, 9:13--16, January 2010.
  49. Synopsys, Inc. 32/28nm Generic Library for IC Design, Design Compiler, IC Compiler. http://www.synopsys.com.
  50. L. Tang, J. Mars, N. Vachharajani, R. Hundt, and M. L. Soffa. The impact of memory subsystem resource sharing on datacenter applications. In ISCA, 2011.
  51. Transaction Processing Performance Council. http://www.tpc.org/tpch/default.asp.
  52. M. A. Watkins and D. H. Albonesi. ReMAP: A reconfigurable heterogeneous multicore architecture. In MICRO, 2010.
  53. L. Woods, J. Teubner, and G. Alonso. Complex event detection at wire speed with FPGAs. PVLDB, 3(1):660--669, 2010.
  54. Y. Ye, K. A. Ross, and N. Vesdapunt. Scalable aggregation on multicore processors. In DaMoN, 2011.
  55. J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In SIGMOD, 2002.

Published in

ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN: 9781450320795
DOI: 10.1145/2485922

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN: 0163-5964
DOI: 10.1145/2508148

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

ISCA '13 paper acceptance rate: 56 of 288 submissions (19%). Overall acceptance rate: 543 of 3,203 submissions (17%).
