DOI: 10.1145/3287624.3287636
research-article

Collaborative accelerators for in-memory MapReduce on scale-up machines

Published: 21 January 2019

ABSTRACT

Efficient data analytics platforms are becoming increasingly crucial for both small- and large-scale datasets. While MapReduce implementations such as Hadoop and Spark were originally proposed for petascale processing in scale-out clusters, it has been noted that, today, most data center processes operate on gigabyte-order or smaller datasets, which are best processed on a single high-end scale-up machine. In this context, Phoenix++ is a highly optimized MapReduce framework available for chip-multiprocessor (CMP) scale-up machines. In this paper we observe that Phoenix++ suffers from inefficient utilization of the memory subsystem and from serialized execution of the MapReduce stages. To overcome these inefficiencies, we propose CASM, an architecture that equips each core in a CMP design with a dedicated instance of a specialized hardware unit (the CASM accelerator). These units collaborate to manage the key-value data structure and to minimize both on- and off-chip communication costs. Our experimental evaluation on a 64-core design indicates that CASM provides more than a 4x speedup over the highly optimized Phoenix++ framework, while keeping area overhead at only 6% and reducing energy demands by over 3.5x.
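
To make the software baseline concrete, the following is a minimal sketch of an in-memory MapReduce word count on a single shared-memory machine. It is not the Phoenix++ API and it does not model the CASM hardware; the names `map_chunk`, `reduce_tables`, and `word_count` are hypothetical and chosen only for illustration. Each worker fills a private key-value table during the map stage, and the reduce stage starts only once every map task has finished, which is the kind of serialized stage execution the abstract refers to.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map stage: count words into a worker-local key-value table."""
    local = Counter()
    for line in chunk:
        for word in line.split():
            local[word.lower()] += 1
    return local


def reduce_tables(tables):
    """Reduce stage: merge all worker-local tables into one result."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged


def word_count(lines, workers=4):
    # Split the input into contiguous chunks, roughly one per worker.
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every map task, so reduce cannot overlap with map.
        tables = list(pool.map(map_chunk, chunks))
    return reduce_tables(tables)


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(text, workers=2))
```

Python threads are used here only to show the structure of the two stages; they are not expected to deliver real parallel speedup for this CPU-bound work.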

  • Published in

    ASPDAC '19: Proceedings of the 24th Asia and South Pacific Design Automation Conference
    January 2019
    794 pages
    ISBN:9781450360074
    DOI:10.1145/3287624

    Copyright © 2019 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 466 of 1,454 submissions, 32%
