ABSTRACT
Relying on efficient data analytics platforms is increasingly crucial for both small- and large-scale datasets. While MapReduce implementations such as Hadoop and Spark were originally designed for petascale processing on scale-out clusters, it has been noted that, today, most data-center jobs operate on gigabyte-order or smaller datasets, which are best processed on a single high-end scale-up machine. In this context, Phoenix++ is a highly optimized MapReduce framework for chip-multiprocessor (CMP) scale-up machines. In this paper we observe that Phoenix++ suffers from inefficient utilization of the memory subsystem and from serialized execution of the MapReduce stages. To overcome these inefficiencies, we propose CASM, an architecture that equips each core in a CMP design with a dedicated instance of a specialized hardware unit (the CASM accelerator). These units collaborate to manage the key-value data structure and to minimize both on- and off-chip communication costs. Our experimental evaluation on a 64-core design indicates that CASM achieves more than a 4x speedup over the highly optimized Phoenix++ framework, while keeping area overhead at only 6% and reducing energy demands by over 3.5x.