research-article

System and Architecture Level Characterization of Big Data Applications on Big and Little Core Server Architectures

Authors:
Maria Malik, George Mason University (ORCID: 0000-0001-8425-2501)
Setareh Rafatirad, George Mason University
Houman Homayoun, George Mason University

ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 3, Issue 3, Article No. 14, pp. 1–32. https://doi.org/10.1145/3229049

Published: 23 July 2018

Abstract

The rapid growth in data makes it challenging to process data efficiently on current high-performance server architectures such as those built on big Xeon cores. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factors for scaling out servers. Low-power embedded cores in servers, such as the little Atom, have emerged as a promising way to improve energy efficiency and address these challenges. The question of whether to process big data applications on big Xeon-based or little Atom-based servers therefore becomes important. In this work, through methodical investigation of power and performance measurements and comprehensive application-level, system-level, and microarchitecture-level analysis, we characterize dominant big data applications on big Xeon- and little Atom-based server architectures. The characterization results, across a wide range of real-world big data applications and various software stacks, demonstrate how the choice of a big- versus little-core-based server for energy efficiency is significantly influenced by the size of the data, the performance constraints, and the presence of an accelerator. In addition, we analyze the processor resource utilization of this important class of applications, such as memory footprint, CPU utilization, and disk bandwidth, to understand their run-time behavior. Finally, we perform microarchitecture-level analysis to highlight where improvement is needed in big- and little-core microarchitectures to address their performance bottlenecks.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Published in
ACM Transactions on Modeling and Performance Evaluation of Computing Systems Volume 3, Issue 3
September 2018
138 pages
ISSN:2376-3639
EISSN:2376-3647
DOI:10.1145/3232716
Editors:
Sem Borst, Nokia Bell Labs / Eindhoven University of Technology, Netherlands
Carey Williamson, University of Calgary, Canada
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 July 2018
- Accepted: 1 May 2018
- Revised: 1 March 2018
- Received: 1 November 2017
Author Tags
Performance
accelerator
big data
characterization
high-performance server
low-power server
power
Qualifiers
- research-article
- Research
- Refereed