Abstract
The rapid growth in data makes it challenging to process data efficiently on current high-performance server architectures such as big Xeon cores. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factors for scaling out servers. Low-power embedded cores in servers, such as little Atom, have emerged as a promising way to improve energy efficiency and address these challenges. The question of whether to run big data applications on big Xeon-based or little Atom-based servers therefore becomes important. In this work, through methodical power and performance measurements and comprehensive application-level, system-level, and micro-architecture-level analysis, we characterize dominant big data applications on big Xeon- and little Atom-based server architectures. The characterization results, across a wide range of real-world big data applications and various software stacks, demonstrate how the choice of a big- versus little-core-based server for energy efficiency is significantly influenced by the size of the data, performance constraints, and the presence of an accelerator. In addition, we analyze the resource utilization of this important class of applications, including memory footprint, CPU utilization, and disk bandwidth, to understand their run-time behavior. Finally, we perform micro-architecture-level analysis to highlight where big- and little-core microarchitectures need improvement to address their performance bottlenecks.
Index Terms
- System and Architecture Level Characterization of Big Data Applications on Big and Little Core Server Architectures