System and Architecture Level Characterization of Big Data Applications on Big and Little Core Server Architectures

Published: 23 July 2018

Abstract

The rapid growth in data makes it challenging to process data efficiently on current high-performance server architectures such as those built on big Xeon cores. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factors for scaling out servers. Low-power embedded cores such as the little Atom have emerged as a promising way to improve server energy efficiency and address these challenges. The question of whether to process big data applications on big Xeon-based or little Atom-based servers therefore becomes important. In this work, through methodical investigation of power and performance measurements, and comprehensive application-level, system-level, and microarchitecture-level analysis, we characterize dominant big data applications on big Xeon-based and little Atom-based server architectures. The characterization results, obtained across a wide range of real-world big data applications and various software stacks, demonstrate how the choice of a big-core or little-core server for energy efficiency is significantly influenced by the size of the data, the performance constraints, and the presence of an accelerator. In addition, we analyze the resource utilization of this important class of applications, including memory footprint, CPU utilization, and disk bandwidth, to understand their run-time behavior. Finally, we perform microarchitecture-level analysis to highlight where improvement is needed in big-core and little-core microarchitectures to address their performance bottlenecks.

      • Published in

        ACM Transactions on Modeling and Performance Evaluation of Computing Systems, Volume 3, Issue 3
        September 2018
        138 pages
        ISSN: 2376-3639
        EISSN: 2376-3647
        DOI: 10.1145/3232716
        • Editors: Sem Borst, Carey Williamson

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 July 2018
        • Accepted: 1 May 2018
        • Revised: 1 March 2018
        • Received: 1 November 2017
        Published in TOMPECS, Volume 3, Issue 3

        Qualifiers

        • research-article
        • Research
        • Refereed
