DOI: 10.1145/2591971.2591998
research-article

Balanced resource allocations across multiple dynamic MapReduce clusters

Published: 16 June 2014

ABSTRACT

Running multiple instances of the MapReduce framework concurrently in a multicluster system or datacenter enables data, failure, and version isolation, which is attractive for many organizations. It may also provide some form of performance isolation, but in order to achieve this in the face of time-varying workloads submitted to the MapReduce instances, a mechanism for dynamic resource (re-)allocations to those instances is required. In this paper, we present such a mechanism called Fawkes that attempts to balance the allocations to MapReduce instances so that they experience similar service levels. Fawkes proposes a new abstraction for deploying MapReduce instances on physical resources, the MR-cluster, which represents a set of resources that can grow and shrink, and that has a core on which MapReduce is installed with the usual data locality assumptions but that relaxes those assumptions for nodes outside the core. Fawkes dynamically grows and shrinks the active MR-clusters based on a family of weighting policies with weights derived from monitoring their operation.

We empirically evaluate Fawkes on a multicluster system and show that it can deliver good performance and balanced resource allocations, even when the workloads of the MR-clusters are very uneven and bursty, with workloads composed from both synthetic and real-world benchmarks.
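
The abstract does not give the weighting policies themselves, so the following is only a minimal sketch of the general idea of weight-proportional balancing, not the Fawkes algorithm. It assumes each MR-cluster keeps a fixed core and that the remaining nodes are divided in proportion to a per-cluster weight. The function name `proportional_shares`, the `min_core` parameter, and the largest-remainder rounding are all hypothetical choices made for illustration.

```python
from typing import Dict


def proportional_shares(
    weights: Dict[str, float],
    total_nodes: int,
    min_core: Dict[str, int],
) -> Dict[str, int]:
    """Split total_nodes among clusters.

    Every cluster first keeps its core (min_core). The spare nodes are
    then divided in proportion to the weights, using largest-remainder
    rounding so that the shares are integers that sum to total_nodes.
    This is an illustrative assumption, not the policy from the paper.
    """
    spare = total_nodes - sum(min_core[name] for name in weights)
    if spare < 0:
        raise ValueError("not enough nodes to host every core")

    shares = {name: min_core[name] for name in weights}
    total_weight = sum(weights.values())
    if spare == 0 or total_weight == 0:
        # Nothing to distribute, or no cluster has any demand.
        return shares

    # Exact (fractional) share of the spare nodes for each cluster.
    exact = {name: spare * w / total_weight for name, w in weights.items()}
    floors = {name: int(x) for name, x in exact.items()}
    for name in shares:
        shares[name] += floors[name]

    # Hand out the nodes lost to rounding, largest fractional part first.
    leftover = spare - sum(floors.values())
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - floors[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical weights, e.g. derived from monitored load per cluster.
    print(proportional_shares({"a": 3.0, "b": 1.0}, 20, {"a": 2, "b": 2}))
```

Re-running such a function whenever the monitored weights change would yield new target sizes, and the difference from the current sizes indicates which clusters should grow or shrink. How Fawkes actually derives its weights and moves nodes is described in the paper, not in this sketch.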

    Published in

      SIGMETRICS '14: The 2014 ACM international conference on Measurement and modeling of computer systems
      June 2014
      614 pages
      ISBN: 9781450327893
      DOI: 10.1145/2591971

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      SIGMETRICS '14 paper acceptance rate: 40 of 237 submissions, 17%. Overall acceptance rate: 459 of 2,691 submissions, 17%.
