New Trends and Ideas
Providing fair-share scheduling on multicore computing systems via progress balancing

https://doi.org/10.1016/j.jss.2016.11.053

Highlights

  • Fair-share scheduling is essential for achieving performance isolation.

  • Current Linux CFS cannot effectively achieve multicore fairness.

  • A task migration policy, progress balancing, is proposed for multicore fairness.

  • The proposed algorithm extends our previous work via a mechanism called throttling.

  • It bounds the virtual runtime difference between any pair of tasks by a constant.

Abstract

Performance isolation in a scalable multicore system is often attempted through periodic load balancing paired with per-core fair-share scheduling. Unfortunately, load balancing cannot guarantee the desired level of multicore fairness since it may produce unbounded differences in the progress of tasks. In reality, the balancing of load across cores is only indirectly related to multicore fairness. To address this limitation and ultimately achieve multicore fairness, we propose a new task migration policy named progress balancing, and present an algorithm for its realization. Progress balancing periodically distributes tasks among cores to directly balance the progress of tasks by bounding their virtual runtime differences. In doing so, it partitions runnable tasks into task groups and allocates them onto cores such that tasks with larger virtual runtimes run on a core with a larger load and thus proceed more slowly. We formally prove the fairness property of our algorithm. To demonstrate its effectiveness, we implemented our algorithm in Linux kernel 3.10 and performed extensive experiments. In the target system, our algorithm yields a maximum virtual runtime difference of 1.07 s, regardless of the uptime of tasks, whereas the Linux CFS produces unbounded virtual runtime differences.

Introduction

As the trend of resource consolidation continues in computing systems of diverse application domains, performance isolation has become one of the essential features that must be supported by an underlying operating system. For example, cloud computing allows tenants to pay for computing resources that are dynamically provisioned by a cloud service provider. The cloud service provider places its abundant computing resources into a consolidated cloud server and lets tenants rent a slice of these resources. It is thus essential to guarantee the provisioning of the required computing resources as specified in a service level agreement signed by a tenant and the cloud service provider (Huh et al., 2012).

As such, fair-share scheduling has been increasingly adopted in various computing systems since its weight-based resource allocation offers a means to achieve effective performance isolation at the operating system level. The key idea behind fair-share scheduling is to assign weights to the runnable tasks in a system and provide each task with a share of the resource proportional to its weight (Baruah et al., 1996; Jones et al., 1997).
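To make the weight-proportional allocation concrete, the following is a minimal sketch of our own (names and values are illustrative, not from the paper): each task's share of a scheduling interval is its weight divided by the total weight of all runnable tasks.

```python
# Illustrative sketch of weight-based fair-share allocation.
# All names and numbers here are our own examples, not the paper's.

def fair_shares(weights, cpu_time):
    """Split cpu_time among tasks in proportion to their weights."""
    total = sum(weights.values())
    return {task: cpu_time * w / total for task, w in weights.items()}

# Three tasks with weights 1024, 512, and 512 sharing 100 ms of CPU:
shares = fair_shares({"A": 1024, "B": 512, "C": 512}, 100.0)
# Task A, with twice the weight, receives twice the CPU time of B or C.
```

A task's weight is static; it encodes the task's entitlement, not how much progress the task has actually made — a distinction that becomes central in the multicore discussion below.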

Due to its popularity, a large number of fair-share scheduling algorithms have been proposed in the literature (Caprita et al., 2005; Caprita et al., 2006; Chandra et al., 2005; Duda and Cheriton, 1999; Ghodsi et al., 2013; Goyal et al., 1996; Isard et al., 2009; Kolivas; Molnar; Roberson, 2003; Srinivasan and Anderson, 2005; Waldspurger and Weihl, 1994; Waldspurger and Weihl, 1995; Zaharia et al., 2010). Among them, the algorithms in (Duda and Cheriton, 1999; Goyal et al., 1996; Waldspurger and Weihl, 1994; Waldspurger and Weihl, 1995) focus on fair-share scheduling in a uniprocessor system. They achieve fairness by assigning each task CPU time proportional to its relative weight. As modern computing systems are increasingly equipped with multicore hardware, considerable research effort has also been devoted to developing multicore fair-share scheduling algorithms. These algorithms can be classified into centralized and distributed run-queue algorithms.

Centralized run-queue algorithms achieve multicore fairness by extending a single-core fair-share scheduling algorithm with a single global run-queue (Caprita et al., 2005; Chandra et al., 2005; Kolivas; Srinivasan and Anderson, 2005). Unfortunately, they are limited in terms of scalability because they need a global lock for synchronization on the centralized run-queue. This may incur non-trivial performance degradation as systems scale up.

Distributed run-queue algorithms use a local run-queue on each core and allow each core to make its own independent scheduling decisions (Caprita et al., 2006; Ghodsi et al., 2013; Isard et al., 2009; Molnar; Roberson, 2003; Zaharia et al., 2010). Such a distributed run-queue structure inevitably necessitates load balancing among the cores since the loads of the cores differ over time. Load balancing is typically achieved through periodic task migration among local run-queues. In these algorithms, the load of a core is defined as the sum of the weights of the tasks in the corresponding run-queue, and thus load balancing naturally leads to weight-sum balancing in the system (Willebeek-LeMair and Reeves, 1993; Xu, 1997).

Unfortunately, such weight-sum balancing often fails to guarantee fair-share scheduling in a multicore system, even if an idealized perfect scheduling algorithm such as GPS (Parekh and Gallager, 1994) is used for per-core scheduling. This is because a task's weight is a static constant that does not capture the progress of the task, whereas fair-share scheduling needs to balance the progress of all tasks in the system.

To address this problem, we introduce a new task migration policy, which we name progress balancing. Unlike load balancing, progress balancing seeks to balance the virtual runtimes of runnable tasks, where a task's virtual runtime is defined as its cumulative runtime inversely scaled by its weight; it thus represents the relative progress of the task. In our approach, we bound the virtual runtime difference between any pair of tasks by a constant via progress balancing. Intuitively, this guarantees multicore fairness since the virtual runtimes of tasks become identical if all tasks have received CPU time exactly in proportion to their relative weights, as in an idealized multicore scheduler such as generalized multiprocessor sharing (GMS) (Chandra et al., 2005).
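The definition above can be sketched in a few lines; this is our illustration of the stated definition (the base weight of 1024 mirrors Linux's nice-0 weight, an assumption on our part), showing that under exactly proportional allocation two tasks' virtual runtimes coincide:

```python
# Virtual runtime: cumulative runtime inversely scaled by weight.
# BASE_WEIGHT follows Linux's nice-0 weight; the scenario is ours.

BASE_WEIGHT = 1024

def virtual_runtime(cumulative_runtime_ms, weight):
    return cumulative_runtime_ms * BASE_WEIGHT / weight

# Task A (weight 2048) received 200 ms of CPU; task B (weight 1024)
# received 100 ms. The allocation is exactly weight-proportional, so
# both tasks have made the same relative progress:
vr_a = virtual_runtime(200.0, 2048)
vr_b = virtual_runtime(100.0, 1024)
# vr_a == vr_b: equal virtual runtimes signal perfect fairness.
```

Conversely, a growing gap between two tasks' virtual runtimes quantifies exactly how far the system has drifted from proportional sharing, which is why bounding that gap by a constant is the fairness criterion used throughout the paper.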

Linux, one of the most widely deployed operating systems, also supports fair-share scheduling via the completely fair scheduler (CFS). Moreover, Linux has addressed multicore issues since its 2.2 release. Unfortunately, Linux falls short of expectations when fair-share scheduling and multicore scheduling are combined. Like many other multicore fair-share scheduling algorithms based on distributed run-queues, CFS consists of two components: a per-core fair-share scheduler and a load balancer. Each component has a shortcoming with respect to achieving multicore fairness.

First, the current realization of a task’s virtual runtime in CFS cannot be utilized to achieve fairness in a multicore environment. The goal of CFS is to approximate GPS by giving each task CPU time in proportion to its assigned weight. At the per-core level, CFS successfully achieves this goal by using the virtual runtime: at every scheduling decision point, the per-core scheduler dispatches the task with the smallest virtual runtime on the core. However, in a multicore system, the virtual runtime cannot serve as a global measure of a task’s progress since the per-core scheduler of CFS maintains only the relative order among the runnable tasks in each run-queue, locally manipulating virtual runtimes when a task is enqueued or dequeued.
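The per-core dispatch rule can be sketched as follows. Note an assumption on our part: real CFS orders tasks in a red-black tree keyed by virtual runtime, whereas this illustration uses a binary heap, which reproduces the same "smallest virtual runtime first" behavior:

```python
import heapq

# Simplified per-core run-queue in the spirit of CFS. CFS itself uses a
# red-black tree; a heap gives the same dispatch order for illustration.

class RunQueue:
    def __init__(self):
        self._heap = []  # entries are (vruntime, task_name)

    def enqueue(self, vruntime, task):
        heapq.heappush(self._heap, (vruntime, task))

    def pick_next(self):
        # At each scheduling decision point, dispatch the task with the
        # smallest virtual runtime on this core.
        return heapq.heappop(self._heap)

rq = RunQueue()
rq.enqueue(12.0, "A")
rq.enqueue(7.5, "B")
next_task = rq.pick_next()  # "B" runs first: it has made less progress
```

Because each core maintains such a queue independently and rescales virtual runtimes locally on enqueue and dequeue, a vruntime of 7.5 on one core is not comparable to a vruntime of 7.5 on another, which is precisely the limitation described above.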

Second, CFS relies on load balancing to achieve multicore fairness. As pointed out earlier, balancing loads among cores does not bound the differences in the virtual runtimes of tasks. Note that the virtual runtime distribution varies across cores regardless of their loads and that the per-core schedulers of CFS run independently of each other. To exacerbate the problem, CFS cannot provide perfect load balancing for two reasons: weight quantization error and reluctant load balancing, as discussed in (Huh et al., 2012). The quantization error occurs because a task’s weight is specified as one of a set of predefined positive integers in Linux implementations. In addition, CFS allows persistent load imbalance among cores since it is reluctant to perform load balancing unless the imbalance exceeds a given threshold.
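The quantization error can be seen directly in Linux's nice-to-weight table. The three table entries below are taken from the kernel's `sched_prio_to_weight` array; the 1.25 "intended ratio" reflects the kernel's stated design goal that each nice step changes a task's CPU share by about 25%:

```python
# Weight quantization in Linux: a task's weight is one of 40 predefined
# integers, one per nice level. Three real entries around nice 0:
NICE_TO_WEIGHT = {-1: 1277, 0: 1024, 1: 820}

# Consecutive nice levels are meant to differ by a factor of ~1.25,
# but the integer weights cannot represent that ratio exactly:
intended_ratio = 1.25
actual_ratio = NICE_TO_WEIGHT[0] / NICE_TO_WEIGHT[1]  # ~1.2488

# The rounding error in each weight propagates into every weight sum,
# so even a "perfectly balanced" load carries quantization error.
```

This is a sketch of the error's origin only; how the error interacts with CFS's imbalance threshold is analyzed in (Huh et al., 2012).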

In our earlier work (Huh et al., 2012), we attempted to address the second limitation by proposing a virtual runtime-based task migration algorithm. That algorithm has two shortcomings. First, it provides only pairwise fairness between two adjacent cores and has not been generalized to a larger number of cores; a straightforward extension fails to maintain the boundedness property among virtual runtimes. We thus introduce a mechanism called throttling. Second, the previous algorithm cannot overcome the limitation of CFS’s virtual runtime since it still uses the same per-core scheduler as CFS; it merely inherits the same problem.

In this paper, we propose a progress balancing algorithm for achieving multicore fairness by extending our previous work. The algorithm works for a system with 2^n cores by hierarchically applying a pairwise progress balancing algorithm similar to the one given in (Huh et al., 2012). The proposed algorithm periodically partitions runnable tasks into the same number of groups as the number of cores in the system and shuffles the tasks in such a way that a core with larger virtual runtimes receives a larger load and the load differences among cores are bounded. The algorithm achieves this property by incorporating a work-non-conserving mechanism called throttling. This property ensures that tasks with larger virtual runtimes run at a slower pace until the next period. In addition, we redesign CFS so that it keeps track of the cumulative virtual runtimes of tasks. Our formal analysis shows that our approach achieves a constant bound on the virtual runtime differences, regardless of the number of tasks or cores.
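The partitioning idea can be sketched as follows. This is our simplification, not the paper's exact algorithm (which also balances weight sums and applies throttling): sort runnable tasks by virtual runtime in descending order and cut the sequence into one group per core, so that the tasks that have progressed furthest are grouped together and can be placed on a more heavily loaded core:

```python
# Sketch (our simplification) of progress-based partitioning: tasks that
# have progressed further (larger vruntime) are grouped together so they
# can be assigned to a core with a larger load and thus run more slowly.

def partition_by_progress(tasks, num_cores):
    """tasks: list of (vruntime, weight). Returns num_cores groups."""
    ordered = sorted(tasks, key=lambda t: t[0], reverse=True)
    size = len(ordered) // num_cores
    groups = [ordered[i * size:(i + 1) * size] for i in range(num_cores)]
    # Any leftover tasks join the last (least-progressed) group.
    groups[-1].extend(ordered[num_cores * size:])
    return groups

groups = partition_by_progress(
    [(30.0, 1024), (10.0, 1024), (25.0, 512), (5.0, 2048)], 2)
# groups[0] holds the two largest vruntimes (30.0 and 25.0);
# groups[1] holds the two smallest (10.0 and 5.0).
```

In the paper's actual algorithm, the group placement additionally enforces bounded load differences among cores via throttling; the sketch above shows only the ordering-and-grouping step.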

We implemented the proposed approach on a multicore server machine equipped with Intel Xeon E5506 processors running Linux kernel 3.10. We ran experiments on the target machine and measured the maximum virtual runtime difference of tasks as well as the run-time overhead incurred by our algorithm. The experimental results demonstrate the effectiveness of our progress balancing approach. When we set the balancing period to 500 ms, the maximum virtual runtime difference was observed to be 1.07 s, regardless of the machine’s uptime. In contrast, the virtual runtime differences diverge indefinitely under the legacy Linux CFS. Our approach incurs only negligible run-time overhead compared to the legacy CFS.

The remainder of this paper is organized as follows. Section 2 discusses related work on multicore fair-share scheduling algorithms. Section 3 models our target system and gives the associated notation. In Section 4, we define progress balancing and state the problem at hand. Section 5 discusses the technical aspects of our progress balancing algorithm. Section 6 reports on the experimental evaluation. Finally, Section 7 concludes the paper.

Section snippets

Related work

A number of research results on multicore fair-share scheduling can be found in the literature. These results can be classified into either centralized or distributed run-queue algorithms.

System model and notations

In this section, we show the overall architecture of the target computing system and present the technical details of the CFS virtual runtime algorithm and its load balancing mechanism, along with the notation used throughout this paper. This discussion provides readers with the background needed to understand progress balancing.

Problem description and solution overview

In this section, we formally state our problem along with a motivating example. We then give an overview of the proposed solution approach.

Progress balancing algorithm

In this section, we provide technical details of the progress balancing algorithm we propose to achieve fairness in a multicore system. We first describe the algorithm in detail and prove its properties.

Fig. 4 shows the pseudo code of the top-level algorithm, Progress Balancing. It first merge-sorts the m input task groups in descending order of their virtual runtimes and stores all tasks in a system-wide balancing queue H. It then performs the Partition and Throttle steps in order. Finally, it
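The merge step of this flow can be sketched as follows. This is our own reconstruction under stated assumptions, not the paper's Fig. 4: the per-core input groups are assumed already sorted in descending vruntime order, and the Partition and Throttle steps are left as hypothetical stubs since their details appear later in the section:

```python
import heapq

# Sketch of the top-level flow: m-way merge of descending-sorted task
# groups into a system-wide balancing queue H, followed by the Partition
# and Throttle steps (stubbed here; they are not the paper's versions).

def progress_balancing(groups):
    # heapq.merge performs an m-way merge of already-sorted inputs;
    # reverse=True preserves the descending vruntime order.
    H = list(heapq.merge(*groups, reverse=True))
    per_core = partition(H, len(groups))  # hypothetical Partition step
    throttle(per_core)                    # hypothetical Throttle step
    return H

def partition(H, m):
    # Placeholder: round-robin split into m groups.
    return [H[i::m] for i in range(m)]

def throttle(groups):
    # Placeholder: the paper's work-non-conserving throttling mechanism.
    pass

# Two cores' task groups, each sorted by descending virtual runtime:
H = progress_balancing([[9.0, 4.0], [7.0, 2.0]])
```

Merging m sorted groups takes O(N log m) for N tasks, which is presumably why the algorithm merge-sorts per-group results rather than re-sorting the whole task set from scratch.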

Implementation and experimental evaluation

In this section, we report on the experiments we conducted to evaluate the degree of fairness improved by the proposed approach. We briefly describe our implementation of the solution approach and the experimental setup including the target system configuration and a performance metric. We then show and analyze the experimental results. Finally, we present the run-time overhead incurred by our approach.

Conclusion

In this paper, we proposed a progress balancing algorithm for achieving multicore fairness. It works together with a per-core fair-share scheduling algorithm and runs periodically. Specifically, at every balancing period, it partitions runnable tasks into the same number of task groups as the number of CPU cores in a system and shuffles the tasks to ensure that tasks with larger virtual runtimes run at a slower pace until the subsequent balancing period.

In order to show the fairness property of

Acknowledgments

This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0101-15-237, Development of General-Purpose OS and Virtualization Technology to Reduce 30% of Energy for High-density Servers based on Low-power Processors) and by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2014(H0301-14-1020)) supervised by the NIPA.


References (42)

  • P. Turner et al.

    CPU bandwidth control for CFS

    Linux Symposium

    (2010)
  • ab, 2015. ab - apache http server benchmarking tool. URL...
  • S.K. Baruah et al.

    Proportionate progress: a notion of fairness in resource allocation

    Algorithmica

    (1996)
  • C. Boneti et al.

    A dynamic scheduler for balancing HPC applications

    Proceedings of the 2008 ACM/IEEE Conference on Supercomputing

    (2008)
  • R. Buyya et al.

    Market-oriented cloud computing: vision, hype, and reality for delivering it services as computing utilities

    Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications (HPCC’08)

    (2008)
  • B. Caprita et al.

    Group Ratio Round-Robin: O(1) Proportional Share Scheduling for Uniprocessor and Multiprocessor Systems

    USENIX Annual Technical Conference

    (2005)
  • B. Caprita et al.

    Grouped distributed queues: distributed queue, proportional share multiprocessor scheduling

    Proceedings of the Twenty-Fifth Annual ACM Symposium on Principles of Distributed Computing

    (2006)
  • A. Chandra et al.

    Surplus fair scheduling: a proportional-share CPU scheduling algorithm for symmetric multiprocessors

    Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation-Volume 4

    (2005)
  • G. Chuanxiong

    SRR: An O(1) time complexity packet scheduler for flows in multi-service packet networks

    ACM SIGCOMM Computer Communication Review

    (2001)
  • S.P. Dandamudi

    Reducing run queue contention in shared memory multiprocessors

    Computer

    (1997)
  • J. Dejun et al.

    EC2 performance analysis for resource provisioning of service-oriented applications

    Service-Oriented Computing. ICSOC/ServiceWave 2009 Workshops

    (2010)
  • K.J. Duda et al.

    Borrowed-virtual-time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler

    ACM SIGOPS Operating Systems Review

    (1999)
  • FFmpeg, 2015. Ffmpeg documentation. URL...
  • A. Ghodsi et al.

    Choosy: max-min fair sharing for datacenter jobs with constraints

    Proceedings of the 8th ACM European Conference on Computer Systems

    (2013)
  • P. Goyal et al.

    Start-time fair queueing: a scheduling algorithm for integrated services packet switching networks

    ACM SIGCOMM Computer Communication Review

    (1996)
  • P. Goyal et al.

    Start-time fair queueing: a scheduling algorithm for integrated services packet switching networks

    IEEE/ACM Transactions on Networking

    (1997)
  • S. Hofmeyr et al.

    Juggle: proactive load balancing on multicore computers

    Proceedings of the 20th International Symposium on High Performance Distributed Computing

    (2011)
  • S. Hofmeyr et al.

    Load balancing on speed

    ACM Sigplan Notices

    (2010)
  • S. Huh et al.

    Providing fair share scheduling on multicore cloud servers via virtual runtime-based task migration algorithm

    Proceedings of the 2012 IEEE 32nd International Conference on Distributed Computing Systems (ICDCS)

    (2012)
  • M. Isard et al.

    Quincy: fair scheduling for distributed computing clusters

    Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles

    (2009)
  • M.B. Jones et al.

    CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities

    (1997)

Sungju Huh earned his BS degree in Computer Science from Konkuk University, Korea, in 2007. He received his MS degree in Computer and Radio Communications Engineering from Korea University, in 2009. He received his PhD degree in Computer Science and Engineering from the Department of Transdisciplinary Studies, Graduate School of Convergence Science and Technology of Seoul National University in 2015. He is also a member of Real-Time Operating System Laboratory at Seoul National University. His current research interests include task scheduling for multicore systems, software platforms for embedded systems and virtualization.

Seongsoo Hong earned his BS and MS degrees in Computer Engineering from Seoul National University, Korea, in 1986 and 1988, respectively. He received his PhD degree in Computer Science from the University of Maryland, College Park, in 1994. He is currently a professor in the Department of Electrical and Computer Engineering at Seoul National University. His current research interests include embedded and real-time systems design, real-time operating systems, software architecture for embedded and real-time systems and cross-layer optimization of complex, multi-layered software systems. He is the steering committee chair of IEEE RTCSA. He served as a general co-chair of IEEE RTCSA 2006 and CASES 2006 and as a program committee co-chair of IEEE RTAS 2005, RTCSA 2003, IEEE ISORC 2002 and ACM LCTES 2001. He has served on numerous program committees, including IEEE RTSS and ACM OOPSLA. He is a member of the National Academy of Engineering of Korea. He is currently a senior member of the IEEE and a senior member of the ACM.
