in [Kernel]

Prev: Bug in current -git tree causing dbus and gnome to chew up cpu time
Next: Introduce O_CLOEXEC (take >2)
From: Ting Yang on 2 May 2007 19:50 Srivatsa Vaddagiri wrote: > I briefly went thr' the paper and my impression is it expect each task > to specify the length of each new request it initiates. Is that correct? > No, the timeslice l_i here serves as a granularity control w.r.t responsiveness (or latency depends on how you interpret it). As wli said it can be express as a function of the priority, as we do for weight now. It is not related with the length of each new request. A request may be 1 seconds long, but the scheduler may still process it using 10ms timeslice. Smaller timeslice leads to more accuracy, i.e. closer to ideal case. However, the maximum of timeslice l_i used by all active tasks determines the total responsiveness of the system, which I will explain in detail later. > There is also p->wait_runtime which is taken into account when > calculating p->fair_key. So if p3 had waiting in runqueue for long > before, it can get to run quicker than 10ms later. Consider if p3 is a newly started task or waked up task and carries no p->wait_runtime. Ting - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo (a)vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Ting Yang on 2 May 2007 22:50 Hi, As encouraged by some of you, I have started implementing EEVDF. However, I am quite new in this area, and may not be experienced enough to get it through quickly. The main problems, I am facing now ,is how to treat the semantics of yeild() and yield_to(). I probably will throw a lot of questions along the way of my implementation. Also I found my previous email was not clear enough in describing the properties of CFS and EEVDF and caused some confusion, and there were also some mistakes too. In this email, I will try to make up for that. *** Let's start from CFS: For simplicity, let's assume that CFS preempt the current task p1 by another tasks p2, when p1->key - p2->key >1, and the virtual time rq->fair_clock is initialized to be 0. Suppose, at time t = 0, we start n+1 tasks that run long enough. task 1 has weight n and all other tasks have weight 1. It is clear that, at time t=0, p_1->key = p_2->key = ... =p_(n+1)-> key = rq->fair_clock = 0 Since all tasks has the same key, CFS breaks the ties arbitrarily, which leads to many possibilities. Let's consider 2 of them: _Case One:_ p1, which has weight n, executes first: t = 1: rq->fair_clock = 1/2n, p1->key = 1/n // others are not changed. t = 2: rq->fair_clock = 2/2n, p1->key = 2/n ... t = n: rq->fair_clock = n/2n, p1->key = n/n = 1 Only after p1 executes n ticks, the scheduler will pick another task for execution. Between time [0, n) the amount of actual work done by p1 is n. The amount of work should be done in ideal fluid-flow system is n * n/2n = n/2. Therefore the lag is n/2 - n = -n/2, negative means p1 goes faster than the ideal case. As we can see this lag is O(n). _Case Two:_ the scheduler executes the tasks in the order p2, p3, ..., p_(n+1), p1 t = 1: rq->fair_clock = 1/2n, p2->key = 1; // others are not changed t = 2: rq->fair_clock = 2/2n, p3->key = 1; .... t = n: rq->fair_clock = n/2n, p_(n+1)->key = 1; Then the scheduler picks p1 (weight n) for execution. Between time [0, n) the amount actual work done by p1 is 0, and the ideal amount is n/2. Therefore the lag is n/2 - 0, positive means p1 falls behind the ideal case. The lag here for p1 is also O(n). As I said in the previous email, p->fair_key only has the information of past execution of a task and reflects a fair start point. It does not have the information about weight. *** Now, let's look at EEVDF. I have to say that I missed a very important concept in EEVDF which leads to confusions here. EEVDF stands for _Eligible_ Earliest Virtual Deadline First, and I did not explain what is _eligible_. EEVDF maintains a virtual start time ve_i and virtual deadline vd_i for each task p_i, as well as a virtual time vt. A newly started/waked task has its ve_i initialized to be the current virtual time. Once a timeslice l_i amount of work is done, the new virtual start time is set to be the previous virtual deadline, and then virtual deadline vd_i is recalculated. A task is eligible, if and only if ve_i <= current virtual time vt EEVDF, at every tick, always picks the eligible task which has the earliest virtual deadline for execution Let's see how it works using a similar example as for CFS above. Suppose, at time t = 0, we starts n+1 tasks. p1 has weight n, and all others have weight 1. For simplicity, we assume all task use timeslice l_i = 1, and virtual time vt is initialized to be 0. - at time t = 0, we have vt = 0; ve_1 = 0, vd_1 = ve_1 + l_1/w_1 = 1/n ve_2 = 0, vd_2 = ve_1 + l_2/w_2 = 1 ... ve_(n+1) = 0, vd_(n+1) = ve_(n+1) + l_(n+1)/w_(n+1) = 1; Since p1 is eligible and has the earliest deadline 1/n, the scheduler will executes it first. (Here, the weight which encoded in the deadline plays an important rule, and allows higher weight tasks to be executed first). - at time t = 1: vt = 1/2n, ve_1 = 1/n (previous vd_1), vd_1 = ve_1 + 1/n = 2/n Since ve_1 > vt, p1 is _not_ eligible. EEVDF picks another task for execution by breaking the tie, say it executes p2. - at time t = 2: vt = 2/2n = 1/n, ve_1 = 1/n, vd_1 = 2/n ve_2 = 1, ve_2 = ve_2 + 1/1 = 2 // this makes p2 not eligible Since vt = ve_1, p1 becomes eligible again and has the earliest deadline 2/n, it will be scheduled for execution. As EEVDF repeats, it give a schedule like p1, p2, p1,p3, p1, p4, p1 .... (presented by each tick). As you can see, now p1 never falls behind/goes before the ideal case by 1. Now, let's check how timeslice l_i impacts the system. Suppose, we change the timeslice of p1 from 1 to 2, and keep others unchanged. EEVDF gives a schedule like: p1, p1, p2, p3, p1, p1, p4, p5, p1, p1, .... (presented by each tick) similarly if timeslice of p1 is set to be 3, EEVDF gives p1, p1, p1, p2, p3, p4, p1, p1, p1, .... (presented by each tick) As the timeslice of p1 increases, the system checks for reschedule less frequently, thus makes the lag of p1 becomes longer, but the lag will not be larger than the maximum timeslice used, as long as it is a fixed constant. On the other hand, increasing the timeslice of other tasks has no effect on when p1 is schedule. ( you can try to play the algorithm by yourself :-)) In CFS, a task has to increases p->fair_key for a certain amount so that the scheduler can consider it to be preempted. Higher weight leads to less progress in p->fair_key, and then effectively large timeslice. Suppose the preempt granularity is 5 virtual ticks, then a task of weight 1 needs to run 5 ticks, weight 2 need 10 ticks, weight 10 needs 50 ticks. The effect is that the timeslice increases linearly w.r.t weight, and causes the O(n) lag. In fact, we use timeslice n for p1 in the above example for EEVDF, it behaves exactly as CFS. Now, I have fix an incorrect statement about EEVDF's lag bound in my previous email: Under EEVDF, a task's lag (difference between the amount work should be done in ideal fluid-flow system and the actual amount of work done) is bounded by the maximum timeslice used. (not occasionally as I said in my previous email). This actually means the maximum timeslice used controls the total responsiveness of the system. Also the combination of high weight and smaller timeslice will give better response guarantee for those bursty response-time sensitive tasks. Sorry for the late replay. I had an eye exam today and got my eyes dilated, which forced me to stay away from my computer for a while :-) Ting - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo (a)vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Ting Yang on 2 May 2007 23:10 Li, Tong N wrote: > Thanks for the excellent explanation. I think EEVDF and many algs alike > assume global ordering of all tasks in the system (based on virtual > time), whereas CFS does so locally on each processor and relies on load > balancing to achieve fairness across processors. It'd achieve strong > fairness locally, but I'm not sure about its global fairness properties > in an MP environment. If ideally the total load weight on each processor > is always the same, then local fairness would imply global fairness, but > this is a bin packing problem and is intractable ... First, I am not assuming a global ordering of all tasks. As the current implementation, EEVDF should maintain virtual time locally for each CPU. EEVDF is a proportional time share scheduler, therefore the relative weight and actual cpu share for each task varies when tasks join and leave. There will be not bin-pack problem for such systems. I understand that bin-pack problem does exist in Real-time world. Suppose in a system has 2 cpus, there a 3 tasks, all of which needs to finish 30ms work within a window of 50ms. Any 2 of them stay together will exceeds the bandwidth of one cpu. There is a bin-pack problem, unless the system has to be clever enough to break one of them down into 2 requests of 15ms/25ms, and execute them on different cpus at different time without overlap, which is quite difficult :-) In the proportional world, weights and cpu share are scale to fit with the bandwidth of a cpu. Therefore putting 2 of them on one cpu is fine, and the fairness for each cpu is preserved. On the other hand, moving one task back and forth among 2 cpus do give better throughput and better global fairness. I have not dig into the load balancing algorithms of SMP yet, so I leave it aside for now, first thing first :-) Thanks ! Ting - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo (a)vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Ting Yang on 2 May 2007 23:20 > On Wed, May 02, 2007 at 11:06:34PM +0530, Srivatsa Vaddagiri wrote: > >> There is also p->wait_runtime which is taken into account when >> calculating p->fair_key. So if p3 had waiting in runqueue for long >> before, it can get to run quicker than 10ms later. >> > > Virtual time is time from the task's point of view, which it has spent > executing. ->wait_runtime is a device to subtract out time spent on the > runqueue but not running from what would otherwise be virtual time to > express lag, whether deliberately or coincidentally. ->wait_runtime > would not be useful for EEVDF AFAICT, though it may be interesting to > report. I just want to point out that ->wait_runtime, in fact, stores the lag of each task in CFS, except that it is also used by other things, and occasionally tweaked (heuristically ?). Under normal cases the sum of lags of all active tasks in such a system, should be a constant 0. The lag information is equally important to EEVDF, when some tasks leave the system (becomes inactive) carrying certain amount of lag. The key point here is that we have to spread the lag (either negative or positive) to all remaining task, so that the fairness of the system is preserved. I thinks CFS implementation does not seems to handle this properly. I am running out time today :-( I will write an email about CFS -v8 tomorrow, describing 2 issues in CFS I found related to this. Ting - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo (a)vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
From: Zoltan Boszormenyi on 3 May 2007 04:30
Hi! > *** Balbir Singh <balbir (a)linux.vnet.ibm.com> wrote:> > > The problem is with comparing a s64 values with (s64)ULONG_MAX, which > > evaluates to -1. Then we check if exec_delta64 and fair_delta64 are > > greater than (s64)ULONG_MAX (-1), if so we assign (s64)ULONG_MAX to > > the respective values. > > ah, indeed ... > > > The fix is to compare these values against (s64)LONG_MAX and assign > > (s64)LONG_MAX to exec_delta64 and fair_delta64 if they are greater > > than (s64)LONG_MAX. > > > > Tested on PowerPC, the regression is gone, tasks are load balanced as > > they were in v7. > > thanks, applied! > > Ingo I started up 12 glxgears to see the effect of CFS v8 on my Athlon64 X2 4200. Without this patch all the GL load was handled by the second core, the system only stressed the first core if other processes were also started, i.e. a kernel compilation. With this patch the load is evenly balanced across the two cores all the time. And while doing make -j4 on the kernel, the 12 gears are still spinning about 185+ FPS and there are only slightly visible hiccups. Switching between workspaces, i.e. refreshing the large windows of Thunderbird and Firefox are done very quickly, the whole system is exceptionally responsive. Thanks for this great work! Best regards, Zolt�n B�sz�rm�nyi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo (a)vger.kernel.orgMore majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ |