From: Srivatsa Vaddagiri on
On Thu, May 03, 2007 at 03:29:32PM +0200, Damien Wyart wrote:
> What are your thoughts/plans concerning merging CFS into mainline ? Is
> it a bit premature to get it into 2.6.22 ? I remember Linus was ok to
> change the default scheduler rapidly (the discussion was about sd at
> that time) to get more testing than in -mm.

And what about group scheduling extensions? Do you have plans to work on
them? I was beginning to work on a prototype to do group scheduling based
on CFS, basically along the lines of what you and Linus had outlined
earlier:

http://lkml.org/lkml/2007/4/18/271
http://lkml.org/lkml/2007/4/18/244

--
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ting Yang on


Ingo Molnar wrote:
> * Ting Yang <tingy(a)cs.umass.edu> wrote:
>
>
>> The authors of this paper proposed a scheduler: Earliest Eligible
>> Virtual Deadline First (EEVDF). EEVDF uses exactly the same method as
>> CFS to track the execution of each running task. The only difference
>> between EEVDF and CFS is that EEVDF tries to be _deadline_ fair while
>> CFS is _start-time_ fair. [...]
>>
>
> Well, this is a difference but note that it's far from being the 'only
> difference' between CFS and EEVDF:
>
>
I re-ordered comments a little bit to make my discussion easier:
> - in CFS you have to "earn" your right to be on the CPU, while EEVDF
> gives out timeslices (quanta)
>
> - thus in CFS there's no concept of "deadline" either (the 'D' from
> EEVDF).
>
These two are needed to implement the deadline-fair policy, and are
therefore covered by the difference I mentioned. Because the authors use
different terms, the paper may lead to some confusion; three concepts are
important here:
  - the scheduling quantum *q* in the paper: the minimum unit in which
    the scheduler can schedule tasks. With 1000 system ticks per second,
    q is 1ms; with 100 ticks per second (old Linux), q is 10ms.
  - the length of a request submitted by a task: the amount of work
    needed to process the request. EEVDF does _not_ need this
    information for scheduling at all.
  - the timeslice used to process requests, denoted *r* in the paper:
    the unit the scheduler uses to process a request. If the length of a
    request is larger than the timeslice r, the request is effectively
    divided into pieces of length r.

The scheduler is free to choose a suitable timeslice for each task.
Given the timeslice and the weight of a task, it can estimate the task's
deadline, which tells how urgent the task is, and thereby avoid the
potential O(n) cost (though this comes at a price: more context
switches).
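As a rough illustration of this bookkeeping (my own sketch, not the paper's code or any kernel implementation), the per-task state and the pick-next decision might look like this, where `ve` is a task's virtual eligible time and its virtual deadline is ve + r/w:

```python
# Toy sketch of EEVDF bookkeeping (my own illustration): each task has
# a virtual eligible time ve, and the virtual deadline of its next
# timeslice-sized chunk of work is vd = ve + r / w, where r is the
# timeslice and w the task's weight.  The scheduler picks, among tasks
# whose ve has already passed, the one with the earliest vd.

def virtual_deadline(ve, timeslice, weight):
    """Virtual deadline of the next timeslice-sized chunk of work."""
    return ve + timeslice / weight

def pick_next(tasks, vnow):
    """tasks: dicts with 've' and 'vd'; earliest eligible deadline wins."""
    eligible = [t for t in tasks if t["ve"] <= vnow]
    return min(eligible, key=lambda t: t["vd"]) if eligible else None
```

After the chosen task runs for r real time units, its ve advances to the old vd and a new vd is computed; a heavier task gets a nearer deadline, so it is picked more often.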
> - EEVDF concentrates on real-time (SCHED_RR-alike) workloads where they
> know the length of work units
That is _not_ really the case, although the authors of this paper used a
rather aggressive title and claimed so.
Look carefully at Theorem 3 and its corollary on page 16, which say:
    if no request of client k is larger than a time quantum, then at
    any time t the lag is bounded by
        -q < lag_k(t) < q
This gives three facts:
  1. For the system to be hard real-time, the length of any request
     (note: not the quantum q or the timeslice r) of any task, at any
     time, must be less than the scheduler's quantum, which in modern
     Linux is 1ms. This makes EEVDF effectively useless for hard
     real-time systems, where applications cannot miss deadlines.
  2. For the system to be soft real-time, EEVDF first needs to fix the
     bandwidth of a task, so that it does not vary as tasks join and
     leave. Then EEVDF can give a soft real-time guarantee with a
     bounded delay of max(timeslice r). The authors mention this, but
     unfortunately the feature was never actually implemented.
  3. EEVDF does give a proportional time-share system in which no
     running task is delayed by more than max(timeslice r) w.r.t. the
     ideal fluid-flow system.

The authors of this paper only developed a framework that could
possibly be used by real-time systems (either hard or soft); their main
attempt was to glue real-time and proportional time-sharing together.
The real-time part of the paper does not really ring a bell for me.
However, isn't fact 3 listed above exactly what we need?
> - while CFS does not need any knowledge
> about 'future work', it measures 'past behavior' and makes its
> decisions based on that. So CFS is purely 'history-based'.
>
As I explained above, EEVDF does not need any future knowledge.
Instead, it uses a unit (the timeslice) to process requests from tasks,
predicting the deadline by which this unit has to be finished.

There is a firm theoretical basis behind this "guess", which did not
appear in the paper: as long as the total bandwidth needed by all
active tasks does not exceed the bandwidth of the CPU, the predicted
virtual deadline, when mapped back to a real time t in the system, is
exactly the deadline for the chosen unit in the ideal system. Since
EEVDF is proportional-share, the bandwidth for each task is scaled
based on its weight, and the sum of them does not exceed the CPU
capacity. Therefore the "guess" of EEVDF is always right :-)

Predicting the future with weights taken into account lets it choose
better tasks to execute. A smaller timeslice makes it track the ideal
system more closely, at the cost of more context switches.
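To make the lag bound concrete, here is a toy simulation (my own sketch, not the authors' code, with the crude simplification that the quantum, the timeslice, and each request are all one tick): it schedules by earliest eligible virtual deadline and checks each task's lag against the ideal fluid-flow system:

```python
# Toy EEVDF simulation (my own illustration): with quantum q = 1 tick
# and no request larger than q, each task's lag relative to the ideal
# fluid-flow system should stay within (-q, q), per Theorem 3.

def simulate(weights, ticks, q=1.0):
    total = sum(weights)
    tasks = [{"w": w, "ve": 0.0, "served": 0.0} for w in weights]
    for t in tasks:
        t["vd"] = t["ve"] + q / t["w"]        # virtual deadline
    vnow, max_lag = 0.0, 0.0
    for tick in range(ticks):
        # pick the eligible task (ve <= vnow) with the earliest deadline
        eligible = [t for t in tasks if t["ve"] <= vnow] or tasks
        cur = min(eligible, key=lambda t: t["vd"])
        cur["served"] += q
        cur["ve"] = cur["vd"]                 # next chunk eligible at old vd
        cur["vd"] = cur["ve"] + q / cur["w"]
        vnow += q / total                     # virtual time runs at 1/total
        elapsed = (tick + 1) * q
        for t in tasks:
            ideal = elapsed * t["w"] / total  # ideal fluid-flow service
            max_lag = max(max_lag, abs(ideal - t["served"]))
    return max_lag
```

Running this with, say, weights 2:1 keeps the maximum observed lag well under one quantum, which is exactly the proportional-share guarantee of fact 3 above.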
> - EEVDF seems to be calculating timeslices in units of milliseconds,
> while CFS follows a very strict 'precise' accounting scheme on the
> nanoseconds scale.
>
This is just an implementation issue, right? It is possible to
implement EEVDF at a finer granularity.
> - the EEVDF paper is also silent on SMP issues.
>
Yes, you are right. I do not have much knowledge about load balancing.
> - it seems EEVDF never existed as a kernel scheduler, it was a
> user-space prototype under FreeBSD with simulated workloads. (have
> they released that code at all?).
>
They never released the code (quite typical behavior of systems
researchers :-)), and that is why I asked whether it is worth a try.
> The main common ground seems to be that both CFS and EEVDF share the
> view that the central metric is 'virtual time' proportional to the load
> of the CPU (called the 'fair clock' in CFS) - but even for this the
> details of the actual mechanism differ: EEVDF uses 1/N while CFS (since
> -v8) uses a precise, smoothed and weighted load average that is close to
> (and reuses portions of) Peter Williams's load metric used in smp-nice.
>
>
Thanks a lot for sharing your opinions; it is very nice discussing this
with you.

Ting
From: Ting Yang on

Srivatsa Vaddagiri wrote:
> On Thu, May 03, 2007 at 10:50:15AM +0200, Ingo Molnar wrote:
>
>> - EEVDF concentrates on real-time (SCHED_RR-alike) workloads where they
>> know the length of work units
>>
>
> This is what I was thinking when I wrote earlier that EEVDF expects each
> task will specify "length of each new request"
> (http://lkml.org/lkml/2007/5/2/339).
>
That is not quite true, based on my understanding of EEVDF; please see
the email I just sent to Ingo for an explanation.
> The other observation that I have of EEVDF is that it tries to be fair
> in the virtual time scale (each client will get 'wi' real units in 1
> virtual unit), whereas sometimes fairness in real-time scale also
> matters?
> For ex: a multi-media app would call the scheduler fair to it, if it
> recvs at least 1 ms cpu time every 10 *real* milliseconds
> (frame-time). A rogue user (or workload) that does a fork bomb may
> skew this fairness for that multi-media app in the real-time scale
> under EEVDF?
>
First of all, CFS does not seem to address this issue either. This is a
typical (hard or soft) real-time requirement: not only must the
bandwidth of a task be fixed, e.g. 10% of CPU bandwidth (which
proportional-share systems like CFS and EEVDF do not do), but the work
must also satisfy a deadline.
In both CFS and EEVDF, the scheduler would have to keep tweaking
weights to give a fixed bandwidth to an application. The authors of
EEVDF claim this can be done, but it was never implemented :-(
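To see why the weights would need constant tweaking (my own hypothetical sketch, not anything from the EEVDF paper or CFS): in a proportional-share scheduler a task's share is its weight over the total weight, so pinning a task at a fixed fraction f means re-solving its weight whenever the other tasks' total weight changes:

```python
# Sketch (my own illustration): to pin task k at a fixed share f in a
# proportional-share scheduler, solve
#     w_k / (w_k + W_other) = f   =>   w_k = f * W_other / (1 - f)
# and recompute w_k every time W_other changes (tasks join or leave).

def weight_for_share(f, other_weight):
    """Weight giving a task the fixed CPU share f (0 < f < 1)."""
    return f * other_weight / (1.0 - f)
```

For example, holding a task at 10% against nine units of competing weight requires weight 1; a fork bomb that inflates the competing weight forces the scheduler to raise it again.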

Ting
From: William Lee Irwin III on
On Thu, May 03, 2007 at 03:29:32PM +0200, Damien Wyart wrote:
>> What are your thoughts/plans concerning merging CFS into mainline ? Is
>> it a bit premature to get it into 2.6.22 ? I remember Linus was ok to
>> change the default scheduler rapidly (the discussion was about sd at
>> that time) to get more testing than in -mm.

On Thu, May 03, 2007 at 08:23:18PM +0530, Srivatsa Vaddagiri wrote:
> And what about group scheduling extensions? Do you have plans to work on
> it? I was begining to work on a prototype to do group scheduling based
> on CFS, basically on the lines of what you and Linus had outlined
> earlier:
> http://lkml.org/lkml/2007/4/18/271
> http://lkml.org/lkml/2007/4/18/244

Tong Li's Trio scheduler does a bit of this, though it doesn't seem to
have the mindshare cfs seems to have acquired.

The hyperlink seems to have broken, though:
http://www.cs.duke.edu/~tongli/linux/linux-2.6.19.2-trio.patch


-- wli
From: Li, Tong N on
On Thu, 2007-05-03 at 08:53 -0700, William Lee Irwin III wrote:
> On Thu, May 03, 2007 at 03:29:32PM +0200, Damien Wyart wrote:
> >> What are your thoughts/plans concerning merging CFS into mainline ? Is
> >> it a bit premature to get it into 2.6.22 ? I remember Linus was ok to
> >> change the default scheduler rapidly (the discussion was about sd at
> >> that time) to get more testing than in -mm.
>
> On Thu, May 03, 2007 at 08:23:18PM +0530, Srivatsa Vaddagiri wrote:
> > And what about group scheduling extensions? Do you have plans to work on
> > it? I was begining to work on a prototype to do group scheduling based
> > on CFS, basically on the lines of what you and Linus had outlined
> > earlier:
> > http://lkml.org/lkml/2007/4/18/271
> > http://lkml.org/lkml/2007/4/18/244
>
> Tong Li's Trio scheduler does a bit of this, though it doesn't seem to
> have the mindshare cfs seems to have acquired.
>
> The hyperlink seems to have broken, though:
> http://www.cs.duke.edu/~tongli/linux/linux-2.6.19.2-trio.patch

Yeah, I just fixed the link. The code was written based on the 2.6.19.2
scheduler. I could forward-port it to the latest kernel if there's
interest and I can find time. :) Here's a description of the design:

http://lkml.org/lkml/2007/4/20/286

Thanks,

tong