CFS scheduler, -v8 [Kernel]

From: Linus Torvalds on 7 May 2007 15:00

On Mon, 7 May 2007, Johannes Stezenbach wrote:
>
> One baffling example where gcc rewrites code is when
> conditionals depend on signed integer overflow:

Yes. This is one of my favourite beefs with gcc. Some of the optimization
decisions seem to make no sense.

Your example is a good one, but my private beef has been in alias
handling. Alias analysis is an important part of optimization, and there's
two kinds: the static (and exact, aka "safe") kind that you can do
regardless of any language definitions, because you *know* that you aren't
actually changing behaviour, and the additional type-based heuristics that
the C language allows.

So which ones would you expect a compiler to consider more important?

And which one do you think gcc will use?

Right. You can have static analysis that *guarantees* that two objects
alias, but if gcc determines that they have different types and thus might
not alias, it decides to use the heuristic instead of the firm knowledge,
and generate code that doesn't work.

"Because the language definition allows it".

Oh well.
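
A minimal sketch of the general class of problem, not the specific case referred to above. The function names are hypothetical, and the bit pattern mentioned in the comments assumes IEEE 754 single-precision floats. Under the type-based rule the compiler may treat an `int *` and a `float *` as never overlapping, so `store_both()` can legally return the constant 1 without re-reading memory. Copying with `memcpy()` is the well-defined way to reinterpret the bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Type-based rule: an int object and a float object are assumed not to
 * overlap, so the compiler is allowed to return 1 here without
 * reloading *i after the store through f. Calling this with both
 * pointers aimed at the same storage is undefined behaviour.
 */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

/*
 * Well-defined way to look at the bits of a float: copy the bytes
 * instead of casting the pointer. Assumes float is 32 bits wide.
 */
uint32_t float_bits(float x)
{
    uint32_t bits;

    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

With gcc, building with `-fno-strict-aliasing` turns the type-based assumption off, at the cost of the optimizations that depend on it.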

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Li, Tong N on 7 May 2007 17:00

On Mon, 2007-05-07 at 19:52 +0530, Srivatsa Vaddagiri wrote:
> On Thu, May 03, 2007 at 08:53:47AM -0700, William Lee Irwin III wrote:
> > On Thu, May 03, 2007 at 08:23:18PM +0530, Srivatsa Vaddagiri wrote:
> > > And what about group scheduling extensions? Do you have plans to work on
> > > it? I was beginning to work on a prototype to do group scheduling based
> > > on CFS, basically on the lines of what you and Linus had outlined
> > > earlier:
> > > http://lkml.org/lkml/2007/4/18/271
> > > http://lkml.org/lkml/2007/4/18/244
> >
> > Tong Li's Trio scheduler does a bit of this, though it doesn't seem to
> > have the mindshare cfs seems to have acquired.
> >
> > The hyperlink seems to have broken, though:
> > http://www.cs.duke.edu/~tongli/linux/linux-2.6.19.2-trio.patch
>
> The big question I have is, how well does DWRR fit into the "currently hot"
> scheduling frameworks like CFS? For ex: if the goal is to do
> fair (group) scheduling of SCHED_NORMAL tasks, can CFS and DWRR co-exist?
> Both seem to be radically different algorithms and my initial impressions
> of them co-existing is "No", but would be glad to be corrected if I am
> wrong. If they can't co-exist, then we need a different way of doing
> group scheduling on top of CFS, as that is gaining more popularity on
> account of better handling of interactivity.

Yeah, the intent of DWRR was to provide proportional fairness and rely
on the underlying scheduler to support interactivity. In a way, DWRR
ensures that each task receives its fair share, while the underlying
scheduler determines the right order to run the tasks. Since SD is
structurally similar to the stock scheduler, DWRR should co-exist with
it easily. Co-existing with CFS requires more work, but I think the
round-robin mechanism in DWRR could be applicable to CFS to facilitate
cross-processor fairness.

> Tong,
> I understand a central hallmark of DWRR is SMP fairness.
> Have you considered how bad/good the other alternative to achieve SMP fairness
> which is in vogue today : pressure/weight based balancing (ex: smpnice and
> CKRM CPU scheduler - ckrm.sourceforge.net/downloads/ckrm-ols03-slides.pdf)?
>

The disadvantage of DWRR is its potential overhead and the advantage is
it can provide stronger fairness. If we have 2 processors and 3 tasks,
DWRR ensures that each task gets 66% of a processor's time, while
smpnice would keep two tasks on the same processor and the third one on
another. I did an analysis and showed that the lag bound of DWRR is
constant if task weights are bounded by a constant. On the other hand,
the cost DWRR pays is that it requires more task migrations. I tested
with a set of benchmarks on an SMP and didn't see migrations causing
much performance impact, but this is certainly a big issue for NUMA.
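
The arithmetic behind the 2-processor, 3-task example can be sketched as follows. This is only the share calculation for equal-weight tasks, not the DWRR algorithm itself, and both function names are hypothetical.

```c
#include <assert.h>

/*
 * Ideal fraction of one CPU that each task receives when ntasks
 * equal-weight, always-runnable tasks share ncpus processors.
 * A single task can never use more than one CPU.
 */
double fair_share(int ncpus, int ntasks)
{
    assert(ncpus > 0 && ntasks > 0);
    if (ntasks <= ncpus)
        return 1.0;
    return (double)ncpus / (double)ntasks;
}

/*
 * Fraction of one CPU that each task receives when tasks never
 * migrate and tasks_on_cpu of them share the same processor.
 */
double pinned_share(int tasks_on_cpu)
{
    assert(tasks_on_cpu > 0);
    return 1.0 / (double)tasks_on_cpu;
}
```

For 2 processors and 3 tasks, `fair_share(2, 3)` is 2/3, while a static 2+1 placement gives `pinned_share(2)` = 0.5 to two of the tasks and `pinned_share(1)` = 1.0 to the third. Closing that gap is what requires the extra migrations.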

tong

PS. I'm now porting the code to the latest kernel and will post as soon
as I'm done.

From: Peter Williams on 7 May 2007 20:40

Esben Nielsen wrote:
>
>
> On Sun, 6 May 2007, Linus Torvalds wrote:
>
>>
>>
>> On Sun, 6 May 2007, Ingo Molnar wrote:
>>>
>>> * Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>>>
>>>> So the _only_ valid way to handle timers is to
>>>> - either not allow wrapping at all (in which case "unsigned" is
>>>>   better, since it is bigger)
>>>> - or use wrapping explicitly, and use unsigned arithmetic (which is
>>>>   well-defined in C) and do something like "(long)(a-b) > 0".
>>>
>>> hm, there is a corner-case in CFS where a fix like this is necessary.
>>>
>>> CFS uses 64-bit values for almost everything, and the majority of values
>>> are of 'relative' nature with no danger of overflow. (They are signed
>>> because they are relative values that center around zero and can be
>>> negative or positive.)
>>
>> Well, I'd like to just worry about that for a while.
>>
>> You say there is "no danger of overflow", and I mostly agree that once
>> we're talking about 64-bit values, the overflow issue simply doesn't
>> exist, and furthermore the difference between 63 and 64 bits is not
>> really relevant, so there's no major reason to actively avoid signed
>> entries.
>>
>> So in that sense, it all sounds perfectly sane. And I'm definitely not
>> sure your "292 years after bootup" worry is really worth even considering.
>>
>
> I would hate to tell mission control for Mankind's first mission to another
> star to reboot every 200 years because "there is no need to worry about
> it."
>
> As a matter of principle an OS should never need a reboot (with
> exception for upgrading). If you say you have to reboot every 200 years,
> why not every 100? Every 50? .... Every 45 days (you know what I am
> referring to :-) ?

There's always going to be an upper limit on the representation of time.
At least until we figure out how to implement infinity properly.
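
As a rough check on the "292 years" figure quoted above, here is a small sketch. It assumes the counter is a signed 64-bit count of nanoseconds starting at zero, which is the assumption that makes the number come out. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a counter ticking at ticks_per_second reaches max_count. */
double counter_range_seconds(double max_count, double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    return max_count / ticks_per_second;
}

/* Range of a signed 64-bit nanosecond counter, in years of 365.25 days. */
double s64_ns_range_years(void)
{
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;

    return counter_range_seconds((double)INT64_MAX, 1e9) / seconds_per_year;
}
```

`s64_ns_range_years()` comes out a little above 292, so the limit is real but far away. Any fixed-width representation has such a limit somewhere.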

>
>> When we're really so well off that we expect the hardware and software
>> stack to be stable over a hundred years, I'd start to think about issues
>> like that, in the meantime, to me worrying about those kinds of issues
>> just means that you're worrying about the wrong things.
>>
>> BUT.
>>
>> There's a fundamental reason relative timestamps are difficult and almost
>> always have overflow issues: the "long long in the future" case as an
>> approximation of "infinite timeout" is almost always relevant.
>>
>> So rather than worry about the system staying up 292 years, I'd worry
>> about whether people pass in big numbers (like some MAX_S64
>> approximation) as an approximation for "infinite", and once you have
>> things like that, the "64 bits never overflows" argument is totally bogus.
>>
>> There's a damn good reason for using only *absolute* time. The whole
>> "signed values of relative time" may _sound_ good, but it really sucks in
>> subtle and horrible ways!
>>
>
> I think you are wrong here. The only place you need absolute time is
> for the clock (CLOCK_REALTIME). You waste CPU using a 64 bit
> representation when you could have used a 32 bit. With a 32 bit
> implementation you are forced to handle the corner cases with wrap
> around and too big arguments up front. With a 64 bit you hide those
> problems.

As does the other method. A 32 bit signed offset with a 32 bit base is
just a crude version of 64 bit absolute time.

>
> I think CFS would be best off using a 32 bit timer counting in
> microseconds. That would wrap around in 72 minutes. But as the timers are
> relative you will never be able to specify a timer larger than 36
> minutes in the future. But 36 minutes is ridiculously long for a
> scheduler and a simple test limiting time values to that value would not
> break anything.

Except if you're measuring sleep times. I think that you'll find lots
of tasks sleep for more than 72 minutes.
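
For reference, a minimal sketch of the wrapping 32-bit scheme under discussion: unsigned arithmetic for the counter, a signed view of the difference for comparison, and a clamp on relative timeouts. All names are hypothetical and this is not CFS code. Converting an out-of-range unsigned value to `int32_t` is implementation-defined rather than undefined; gcc wraps it modulo 2^32, which the comparison relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Largest relative timeout that still compares correctly: about 36 minutes in us. */
#define MAX_REL_US ((uint32_t)INT32_MAX)

/* True if timestamp a is later than b, even if the counter wrapped in between. */
int ts_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Limit a requested timeout (including "infinite" approximations) to the safe range. */
uint32_t clamp_timeout_us(uint64_t requested_us)
{
    return requested_us > MAX_REL_US ? MAX_REL_US : (uint32_t)requested_us;
}

/* Absolute expiry for a relative timeout; the unsigned add wraps modulo 2^32. */
uint32_t expiry_us(uint32_t now_us, uint64_t timeout_us)
{
    uint32_t rel = clamp_timeout_us(timeout_us);

    assert(rel <= MAX_REL_US); /* keeps (int32_t)(expiry - now) non-negative */
    return now_us + rel;
}
```

The clamp is what handles the "huge value as infinity" case raised earlier in the thread. It does nothing for intervals that genuinely exceed the wrap period, such as a sleep time measured after the counter has gone all the way around.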

Peter
--
Peter Williams pwil3058(a)bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

From: Matt Mackall on 8 May 2007 01:40

On Mon, May 07, 2007 at 09:28:32AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 7 May 2007, Esben Nielsen wrote:
> >
> > What is (long)(a-b) ? I have tried to look it up in the C99 standard but I
> > can't find it. Maybe it is in the referred LIA-1 standard, which I can't find
> > with google.
>
> I don't worry about non-2's-complement machines (they don't exist, and
> likely won't exist in the future either).

They do exist. And they run Linux. SLES9 in fact.

http://en.wikipedia.org/wiki/UNIVAC_1100/2200_series#UNISYS_ClearPath_IX_series

http://www.unisys.com/eprise/main/admin/corporate/doc/ClearPath_Plus_Dorado_Model_390_Server_Specification_Sheet.pdf

That machine is a direct descendant of the Univac 1101 from 1950 and
is still software-compatible with 1107s from 1962.

(Granted, they only run Linux on the x86 side.)

--
Mathematics is the supreme nostalgia of our time.

From: Esben Nielsen on 8 May 2007 03:40

On Mon, 7 May 2007, Johannes Stezenbach wrote:

> On Mon, May 07, 2007, Linus Torvalds wrote:
>> On Mon, 7 May 2007, Esben Nielsen wrote:
>>>
>>> What is (long)(a-b) ? I have tried to look it up in the C99 standard but I
>>> can't find it. Maybe it is in the referred LIA-1 standard, which I can't find
>>> with google.
>
> C99 defines unsigned overflow semantics, but it doesn't say anything
> about signed overflow, thus it's undefined -- and you have a hard
> time finding it out.
>
> However, I have no clue *why* it's undefined and not
> implementation defined. Does someone know?
>
>> I don't worry about non-2's-complement machines (they don't exist, and
>> likely won't exist in the future either).
>
> I think DSPs can do saturating arithmetic (clamp to min/max
> values instead of wrap around). Not that it matters for Linux...
>
>> So I worry about compilers rewriting my code.
>
> gcc has -fwrapv and -ftrapv to change signed integer overflow
> behaviour.
>
> One baffling example where gcc rewrites code is when
> conditionals depend on signed integer overflow:
>
> $ cat xx.c
> #include <assert.h>
>
> int foo(int a)
> {
> assert(a + 100 > a);
> return a;
> }
>
> int bar(int a)
> {
> if (a + 100 > a)
> a += 100;
> return a;
> }
> $ gcc -Wall -Wextra -fomit-frame-pointer -c xx.c
> $ objdump -dr xx.o
>
> xx.o: file format elf32-i386
>
> Disassembly of section .text:
>
> 00000000 <foo>:
> 0: 8b 44 24 04 mov 0x4(%esp),%eax
> 4: c3 ret
>
> 00000005 <bar>:
> 5: 83 44 24 04 64 addl $0x64,0x4(%esp)
> a: 8b 44 24 04 mov 0x4(%esp),%eax
> e: c3 ret
>
>
> The assert and the condition were just dropped
> by gcc -- without any warning.
>
> gcc-4.2 will add -fstrict-overflow and -Wstrict-overflow.
> http://gcc.gnu.org/gcc-4.2/changes.html
>
>
> Johannes
>

This is contrary to C99 standard Annex H.2.2
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf):

"An implementation that defines signed integer types as also being modulo need
not detect integer overflow, in which case, only integer divide-by-zero need
be detected."

So if it doesn't properly define wrapping it has to detect integer
overflow, right?

gcc does neither with that optimization :-(
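
For comparison, a sketch of how the quoted foo()/bar() checks can be written so that they never rely on signed overflow, and so the compiler has no licence to drop them. The `_checked` names and `can_add_100()` are hypothetical, not from the quoted code.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if a + 100 fits in an int; the test itself cannot overflow. */
int can_add_100(int a)
{
    return a <= INT_MAX - 100;
}

/* Counterpart of foo(): the assertion is now a well-defined comparison. */
int foo_checked(int a)
{
    assert(can_add_100(a));
    return a;
}

/* Counterpart of bar(): only add when the result is representable. */
int bar_checked(int a)
{
    if (can_add_100(a))
        a += 100;
    return a;
}
```

The alternative mentioned in the quoted text is to build with `-fwrapv`, which makes gcc treat signed overflow as wrapping so that the original `a + 100 > a` test keeps its intended meaning.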

Esben
