From: Jason Baron on
On Thu, Jun 10, 2010 at 12:13:39PM -0400, Mathieu Desnoyers wrote:
> * Jason Baron (jbaron(a)redhat.com) wrote:
> > On Thu, Jun 10, 2010 at 02:14:40PM +0200, Ingo Molnar wrote:
> > > * Peter Zijlstra <peterz(a)infradead.org> wrote:
> > >
> > > > On Wed, 2010-06-09 at 17:39 -0400, Jason Baron wrote:
> > > > > + select HAVE_ARCH_JUMP_LABEL if !CC_OPTIMIZE_FOR_SIZE
> > > >
> > > > That deserves a comment somewhere, it basically makes OPTIMIZE_FOR_SIZE
> > > > useless...
> > >
> > > Hm, we need more than a comment for that - distros enable CC_OPTIMIZE_FOR_SIZE
> > > all the time, for the massive kernel image (and hotpath cache footprint)
> > > savings. Is this fixable?
> > >
> > > Thanks,
> > >
> > > Ingo
> > >
> >
> > When I tested 'jump label' with CC_OPTIMIZE_FOR_SIZE, I saw a small
> > performance drop, because there is less block re-ordering happening.
>
> Is this a performance drop compared to a jump-label-less kernel or
> compared to -O2 kernel compiled with jump labels ? Or both ?
>
> Mathieu
>

Hi Mathieu,

So I'm citing the tbench benchmark here. The performance drop was jump
labels vs. all jump label patches backed out, both on -Os. If we move to
-O2, both the kernel without the jump label patches and the kernel with
them applied are faster than -Os with all jump label patches backed out.

so:

jump labels -O2 > no jump labels -O2 > no jump labels -Os > jump labels -Os

thanks,

-Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: H. Peter Anvin on
On 06/10/2010 05:14 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz(a)infradead.org> wrote:
>
>> On Wed, 2010-06-09 at 17:39 -0400, Jason Baron wrote:
>>> + select HAVE_ARCH_JUMP_LABEL if !CC_OPTIMIZE_FOR_SIZE
>>
>> That deserves a comment somewhere, it basically makes OPTIMIZE_FOR_SIZE
>> useless...
>
> Hm, we need more than a comment for that - distros enable CC_OPTIMIZE_FOR_SIZE
> all the time, for the massive kernel image (and hotpath cache footprint)
> savings. Is this fixable?
>

Actually the current reports from the gcc community are that gcc 4.5.0 +
-Os produces a broken kernel even without asm goto:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44129

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

From: Ingo Molnar on

* H. Peter Anvin <hpa(a)zytor.com> wrote:

> On 06/10/2010 05:14 AM, Ingo Molnar wrote:
> >
> > * Peter Zijlstra <peterz(a)infradead.org> wrote:
> >
> >> On Wed, 2010-06-09 at 17:39 -0400, Jason Baron wrote:
> >>> + select HAVE_ARCH_JUMP_LABEL if !CC_OPTIMIZE_FOR_SIZE
> >>
> >> That deserves a comment somewhere, it basically makes OPTIMIZE_FOR_SIZE
> >> useless...
> >
> > Hm, we need more than a comment for that - distros enable
> > CC_OPTIMIZE_FOR_SIZE all the time, for the massive kernel image (and
> > hotpath cache footprint) savings. Is this fixable?
>
> Actually the current reports from the gcc community are that gcc 4.5.0 + -Os
> produces a broken kernel even without asm goto:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44129

Well, most distros haven't switched to gcc 4.5 yet (even rawhide is still on
4.4 + backports), and I guess the lack of testing shows. New GCC versions have
a long history of causing bugs in the kernel.

Thanks,

Ingo
From: Ingo Molnar on

* Andi Kleen <andi(a)firstfloor.org> wrote:

> > > A much better way to get smaller kernel images is to do more __cold
> > > annotations for slow paths. Newer gcc will then simply only use -Os for
> > > these functions.
> >
> > That's an opt-in method and we cannot reach the kinds of 30% code size
> > reductions that -Os can achieve. Most code in the kernel is not cache-hot,
> > even on microbenchmarks.
>
> Maybe, maybe not. But yes it can be approached from both ways.

You don't seem to have understood my point: there's a big difference between an
opt-in and an opt-out model.

What you are arguing for is a 'bloaty code generator by default' model and
that model sucks.

Trying to achieve reductions by opt-in marking functions as a 'please reduce
it' __cold marker is a losing battle: most new kernel code is 'cold' and
should be reduced, yet most new code does not (and will not) come with __cold
markers.

The proper model is to assume that everything should be conservatively
size-reduced (because, almost by definition, 90% of new kernel code should
stay small and should stay out of the way), and where benchmarks+importance
proves it we can allow bloatier code generator via __hot.

Important codepaths can get __hot annotations just as much as they are
receiving 'inline' optimizations and other kinds of hand-tuning attention.

> Personally I would prefer to simply write less bloated code to get code
> reductions. Simpler code is often faster too.

You are posing this as an if-else choice, while in reality both should be
done: the best result is to write simpler/faster code _and_ to have a
compact-by-default code generator too ...

> > A much better model would be to actively mark hot codepaths with a __hot
> > attribute instead. Then the code size difference can be considered on a
> > case by case basis.
>
> Yes that works too for those who still use -Os.
>
> e.g. marking the scheduler and a few mm hot paths this way would certainly
> make sense.

Possibly, but not without substantiating the rather vague statements you have
made so far.

If you are sending such per function annotation patches then you need to come
up with actual hard numbers as well. One convenient way to measure such things
is a before/after "perf stat --repeat" run - as the noise estimations can be
compared and we can see whether there's a provable effect. (And, of course,
disassembly of GCC suckage is helpful as well.)

Thanks,

Ingo
From: Andi Kleen on
> What you are arguing for is a 'bloaty code generator by default' model and
> that model sucks.

I am arguing for a "non-sucky code by default" model.

It is widely known that "sucky code by default" sucks already;
that is why the big distros made their choice.

Anyway, luckily these are all config options, so we don't need
to agree on this (and the best choice likely varies by workload,
too).

> Possibly, but not without substantiating the rather vague statements you have
> made so far.

Yes, more data with recent builds is needed for concrete changes.

BTW, AFAIK the "icache over everything" model was never really
substantiated by all that much data either; somehow it just
became dogma.

I must say I was a bit burned by doing annotations -- I added
unlikely() originally, and as far as I can see most unlikely()s
are quite useless today because they do nothing the compiler
doesn't already do, so I would prefer not to repeat that.

So my personal preference is actually less annotations over more.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.