From: Ingo Molnar on

* Rafael J. Wysocki <rjw(a)sisk.pl> wrote:

> On Saturday 27 February 2010, Ingo Molnar wrote:
> >
> > * Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> >
> > > > > Let's see. Over the last 60 days, I have reported 37 build errors. Of
> > > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > > > archs.
> > > >
> > > > So only 43% of them were even relevant on the platform that 95+% of the
> > > > Linux testers use? Seems to support the points i made.
> > >
> > > Well, I hope you don't mean that because the majority of bug reporters (vs
> > > testers, the number of whom is unknown to me at least) use x86, we are free
> > > to break the other architectures. ;-)
> >
> > It means exactly that: just like we 'can' break compilation with gcc296,
> > ancient versions of binutils, odd bootloaders, can break the boot via odd
> > hardware, etc. When someone uses that architectures then the 'easy'
> > bugfixes will actually flow in very quickly and without much fuss
>
> Then I don't understand what the problem with getting them in at the
> linux-next stage is. They are necessary anyway, so we'll need to add them
> sooner or later and IMO the sooner the better.

The problem is the dynamics and resulting (non-)cleanliness of code. We have
architectures that have been conceptually broken for 5 years or more, but
still those problems get blamed on the last change that 'causes' the breakage:
the core kernel and the developers who try to make a difference.

I think your perspective and your opinion are correct, while my perspective is
real and correct as well - there's no contradiction really. Let me try to
explain how i see it:

You are working in a relatively well-designed piece of code which interfaces
to the kernel in sane ways - kernel/power/* et al. You might break the
cross-builds sometimes, but it's not very common, and in those cases it's
usually your own fault and you are grateful for linux-next to have caught that
stupidity. (i hope this is a fair summary!)

I am not criticising that aspect of linux-next _at all_ - it's useful and
beneficial - and i'd like to thank Stephen for all his hard work. Other
aspects of linux-next are useful as well, such as the patch conflict mediation
role.

But as it happens so often, people tend to talk more about the things that are
not so rosy, not about the things that work well.

The area i am worried about is new core kernel facilities and their
development, and the extension of existing facilities. _Those_ facilities are
affected by 'many architectures' in a different way from how you experience
it: often we can do very correct changes to them, which still 'break' on some
architecture due to _that architecture's conceptual fault_.

Let me give you an example that happened just yesterday. My cross-testing
found that a change in the tracing infrastructure code broke m32r and parisc.

The breakage:

/home/mingo/tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
/home/mingo/tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'
make[3]: *** [kernel/trace/trace_clock.o] Error 1
make[3]: *** Waiting for unfinished jobs....

It was 'caused by':

18b4a4d: oprofile: remove tracing build dependency

In linux-next this would be pinned to commit 18b4a4d, which would have to be
reverted/fixed.

Where does the _real_ blame lie? Clearly in the M32R and HP/PARISC code: why
do they, four years after it was introduced as a core kernel facility in
2006, _still_ not support raw_local_irq_save()?

( A similar situation occurred in this very thread as well - before the subject
of the thread - so it's a real and present problem. We didn't even get _any_
reaction about that particular breakage from the affected architecture ... )

These situations are magnified by how certain linux-next bugs are reported:
the 'blame' is put on the new commit that exposes the laggy nature of certain
architectures. Often the developers even believe this false notion and feel
guilty for 'having broken' an architecture - often an architecture that has
not contributed a single core kernel facility _in its whole existence_.

The usual end result is that the path of least resistance is taken: the commit
is reverted or worked around, while the 'laggy' architecture can continue
business as usual and cause more similar bugs and hiccups in the future ...

I.e. there is extra overhead put on clearly 'good' efforts, while 'bad'
behavior (parasitic hanging-on, passivity, indifference) is rewarded.
Rewarding bad behavior is very clearly harmful to Linux in many regards, and i
speak up when i see it.

So i wish linux-next balanced these things more fairly towards those areas of
code that are actually useful: if it ignored build breakages that are due to
architectures being lazy - in fact if it required architectures to _help out_
with the development of the kernel.

The majority of build-bugs i see trigger in cross-builds (90% of which i catch
before they get into linux-next) are of this nature, that's why i raised it in
such a pointed way. Your (and many other people's) experience will differ - so
you might see this as an unjustified criticism.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Rafael J. Wysocki <rjw(a)sisk.pl> wrote:

> > - and without burdening developers to consider cases they have no good
> > ways to test. Why should rare architectures be more important than those
> > other rare forms of Linux usage?
>
> Because the Linus' tree is supposed to build on those architectures. [...]

That's not actually true: Linus on multiple occasions has said that only the
major architectures (x86, powerpc, ARM and a few others) are 'required' to
build and that the others should be left to fail to build and should be
_forced to get their act together_.

> [...] As long as that's the case, linux-next should build on them too.

No, and IMO linux-next is clearly over-interpreting this bit. Linux is not
supposed to build on all architectures. Maybe that's a core bit of a
misunderstanding (on either my or on sfr's side) and it should be clarified
...

Ingo
From: Ingo Molnar on

* Rafael J. Wysocki <rjw(a)sisk.pl> wrote:

> > In fact those rare ways of building and booting the kernel i mentioned are
> > probably used _more_ than half of the architectures that linux-next
> > build-tests ...
>
> I don't know and you don't know either. That's just pure speculation and
> therefore meaningless.

We know various arch (and hardware) usage stats, such as:

http://smolt.fedoraproject.org/static/stats/stats.html

Today's stats, done amongst users who are willing to opt in to the Smolt
daemon:

x86: 99.7%
powerpc: 0.3%

x86 used to be 99.5% a year ago, so the world has become even more x86-centric.

There's also the kerneloops.org client, which shows in excess of 95% x86 usage
as well. You can also grep the linux-kernel folder for arch signatures, etc.

And yes, there are millions of ARM (and MIPS) CPUs running Linux as well.
(They are only as present as their developers are: the users almost
never show up on linux-kernel.)

Plus, a kernel subsystem maintainer like me who does lots of kernel
infrastructure work can have a pretty good gut feeling about which
architectures are actively helping out Linux, and which are just hanging on to
the bandwagon.

So i respectfully disagree with your 'pure speculation' bit. Yes, it's
somewhat of a guessing game, as so many things in life - but the trend is very
clear.

Ingo
From: Stephen Rothwell on

Hi Ingo,

On Sun, 28 Feb 2010 08:14:05 +0100 Ingo Molnar <mingo(a)elte.hu> wrote:
>
> > [...] As long as that's the case, linux-next should build on them too.
>
> No, and IMO linux-next is clearly over-interpreting this bit. Linux is not
> supposed to build on all architectures. Maybe that's a core bit of a
> misunderstanding (on either my or on sfr's side) and it should be clarified
> ...

Well, we have no real problem then. The only architectures for which a
failure will stop new stuff getting into linux-next are the ones I
personally build while constructing the tree (x86, ppc and sparc). Once
something is in linux-next, even if it causes a build failure overnight,
I am loath to remove it again as it can cause pain for Andrew (who bases
-mm on linux-next).

I will still report such failures (if I have time to notice them - I
mostly hope that architecture maintainers will have a glance over the
build results themselves) and others do as well but such failures do not
generally cause any actions on my part (except in rare cases I may
actually fix the problem or put a provided fix patch in linux-next).

I would like to add arm to the mix of the architectures I build during
construction, but there is no wide ranging config that builds for arm and
building a few of the configs would just end up taking too much time.

Thanks for clarifying.
--
Cheers,
Stephen Rothwell sfr(a)canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Ingo Molnar on

* Stephen Rothwell <sfr(a)canb.auug.org.au> wrote:

> Hi Ingo,
>
> On Sun, 28 Feb 2010 08:14:05 +0100 Ingo Molnar <mingo(a)elte.hu> wrote:
> >
> > > [...] As long as that's the case, linux-next should build on them too.
> >
> > No, and IMO linux-next is clearly over-interpreting this bit. Linux is not
> > supposed to build on all architectures. Maybe that's a core bit of a
> > misunderstanding (on either my or on sfr's side) and it should be clarified
> > ...
>
> Well, we have no real problem then. The only architectures for which a
> failure will stop new stuff getting into linux-next are the ones I
> personally build while constructing the tree (x86, ppc and sparc). Once
> something is in linux-next, even if it causes a build failure overnight, I
> am loath to remove it again as it can cause pain for Andrew (who bases -mm
> on linux-next).

Ok - very good. This was apparently relaxed some time ago; i know
linux-next used to be more stringent.

> I will still report such failures (if I have time to notice them - I mostly
> hope that architecture maintainers will have a glance over the build results
> themselves) and others do as well but such failures do not generally cause
> any actions on my part (except in rare cases I may actually fix the problem
> or put a provided fix patch in linux-next).

Yeah. Plus it's never black and white - sometimes a rare arch will show some
real crappiness in a commit. So we want to know all bugs.

> I would like to add arm to the mix of the architectures I build during
> construction, but there is no wide ranging config that builds for arm and
> building a few of the configs would just end up taking too much time.

Yeah, ARM is clearly important from a usage share POV IMHO, and it's also
actively driving many areas of interest.

It's also a bit difficult to keep ARM going because there are so many
non-standardized hw variants of ARM, so i'm sure the ARM folks will appreciate
us not breaking them ...

( Alas, ARM doesn't tend to be a big problem, at least as far as the facilities
i'm concerned about go: it has implemented most of the core kernel
infrastructures so there are few if any 'self inflicted' breakages that i can
remember. )

> Thanks for clarifying.

Thanks,

Ingo