From: Al Viro on
On Sun, Feb 28, 2010 at 08:51:05AM +0100, Ingo Molnar wrote:

> ( Alas, ARM doesn't tend to be a big problem, at least as far as the facilities
> i'm concerned about go: it has implemented most of the core kernel
> infrastructures so there's few if any 'self inflicted' breakages that i can
> remember. )

FWIW, it might make sense to run cross-builds for many targets and post
the things that crop up + analysis to linux-arch... Any takers?

I haven't run a lot of cross-builds lately, but IME most of the breakage
tends to be less dramatic - somebody relying on indirect includes in
driver *or* forgetting to add "depends on" to Kconfig used to be the
most frequent case.

"let other targets rot" attitude has a very nasty effect - it snowballs.
At some point people *can't* check that their patches don't break things,
even if they want to. And that, IMO, sucks. At that point architecture
needs to be either removed or brought to the state when it builds in
mainline.

Note that we have filesystems that are built only on some architectures.
I don't know about you, but I *do* care about not leaving half-converted
interfaces in that area. For entirely rational reasons - people tend
to copy b0rken code from random places in the tree. Playing whack-a-mole
gets old pretty soon.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ingo Molnar on

* Al Viro <viro(a)ZenIV.linux.org.uk> wrote:

> On Sun, Feb 28, 2010 at 08:51:05AM +0100, Ingo Molnar wrote:
>
> > ( Alas, ARM doesn't tend to be a big problem, at least as far as the facilities
> > i'm concerned about go: it has implemented most of the core kernel
> > infrastructures so there's few if any 'self inflicted' breakages that i can
> > remember. )
>
> FWIW, it might make sense to run cross-builds for many targets and post the
> things that crop up + analysis to linux-arch... Any takers?
>
> I haven't run a lot of cross-builds lately, but IME most of the breakage
> tends to be less dramatic - somebody relying on indirect includes in driver
> *or* forgetting to add "depends on" to Kconfig used to be the most frequent
> case.
>
> "let other targets rot" attitude has a very nasty effect - it snowballs. At
> some point people *can't* check that their patches don't break things, even
> if they want to. And that, IMO, sucks. At that point architecture needs to
> be either removed or brought to the state when it builds in mainline.

What is happening right now is that our combined _costs_ snowball: generic
changes are burdened with the overhead of a thousand cuts ...

IMO either there's enough interest in keeping an architecture going, rooted in
_that_ architecture's importance (or the enthusiasm/clue of their developers),
or, after a few years of inactivity, it really shouldn't be upstream.

Right now we are socializing all the costs, sometimes even pretending that all
architectures are equal. None of the costs really looks particularly large in
isolation, but the sum of them does exist and adds up in certain places of the
kernel.

Thanks,

Ingo
From: Stephen Rothwell on
Hi Al,

On Sun, 28 Feb 2010 08:19:22 +0000 Al Viro <viro(a)ZenIV.linux.org.uk> wrote:
>
> FWIW, it might make sense to run cross-builds for many targets and post
> the things that crop up + analysis to linux-arch... Any takers?

See http://kisskb.ellerman.id.au/kisskb/branch/9/ ... we just need
someone to read it regularly and post about them. There is a set of
builds of Linus' tree there as well (look under "Branches").

--
Cheers,
Stephen Rothwell sfr(a)canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
From: Rafael J. Wysocki on
On Sunday 28 February 2010, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>
> > On Saturday 27 February 2010, Ingo Molnar wrote:
> > >
> > > * Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
> > >
> > > > > > Let's see. Over the last 60 days, I have reported 37 build errors. Of
> > > > > > these, 16 were reported against x86, 14 against ppc, 7 against other
> > > > > > archs.
> > > > >
> > > > > So only 43% of them were even relevant on the platform that 95+% of the
> > > > > Linux testers use? Seems to support the points i made.
> > > >
> > > > Well, I hope you don't mean that because the majority of bug reporters (vs
> > > > testers, the number of whom is unknown to me at least) use x86, we are free
> > > > to break the other architectures. ;-)
> > >
> > > It means exactly that: just like we 'can' break compilation with gcc296,
> > > ancient versions of binutils, odd bootloaders, can break the boot via odd
> > > hardware, etc. When someone uses those architectures then the 'easy'
> > > bugfixes will actually flow in very quickly and without much fuss
> >
> > Then I don't understand what the problem with getting them in at the
> > linux-next stage is. They are necessary anyway, so we'll need to add them
> > sooner or later and IMO the sooner the better.
>
> The problem is the dynamics and resulting (non-)cleanliness of code. We have
> architectures that have been conceptually broken for 5 years or more, but
> still those problems get blamed on the last change that 'causes' the breakage:
> the core kernel and the developers who try to make a difference.
>
> I think your perspective and your opinion is correct, while my perspective is
> real and correct as well - there's no contradiction really. Let me try to
> explain how i see it:
>
> You are working in a relatively well-designed piece of code which interfaces
> to the kernel in sane ways - kernel/power/* et al. You might break the
> cross-builds sometimes, but it's not very common, and in those cases it's
> usually your own fault and you are grateful for linux-next to have caught that
> stupidity. (i hope this is a fair summary!)

Fair enough.

> I am not criticising that aspect of linux-next _at all_ - it's useful and
> beneficial - and i'd like to thank Stephen for all his hard work. Other
> aspects of linux-next are useful as well, such as the patch conflict mediation
> role.

Great.

> But as it happens so often, people tend to talk more about the things that are
> not so rosy, not about the things that work well.
>
> The area i am worried about is new core kernel facilities and their
> development, and the extension of existing facilities. _Those_ facilities are
> affected by 'many architectures' in a different way from how you experience
> it: often we can do very correct changes to them, which still 'break' on some
> architecture due to _that architecture's conceptual fault_.
>
> Let me give you an example that happened just yesterday. My cross-testing
> found that a change in the tracing infrastructure code broke m32r and parisc.
>
> The breakage:
>
> /home/mingo/tip/kernel/trace/trace_clock.c:86: error: implicit declaration of function 'raw_local_irq_save'
> /home/mingo/tip/kernel/trace/trace_clock.c:112: error: implicit declaration of function 'raw_local_irq_restore'
> make[3]: *** [kernel/trace/trace_clock.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
>
> It was 'caused by':
>
> 18b4a4d: oprofile: remove tracing build dependency
>
> In linux-next this would be pinned to commit 18b4a4d, which would have to be
> reverted/fixed.
>
> Where does the _real_ blame lie? Clearly in the M32R and HP/PARISC code: why
> do they, four years after it was introduced as a core kernel facility in
> 2006, _still_ not support raw_local_irq_save()?

OK, I see your point.

> ( A similar situation occurred in this very thread as well - before the subject
> of the thread - so it's a real and present problem. We didnt even get _any_
> reaction about that particular breakage from the affected architecture ... )
>
> These situations are magnified by how certain linux-next bugs are reported:
> the 'blame' is put on the new commit that exposes the laggy nature of certain
> architectures. Often the developers even believe this false notion and feel
> guilty for 'having broken' an architecture - often an architecture that has
> not contributed a single core kernel facility _in its whole existence_.
>
> The usual end result is that the path of least resistance is taken: the commit
> is reverted or worked around, while the 'laggy' architecture can continue
> business as usual and cause more similar bugs and hiccups in the future ...
>
> I.e. there is extra overhead put on clearly 'good' efforts, while 'bad'
> behavior (parasitic hanging-on, passivity, indifference) is rewarded.
> Rewarding bad behavior is very clearly harmful to Linux in many regards, and i
> speak up when i see it.
>
> So i wish linux-next balanced these things more fairly towards those areas of
> code that are actually useful: if it ignored build breakages that are due to
> architectures being lazy - in fact if it required architectures to _help out_
> with the development of the kernel.
>
> The majority of build-bugs i see trigger in cross-builds (90% of which i catch
> before they get into linux-next) are of this nature, that's why i raised it in
> such a pointed way. Your (and many other people's) experience will differ - so
> you might see this as an unjustified criticism.

Thanks a lot for the clarification.

Best,
Rafael
From: Nick Bowler on
On 08:23 Sun 28 Feb, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <rjw(a)sisk.pl> wrote:
>
> > > In fact those rare ways of building and booting the kernel i mentioned are
> > > probably used _more_ than half of the architectures that linux-next
> > > build-tests ...
> >
> > I don't know and you don't know either. That's just pure speculation and
> > therefore meaningless.
>
> We know various arch (and hardware) usage stats, such as:
>
> http://smolt.fedoraproject.org/static/stats/stats.html
>
> Today's stats, done amongst users who are willing to opt in to the Smolt
> daemon:
>
> x86: 99.7%
> powerpc: 0.3%
>
> x86 used to be 99.5 a year ago, so the world has become even more x86-centric.

This only tells us that _smolt users_ have become even more x86-centric.
As a self-selected sample, it is very likely a poor representative of
"the world" and any such extrapolation is indeed "pure speculation".

--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)