stable? quality assurance? [Kernel]


From: Stefan Richter on 12 Jul 2010 19:10

Martin Steigerwald wrote:
> I think I wait for 2.6.34.2 or .3 and then try again. If it then happens
> again, hopefully in a moment where I have nerve to deal with such bugs, I
> fire up my second notebook and try to SSH into the machine. If that works I
> at least could look into dmesg and X.org logs.

netconsole might be required.
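
For the receiving end, a minimal sketch (not a tested setup for this
particular bug): netconsole sends kernel log lines as UDP datagrams, so
the second notebook only needs something listening on the target port.
The function names below are hypothetical, and port 6666 is assumed to
match whatever target port the netconsole side was configured with.

```python
import socket


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket on which netconsole datagrams can arrive.

    The port is an assumption: it must match the target port given to
    netconsole on the machine that freezes.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock, max_messages=None, timeout=None):
    """Print and return received log lines.

    Stops after max_messages datagrams, or when no datagram arrives
    within `timeout` seconds. With both left as None it runs until
    interrupted.
    """
    lines = []
    sock.settimeout(timeout)
    try:
        while max_messages is None or len(lines) < max_messages:
            data, _addr = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            print(text, flush=True)
            lines.append(text)
    except socket.timeout:
        pass
    return lines
```

Calling collect(open_listener()) on the second machine and leaving it
running should keep the last messages before a freeze on screen, provided
the network driver on the frozen machine still got them out.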

....
> Is the Linux kernel development really in balance with feature work and
> stabilization work? Currently at least from my personal perception it is
> not. Development goes that fast - can you all cope with that speed? Maybe
> its just time to *slow it down* a bit?

If those who added the regressions are identified and asked to debug and
fix them, the balance should be corrected and perhaps more precautions
would be taken in the future. Alas, finding the point in history at which
the kernel regressed might take a lot more time than actually fixing it.
In that case, maybe give the author of the bug an estimate of the
volunteered hours that were spent on reporting this bug, to put its
repercussions into perspective. OTOH I suspect a lack of
responsibility among the developers is not so much the issue here; rather,
the number of people who take the time for -rc tests (not to
mention linux-next tests) _and_ to file reports is rather low. Plus, a
good bug report often requires experience or good intuition, besides
patience and rigor.
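
To illustrate why finding the regression point is slow even when done
systematically, here is a small sketch of the binary search that tools
such as git bisect automate. The function name and the is_bad callback
are hypothetical. The sketch assumes the first version is known good, the
last is known bad, and a version stays bad once the regression is in.

```python
def first_bad(versions, is_bad):
    """Return (index of the first bad version, number of tests run).

    Assumes versions[0] is good, versions[-1] is bad, and that every
    version after the first bad one is bad as well.
    """
    good, bad = 0, len(versions) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # in practice: build, boot, and wait for the bug
        if is_bad(versions[mid]):
            bad = mid
        else:
            good = mid
    return bad, tests
```

Roughly log2(n) tests are needed for n candidate commits, but each test
is a full build and boot, and with a random freeze a "good" verdict may
need hours or days of use before it can be trusted.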

There were discussions in the past on how more enthusiasts who are
willing and able to test prereleases could be attracted. But maybe
(just maybe) there are more ways in which the developers themselves
could perform more extensive/ more systematic tests.
--
Stefan Richter
-=====-==-=- -=== -==-=
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Martin Steigerwald on 13 Jul 2010 06:40

On Tuesday 13 July 2010, Stefan Richter wrote:
> ...
>
> > Is the Linux kernel development really in balance with feature work
> > and stabilization work? Currently at least from my personal
> > perception it is not. Development goes that fast - can you all cope
> > with that speed? Maybe its just time to slow it down a bit?
>
> If those who added the regressions are identified and asked to debug and
> fix them, the balance should be corrected and perhaps more precautions
> would be taken in the future. Alas, finding the point in history at which
> the kernel regressed might take a lot more time than actually fixing it.
> In that case, maybe give the author of the bug an estimate of
> the volunteered hours that were spent on reporting this bug, to put
> its repercussions into perspective. OTOH I suspect a lack of
> responsibility among the developers is not so much the issue here; rather,
> the number of people who take the time for -rc tests (not to
> mention linux-next tests) and to file reports is rather low. Plus, a
> good bug report often requires experience or good intuition, besides
> patience and rigor.
>
> There were discussions in the past on how more enthusiasts who are
> willing and able to test prereleases could be attracted. But maybe
> (just maybe) there are more ways in which the developers themselves
> could perform more extensive/ more systematic tests.

Well, I have reported it now, although the report contains hardly any
information on how to reproduce it, or any other debug information.
I did not report it before because I didn't find the information I can
provide very helpful, and until yesterday I thought it might just have been
these two freezes and that's it. But maybe reporting it early is better than
not reporting it at all.

Bug 16376 - random - possibly Radeon DRM KMS related - freezes
https://bugzilla.kernel.org/show_bug.cgi?id=16376

I will look in the logs whether I might have luck and find anything this
afternoon when my students learn vi/vim, but I doubt it.

Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7

From: Alejandro Riveira Fernández on 13 Jul 2010 07:20

On Sun, 11 Jul 2010 16:51:42 +0200
Martin Steigerwald <Martin(a)lichtvoll.de> wrote:

>
> One reason for a demand for me is best expressed by this question: Does
> the kernel developer community want to encourage that a group of advanced
> Linux users - but mostly non-developers - compile their own vanilla or
> near-vanilla kernels, provide wider testing and report a bug now and
> then?
>
> I can live with either answer. If not, I just will be much more reluctant
> to try out new kernels.

I for one stopped booting into -rc kernels.
The fact that I have still had to patch my kernels with a *one*-liner
since the 2.6.29 kernel [1] does not give me confidence in the "test,
report/bisect and it will be fixed" promise some have made in this
thread.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=13362

From: Theodore Tso on 13 Jul 2010 13:00

On Jul 12, 2010, at 11:56 AM, David Newall wrote:

> Thus 2.6.34 is the latest gamma-test kernel. It's not stable and I doubt anybody honestly thinks otherwise.

Stable is relative. Some people are willing to consider
Fedora "stable". Other people will only use a RHEL
kernel, and there are those who are using RHEL 4
or even RHEL 3 because they are extremely risk-averse.

So arguments about whether or not a specific kernel
version deserves to be called "stable" are going to be
a waste of time and electrons because it's all about
expectations.

But the one huge thing that people are forgetting is that
the fundamental premise behind open source is "scratch
your own itch". That means that people who own a
specific piece of hardware have to collectively be responsible
for making sure that it works. It's not possible for me to
assure that some eSATA PCMCIA card on a T23 laptop
still works, because I don't own the hardware. So the only
way we know whether or not there is a regression is
if there is *someone* who owns that hardware and is
willing to try it out, hopefully during -rc3 or -rc4, and let
us know if there is a problem, and hopefully help us
debug the problem.

If you have people saying, "-rc3 isn't stable, I'll wait until
-rc5 to test things", then it will be that much later before
we discover a potential problem with the T23 laptop, and
before we can fix it. If people say, "2.6.34.0 isn't stable,
I refuse to run a kernel until 2.6.34.4", then if they are the
only person with the T23 eSATA device, we won't hear
about the problem until 2.6.34.4, and it might not get fixed
until 2.6.34.5 or 2.6.34.6!

What this means is that, yes, "stable" basically means "stable
for the core kernel developers". You can say that this isn't
correct, and maybe even dishonest, but if we wait until 2.6.34.N
before we call a release "stable", and this discourages users
from testing 2.6.34.M for M<N, it just delays when bugs will
be found and fixed.

This is why to me, arguing that 2.6.34.0 is not "stable" really
isn't useful. If you really want to frequently update your kernel
and use the latest and greatest, part of the price that you have
to pay is to help us with the testing, bug reporting, and root
cause determination.

If you don't like this, your other choice is to pay $$$ to the
folks who provide support for Solaris and OS X, and accept
the restrictions in hardware implied by Solaris and OS X.
(Hint: neither supports a Thinkpad T23.) But to compare
Linux, especially the non-distribution source code release
from kernel.org, with operating systems that have very different
business models is to really and fundamentally misunderstand
how things work in the Linux world.

If you want that kind of stability, then you will need to use an
older kernel. Or use a distribution kernel which has a support
and testing and business model compatible with your desires.
Fedora for example uses kernels which are six months out of
date, because during those six months, the people who use the
testing versions of Fedora are doing testing and helping with
the bug fixing. Red Hat uses this free testing pool to improve
the testing and stability of Red Hat Enterprise Linux, so if you
are willing to live with a 2-3 year release cycle, RHEL will be
more stable than Fedora. And if you need to make sure that
bugs are fixed very quickly, and you can call and demand
a developer's attention, you can pay $$$ for a support contract.

I will say once again. There is no such thing as a free lunch.
Linux is a better deal than most, and you have multiple
choices about how frequently you update, whether you let
someone else decide whether or not a particular kernel
release plus patches is "stable", or more accurately,
"stable enough", and you can choose how much you are willing
to pay, either in personal time and effort, or $$$ to some support
organization.

But demanding that kernel.org become "more stable" when it
is supported purely by volunteers is simply not reasonable.

-- Ted


From: David Newall on 13 Jul 2010 16:50

Theodore Tso wrote:
> What this means is that, yes, "stable" basically means "stable
> for the core kernel developers". You can say that this isn't
> correct, and maybe even dishonest, but if we wait until 2.6.34.N
> before we call a release "stable", and this discourages users
> from testing 2.6.34.M for M<N, it just delays when bugs will
> be found and fixed.
>

Calling it stable instils and reinforces a Pavlovian response in typical
users, that recent Linux kernels are dangerous and unreliable; one year
old was suggested as a safe benchmark. Typical users being 99% of the
population, testing hardly begins until a kernel is "sufficiently old."
This Pavlovian response is what really delays finding and fixing bugs.
Being up-front and saying which kernels are likely to fail would help
many users calculate the risk and improve their willingness to try newer
kernels. "Sufficiently old" might well come down to six months, maybe four.

That is to say, instead of taking a year to pass gamma-testing, new
kernels could be passed in six months or less. That would be a big
improvement in stability and quality assurance however you dice it.

> But demanding that kernel.org become "more stable" when it
> is supported purely by volunteers is simply not reasonable.

Let's not be hysterical; nobody made any demands. Semantics aside, the
suggestion is reasonable because it affects developers' workloads not
one whit. The only change is the label that Linus applies to new releases.
