From: Donald Allen on
On Sun, May 16, 2010 at 7:36 PM, Thomas Gleixner <tglx(a)linutronix.de> wrote:
> Donald,
>
> On Sat, 15 May 2010, Donald Allen wrote:
>> Attached. This is from the 2.6.30 kernel on the Arch Linux install cd.
>>
>> Here's another bit of data. As I've said previously, the problems I'm
>> reporting were observed on a Toshiba NB310-305 netbook with a
>> single-core Atom 450 processor. I just built myself a mini-ITX system
>> using the Intel D510MO motherboard, which provides a dual-core D510
>> Atom processor. The other hardware on the board is similar to the
>> Toshiba. I installed the same Slackware snapshot I used on the
>> Toshiba, and did the home directory transfer without any problem at
>> all with the default tickless kernel. The hardware isn't identical,
>> and while I don't know the internals of the Linux kernel at all, my
>> gut, backed up by many years of OS development work in scheduling and
>> memory management, is telling me that the key difference is dual- vs.
>> single-core. Just a guess.
>
> I fear you are wrong.

Please don't be afraid.

>
> The key difference is almost certainly that the BIOS of your netbook
> tries to be overly clever vs. power management and is not aware of the
> fact that the Linux kernel uses timer hardware in a very different way
> than the other OS which comes preinstalled on that machine.
>
> The overly clever BIOS power management which works nicely with the
> vendor provided "drivers" for the other OS is just interfering with
> the kernel's way of dealing with the problem.
>
> Can you please boot with "hpet=disable" on the kernel command line ?

I did, and it made no difference.

To be specific, the test I am doing involves booting with the Arch
Linux 2009.08 install/live cd. I then run

fsck.ext2 -f -r /dev/sda3

to do a read-only check of my root filesystem. I watch the
disk-activity light, and it reliably goes out and then you've got a
long wait if you do nothing. Tickling the touchpad gets things moving
again. This happens reliably with or without the boot-time option you
requested above.
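
When testing boot-time options like this one, it can help to confirm that the option really reached the kernel. A minimal sketch, assuming a Linux system where /proc/cmdline is readable; the sample command line and the helper names below are made up for illustration and are not taken from the netbook:

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as "ro" or "irqpoll" map to True.
    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params


def read_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line the running kernel was booted with."""
    with open(path) as f:
        return parse_cmdline(f.read())


# Hypothetical command line, for illustration only.
sample = "root=/dev/sda3 ro hpet=disable"
params = parse_cmdline(sample)
print(params.get("hpet"))   # disable
print("irqpoll" in params)  # False
```

On the machine under test, calling read_cmdline() instead of parsing the sample string would show whether "hpet=disable" was in effect for that boot.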

I just noticed something else, however, that may lend credence to the
opinion expressed by Arjan van de Ven that this has nothing to do with
tickless. I originally noticed this problem on the Toshiba netbook
when I installed the Slackware 13.1 x86_64 beta on this machine, which
comes with a tickless 2.6.33.3 kernel. The first symptom I observed
was attempting to rsync my home directory from another machine to this
new install and, as previously described, I had to help things along
by activating the touchpad, or pressing the ctrl key (any kind of
external stimulus that would generate an interrupt seemed to work).
Anyway, after some discussion with Patrick Volkerding, I decided to
build a custom kernel for the netbook and disabled tickless in that
kernel. After getting that kernel working, I re-did the tests that
failed with the tickless kernel and they all worked fine, so I thought
we had our culprit. But just now, after doing the test you requested
above, I rebooted the system from its installed kernel (the tickful
kernel I built), and it hung during booting. At first I thought it was
taking a while to do the dance with the DHCP server, but after waiting
longer than I thought this should take, I touched the touchpad and
forward progress began again immediately (disk light came on, boot
time chatter proceeded, etc.). So, my current guess, for what it's
worth, is that there's a race here that causes the system to miss the
fact that it has a runnable process, and the probability of hitting it
is reduced, but not to zero, by using tickful scheduling.

I will do the experiment suggested by Arjan van de Ven and report the
results of that separately.

/Don


>
> Thanks,
>
>         tglx
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Arjan van de Ven on
On Mon, 17 May 2010 09:44:47 -0400
Donald Allen <donaldcallen(a)gmail.com> wrote:

> I will do the experiment suggested by Arjan van de Ven and report the
> results of that separately.


Since you're losing interrupts, another good option to try is "irqpoll".


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
From: Donald Allen on
On Mon, May 17, 2010 at 9:44 AM, Donald Allen <donaldcallen(a)gmail.com> wrote:
> I will do the experiment suggested by Arjan van de Ven and report the
> results of that separately.

I just booted with pci=nomsi and the fsck ran normally without any
help from my finger on the touchpad. So I think Arjan is closing in on
this ...
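
A quick way to see whether any device is still using MSI after a boot with pci=nomsi is to look for "PCI-MSI" entries in /proc/interrupts. A minimal sketch; the sample text below is invented for illustration (it is not output from the netbook), and the exact column layout of /proc/interrupts varies between kernel versions:

```python
def msi_lines(interrupts_text: str) -> list:
    """Return the lines of /proc/interrupts-style text that mention PCI-MSI."""
    return [
        line.strip()
        for line in interrupts_text.splitlines()
        if "PCI-MSI" in line
    ]


# Hypothetical /proc/interrupts excerpt, for illustration only.
SAMPLE = """\
           CPU0
  0:      12345   IO-APIC-edge      timer
 19:       6789   IO-APIC-fasteoi   ahci
 24:        321   PCI-MSI-edge      eth0
"""

for line in msi_lines(SAMPLE):
    print(line)
```

On a real system the text would come from open("/proc/interrupts").read(). An empty result after booting with pci=nomsi would suggest that no device is using MSI any more.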

/Don

From: Donald Allen on
On Mon, May 17, 2010 at 10:04 AM, Arjan van de Ven <arjan(a)infradead.org> wrote:
> On Mon, 17 May 2010 09:44:47 -0400
> Donald Allen <donaldcallen(a)gmail.com> wrote:
>
>> I will do the experiment suggested by Arjan van de Ven and report the
>> results of that separately.

Just for my own information, is this correct:

I assume that tickless scheduling, rather than relying on periodic
clock interrupts to wake up the scheduler, relies on interrupt
handlers to somehow signal the system that the scheduler needs to run
because they've just processed an event that has changed the state of
the system?

If so, then it looks like using the msi-style device-specific
interrupts isn't working reliably on this hardware? Or somehow the
kernel (or a driver) is failing to handle the interrupts properly with
msi enabled on certain hardware? I mention the latter only because of
the report yesterday from someone else seeing the same symptoms I am
on completely different hardware.

/Don

>
>
> since you're losing interrupts.. another good option to try is "irqpoll"
>
>
> --
> Arjan van de Ven        Intel Open Source Technology Centre
> For development, discussion and tips for power savings,
> visit http://www.lesswatts.org
>
From: Arjan van de Ven on
On Mon, 17 May 2010 10:11:51 -0400
Donald Allen <donaldcallen(a)gmail.com> wrote:

> On Mon, May 17, 2010 at 10:04 AM, Arjan van de Ven
> <arjan(a)infradead.org> wrote:
> > On Mon, 17 May 2010 09:44:47 -0400
> > Donald Allen <donaldcallen(a)gmail.com> wrote:
> >
> >> I will do the experiment suggested by Arjan van de Ven and report
> >> the results of that separately.
>
> Just for my own information, is this correct:
>
> I assume that tickless scheduling, rather than relying on periodic
> clock interrupts to wake up the scheduler, relies on interrupt
> handlers to somehow signal the system that the scheduler needs to run
> because they've just processed an event that has changed the state of
> the system?

Well, it relies on the hardware to signal the kernel that there's work
pending for a specific device.

Technically this is true both with and without tickless,
but without tickless there's so much activity in the system that it
never really goes quiet (and in fact, some different power management
decisions may be made because of that).
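
The effect can be illustrated with a toy model. This is only a sketch under loud assumptions: it assumes a device interrupt that gets stuck is noticed the next time the CPU wakes for any other reason, it uses a 4 ms tick (HZ=250) as an example value, and it ignores the power management differences mentioned above. It does not explain the one stall Don saw with his tickful kernel. The function name and numbers are hypothetical.

```python
import math


def stall_ms(irq_at_ms, wakeups_ms, tick_period_ms=None):
    """Time from a stuck device interrupt until the CPU next wakes up.

    irq_at_ms:      when the device raised its (undelivered) interrupt
    wakeups_ms:     times of other wake sources (timers, touchpad, keys)
    tick_period_ms: period of the periodic tick, or None for tickless idle
    """
    candidates = [t for t in wakeups_ms if t >= irq_at_ms]
    if tick_period_ms is not None:
        # With a periodic tick the CPU wakes at every multiple of the period.
        next_tick = math.ceil(irq_at_ms / tick_period_ms) * tick_period_ms
        candidates.append(next_tick)
    if not candidates:
        return math.inf  # nothing ever wakes the CPU in this model
    return min(candidates) - irq_at_ms


# Disk interrupt stuck at t=1001 ms; the user touches the touchpad at t=9000 ms.
print(stall_ms(1001, [9000], tick_period_ms=4))  # 3
print(stall_ms(1001, [9000]))                    # 7999
```

In this model the periodic tick hides the problem almost completely, while the tickless system sits idle until some external event arrives.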

>
> If so, then it looks like using the msi-style device-specific
> interrupts isn't working reliably on this hardware? Or somehow the

That looks like a correct assumption to me.

> kernel (or a driver) is failing to handle the interrupts properly with
> msi enabled on certain hardware? I mention the latter only because of
> the report yesterday from someone else seeing the same symptoms I am
> on completely different hardware.

BIOSes breaking MSI is not entirely uncommon. Windows XP does not use
MSI for various things Linux does use MSI for, and so machines that come
with XP by default may not have this feature very well tested,
unfortunately.

>
> /Don
>
> >
> >
> > since you're losing interrupts.. another good option to try is
> > "irqpoll"
> >
> >
> > --
> > Arjan van de Ven        Intel Open Source Technology Centre
> > For development, discussion and tips for power savings,
> > visit http://www.lesswatts.org
> >


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org