From: Pekka Enberg on
On Thu, Jul 15, 2010 at 10:16 PM, Damien Wyart <damien.wyart(a)free.fr> wrote:
>> > > I also did not have anymore hangs and random bad moods of my CPUs
>> > > that all of a sudden grab 100% of all 8 cores of my CPU power across
>> > > my machine since I disabled
>> > > CONFIG_NO_BOOTMEM:
>
>> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 18:54]:
>> > Interesting. Damien, does disabling CONFIG_NO_BOOTMEM fix your problem too?
>
>> I will test in the coming hours, and report back tomorrow... Just
>> recompiled 2.6.35-rc5-git1 with this option disabled.
>
> For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
> with the option and rc5 the problem was happening quite quickly after
> boot and normal use of the machine. So it seems I can confirm what Zeno
> has seen and I hope this will give a hint to debug the problem. I guess
> this has not been reported that much because many testers might not have
> enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
> benchmark with a kernel having this option enabled?

To be honest, the bug is a bit odd. It's related to boot-time memory
allocator changes but yet it seems to manifest itself as a scheduling
problem. So if you have some spare time and want to speed up the
debugging process, please test v2.6.34 and v2.6.35-rc1 with
CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
if you can identify the offending commit with "git bisect."

Pekka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Damien Wyart on
> > For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
> > with the option and rc5 the problem was happening quite quickly after
> > boot and normal use of the machine. So it seems I can confirm what Zeno
> > has seen and I hope this will give a hint to debug the problem. I guess
> > this has not been reported that much because many testers might not have
> > enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
> > benchmark with a kernel having this option enabled?

* Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
> To be honest, the bug is a bit odd. It's related to boot-time memory
> allocator changes but yet it seems to manifest itself as a scheduling
> problem. So if you have some spare time and want to speed up the
> debugging process, please test v2.6.34 and v2.6.35-rc1 with
> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
> if you can identify the offending commit with "git bisect."

Not sure I will have enough time in the coming days (doing that remotely
is fishy since ssh access is almost stuck when the problem occurs); if
Zeno can and would like to do it, maybe this could be done faster.

As the scheduler is now very well instrumented (many debugging features
are available), reproducing the bug on a test platform (it happens quite
quickly for me) might also give some hints. So testers, if you have
time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
if you can reproduce the problem!

--
Damien
From: Zeno Davatz on
On 15.07.2010 at 22:00, Damien Wyart <damien.wyart(a)free.fr> wrote:

>>> For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
>>> with the option and rc5 the problem was happening quite quickly after
>>> boot and normal use of the machine. So it seems I can confirm what Zeno
>>> has seen and I hope this will give a hint to debug the problem. I guess
>>> this has not been reported that much because many testers might not have
>>> enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
>>> benchmark with a kernel having this option enabled?
>
> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
>> To be honest, the bug is a bit odd. It's related to boot-time memory
>> allocator changes but yet it seems to manifest itself as a scheduling
>> problem. So if you have some spare time and want to speed up the
>> debugging process, please test v2.6.34 and v2.6.35-rc1 with
>> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
>> if you can identify the offending commit with "git bisect."
>
> Not sure I will have enough time in the coming days (doing that remotely
> is fishy since ssh access is almost stuck when the problem occurs); if
> Zeno can and would like to do it, maybe this could be done faster.
>
> As the scheduler is now very well instrumented (many debugging features
> are available), reproducing the bug on a test platform (it happens quite
> quickly for me) might also give some hints. So testers, if you have
> time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
> if you can reproduce the problem!

Will try to do so. Can you point me to the git bisect howto with the versions you want?

Best
Zeno
From: Pekka Enberg on
On Thu, Jul 15, 2010 at 11:38 PM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
> On 15.07.2010 at 22:00, Damien Wyart <damien.wyart(a)free.fr> wrote:
>
>>>> For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
>>>> with the option and rc5 the problem was happening quite quickly after
>>>> boot and normal use of the machine. So it seems I can confirm what Zeno
>>>> has seen and I hope this will give a hint to debug the problem. I guess
>>>> this has not been reported that much because many testers might not have
>>>> enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
>>>> benchmark with a kernel having this option enabled?
>>
>> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
>>> To be honest, the bug is a bit odd. It's related to boot-time memory
>>> allocator changes but yet it seems to manifest itself as a scheduling
>>> problem. So if you have some spare time and want to speed up the
>>> debugging process, please test v2.6.34 and v2.6.35-rc1 with
>>> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
>>> if you can identify the offending commit with "git bisect."
>>
>> Not sure I will have enough time in the coming days (doing that remotely
>> is fishy since ssh access is almost stuck when the problem occurs); if
>> Zeno can and would like to do it, maybe this could be done faster.
>>
>> As the scheduler is now very well instrumented (many debugging features
>> are available), reproducing the bug on a test platform (it happens quite
>> quickly for me) might also give some hints. So testers, if you have
>> time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
>> if you can reproduce the problem!
>
> Will try to do so. Can you point me to the git bisect howto with the versions you want?

Cool. So like I said, you first want to test 2.6.34 to find a known
good version. Please remember to make sure you have CONFIG_NO_BOOTMEM
enabled. You can also try to speed up the process by testing
2.6.35-rc1 which is likely to include the offending commit. That's not
strictly necessary as long as you are sure that you have some
2.6.35-rc kernel that's bad.
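
A quick way to double-check the option before each build, as a minimal
sketch: the function name has_no_bootmem and the default path .config (at
the top of the kernel source tree) are assumptions for illustration, not
part of any kernel tooling.

```shell
# Succeed if CONFIG_NO_BOOTMEM=y appears in the given kernel config file.
# The function name and the default path are illustrative assumptions.
has_no_bootmem() {
    # -q: no output, -s: stay quiet if the file is missing
    grep -qs '^CONFIG_NO_BOOTMEM=y$' "${1:-.config}"
}

if has_no_bootmem .config; then
    echo "CONFIG_NO_BOOTMEM is enabled"
else
    echo "CONFIG_NO_BOOTMEM is not enabled"
fi
```

It is worth re-running this at every bisect step, since regenerating the
configuration for a different revision may change the option.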

After that, bisecting is as simple as:

git bisect start
git bisect good v2.6.34
git bisect bad v2.6.35-rc1 # or some other kernel you know to be bad
<compile, boot, and try to trigger the problem>

then

git bisect bad # if you were able to trigger the problem

or

git bisect good # if the problem doesn't exist

git will then find the next revision to test after which you do

<compile, boot, and try to trigger the problem>

and repeat the "git bisect good/bad" step until git tells you it has
found the offending commit.
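
To get a feel for how many compile-and-boot rounds this takes, here is a
rough sketch. Each round halves the range of candidate commits, so the
count grows with the base-2 logarithm of the number of commits between
the good and bad versions. The function name bisect_steps is made up for
illustration, and git's own estimate can differ slightly on a history
with merges.

```shell
# Estimate the number of bisect rounds: ceil(log2(n)) for n candidate commits.
# bisect_steps is a hypothetical helper, not a git command.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # halve the range, rounding up
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 1000   # prints 10
# In a kernel tree, the real commit count could be fed in with:
#   bisect_steps "$(git rev-list --count v2.6.34..v2.6.35-rc1)"
```

When you are finished, "git bisect reset" returns the tree to where you
started, and "git bisect skip" is there for a revision that does not
build.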

There's more information on the git bisect man pages:

http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html

Let me know if you need more help with this.

Pekka
From: Zeno Davatz on
On 15.07.2010 at 22:50, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:

> On Thu, Jul 15, 2010 at 11:38 PM, Zeno Davatz <zdavatz(a)gmail.com> wrote:
>> On 15.07.2010 at 22:00, Damien Wyart <damien.wyart(a)free.fr> wrote:
>>
>>>>> For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
>>>>> with the option and rc5 the problem was happening quite quickly after
>>>>> boot and normal use of the machine. So it seems I can confirm what Zeno
>>>>> has seen and I hope this will give a hint to debug the problem. I guess
>>>>> this has not been reported that much because many testers might not have
>>>>> enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
>>>>> benchmark with a kernel having this option enabled?
>>>
>>> * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
>>>> To be honest, the bug is a bit odd. It's related to boot-time memory
>>>> allocator changes but yet it seems to manifest itself as a scheduling
>>>> problem. So if you have some spare time and want to speed up the
>>>> debugging process, please test v2.6.34 and v2.6.35-rc1 with
>>>> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
>>>> if you can identify the offending commit with "git bisect."
>>>
>>> Not sure I will have enough time in the coming days (doing that remotely
>>> is fishy since ssh access is almost stuck when the problem occurs); if
>>> Zeno can and would like to do it, maybe this could be done faster.
>>>
>>> As the scheduler is now very well instrumented (many debugging features
>>> are available), reproducing the bug on a test platform (it happens quite
>>> quickly for me) might also give some hints. So testers, if you have
>>> time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
>>> if you can reproduce the problem!
>>
>> Will try to do so. Can you point me to the git bisect howto with the versions you want?
>
> Cool. So like I said, you first want to test 2.6.34 to find a known
> good version. Please remember to make sure you have CONFIG_NO_BOOTMEM
> enabled. You can also try to speed up the process by testing
> 2.6.35-rc1 which is likely to include the offending commit. That's not
> strictly necessary as long as you are sure that you have some
> 2.6.35-rc kernel that's bad.
>
> After that, bisecting is as simple as:
>
> git bisect start
> git bisect good v2.6.34
> git bisect bad v2.6.35-rc1 # or some other kernel you know to be bad
> <compile, boot, and try to trigger the problem>
>
> then
>
> git bisect bad # if you were able to trigger the problem
>
> or
>
> git bisect good # if the problem doesn't exist
>
> git will then find the next revision to test after which you do
>
> <compile, boot, and try to trigger the problem>
>
> and repeat the "git bisect good/bad" step until git tells you it has
> found the offending commit.
>
> There's more information on the git bisect man pages:
>
> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>
> Let me know if you need more help with this.

Ok, thanks for the guidance, will start some time tomorrow. Hope to make it in the morning.

Best
Zeno