From: Peter Zijlstra on
On Thu, 2010-07-15 at 23:52 +0300, Pekka Enberg wrote:
> On Thu, Jul 15, 2010 at 11:00 PM, Damien Wyart <damien.wyart(a)free.fr> wrote:
> >> > For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
> >> > with the option and rc5 the problem was happening quite quickly after
> >> > boot and normal use of the machine. So it seems I can confirm what Zeno
> >> > has seen and I hope this will give a hint to debug the problem. I guess
> >> > this has not been reported that much because many testers might not have
> >> > enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
> >> > benchmark with a kernel having this option enabled?
> >
> > * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
> >> To be honest, the bug is a bit odd. It's related to boot-time memory
> >> allocator changes but yet it seems to manifest itself as a scheduling
> >> problem. So if you have some spare time and want to speed up the
> >> debugging process, please test v2.6.34 and v2.6.35-rc1 with
> >> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
> >> if you can identify the offending commit with "git bisect."
> >
> > Not sure I will have enough time in the coming days (doing that remotely
> > is fishy since ssh access is almost stuck when the problem occurs); if
> > Zeno can and would like to do it, maybe this could be done faster.
> >
> > As the scheduler is now very well instrumented (many debugging features
> > are available), reproducing the bug on a test platform (it happens quite
> > quickly for me) might also give some hints. So testers, if you have
> > time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
> > if you can reproduce the problem!
>
> Yeah, there's "perf sched" tool available for that:
>
> http://lwn.net/Articles/353295/
>
> The only problem is that we'd need a scheduler hacker to decipher the
> report and all of them seem to be missing at the moment (probably at
> OLS). Anyway, like I said, git bisect will probably speed up the
> debugging process, that's all.

Vacation.. but now I'm back ;-)

Even something as simple as: perf top -r 1 (make sure you're root in order
to run with real-time prios) could give a clue as to what is consuming
all your cpu-time.

Or did the issue get sorted already?
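
A minimal sketch of the perf top suggestion above, in case it helps anyone reproducing the problem. The helper name perf_top_rt and the PERF override variable are assumptions made for illustration; only `perf top -r <priority>` itself comes from the message. The root check is there because running with a real-time priority needs root, as noted above.

```shell
#!/bin/sh
# perf_top_rt [PRIORITY]: hypothetical wrapper around `perf top -r`.
# -r asks perf to collect data with the given real-time priority, so the
# profiler still gets CPU time when the machine is nearly unresponsive.
perf_top_rt() {
    prio="${1:-1}"   # default matches the `-r 1` suggested in the thread

    # Real-time priorities require root; refuse early with a clear message.
    if [ "$(id -u)" -ne 0 ]; then
        echo "perf_top_rt: must be run as root" >&2
        return 1
    fi

    # PERF is an assumed override for the perf binary (defaults to `perf`).
    "${PERF:-perf}" top -r "$prio"
}
```

Calling `perf_top_rt` as root while the slowdown is happening should show which symbols are consuming the CPU time.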
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Zeno Davatz on
On Tue, Aug 3, 2010 at 11:15 AM, <damien.wyart(a)free.fr> wrote:
>> > Vacation.. but now I'm back ;-)
>> >
>> > Even something as simple as: perf top -r 1 (make sure you're root in order
>> > to run with real-time prios) could give a clue as to what is consuming
>> > all your cpu-time.
>> >
>> > Or did the issue get sorted already?
>>
>> Thank you for the hint.
>>
>> I am on 2.6.35 now and all seems to be fine again.
>
> Are you 100% sure you compiled it with CONFIG_NO_BOOTMEM enabled?
>
> I did not test 2.6.35 yet but I did not see anything related to this bug
> committed since the discussion so I am very surprised the problem disappeared by
> itself...
>
> Will be on vacation very soon, so not sure I will have time to test 2.6.35
> before leaving.

Yes: I got:

# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_CLOCK=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set

in my .config.

Linux zenogentoo 2.6.35 #122 SMP Mon Aug 2 10:26:05 CEST 2010 i686
Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz GenuineIntel GNU/Linux

Best
Zeno
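
For anyone else checking their build, the .config lookup above can be sketched as a small shell helper. The function name config_enabled is hypothetical, and the path to .config is an assumption about where the kernel tree was built.

```shell
#!/bin/sh
# config_enabled OPTION FILE: succeed only if FILE contains the exact
# line "OPTION=y". Lines such as "# OPTION is not set" do not match.
config_enabled() {
    # -q: quiet, -x: match the whole line, -F: treat the pattern literally
    grep -qxF "$1=y" "$2"
}

# Example use (assumes .config is in the current directory):
#   config_enabled CONFIG_NO_BOOTMEM .config && echo "NO_BOOTMEM is enabled"
```

This only inspects the configuration file, so it confirms what was configured, not necessarily what the running kernel was built from.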
From: damien.wyart on
> > Vacation.. but now I'm back ;-)
> >
> > Even something as simple as: perf top -r 1 (make sure you're root in order
> > to run with real-time prios) could give a clue as to what is consuming
> > all your cpu-time.
> >
> > Or did the issue get sorted already?
>
> Thank you for the hint.
>
> I am on 2.6.35 now and all seems to be fine again.

Are you 100% sure you compiled it with CONFIG_NO_BOOTMEM enabled?

I did not test 2.6.35 yet but I did not see anything related to this bug
committed since the discussion so I am very surprised the problem disappeared by
itself...

Will be on vacation very soon, so not sure I will have time to test 2.6.35
before leaving.

Damien
From: Zeno Davatz on
On Tue, Aug 3, 2010 at 11:05 AM, Peter Zijlstra <peterz(a)infradead.org> wrote:
> On Thu, 2010-07-15 at 23:52 +0300, Pekka Enberg wrote:
>> On Thu, Jul 15, 2010 at 11:00 PM, Damien Wyart <damien.wyart(a)free.fr> wrote:
>> >> > For now, I can't reproduce the problem with CONFIG_NO_BOOTMEM disabled;
>> >> > with the option and rc5 the problem was happening quite quickly after
>> >> > boot and normal use of the machine. So it seems I can confirm what Zeno
>> >> > has seen and I hope this will give a hint to debug the problem. I guess
>> >> > this has not been reported that much because many testers might not have
>> >> > enabled CONFIG_NO_BOOTMEM... Maybe the scheduler folks could test their
>> >> > benchmark with a kernel having this option enabled?
>> >
>> > * Pekka Enberg <penberg(a)cs.helsinki.fi> [2010-07-15 22:50]:
>> >> To be honest, the bug is a bit odd. It's related to boot-time memory
>> >> allocator changes but yet it seems to manifest itself as a scheduling
>> >> problem. So if you have some spare time and want to speed up the
>> >> debugging process, please test v2.6.34 and v2.6.35-rc1 with
>> >> CONFIG_NO_BOOTMEM and if former is good and latter is bad, try to see
>> >> if you can identify the offending commit with "git bisect."
>> >
>> > Not sure I will have enough time in the coming days (doing that remotely
>> > is fishy since ssh access is almost stuck when the problem occurs); if
>> > Zeno can and would like to do it, maybe this could be done faster.
>> >
>> > As the scheduler is now very well instrumented (many debugging features
>> > are available), reproducing the bug on a test platform (it happens quite
>> > quickly for me) might also give some hints. So testers, if you have
>> > time, please test 2.6.35-rc5 with CONFIG_NO_BOOTMEM on a Core i7 and see
>> > if you can reproduce the problem!
>>
>> Yeah, there's "perf sched" tool available for that:
>>
>> http://lwn.net/Articles/353295/
>>
>> The only problem is that we'd need a scheduler hacker to decipher the
>> report and all of them seem to be missing at the moment (probably at
>> OLS). Anyway, like I said, git bisect will probably speed up the
>> debugging process, that's all.
>
> Vacation.. but now I'm back ;-)
>
> Even something as simple as: perf top -r 1 (make sure you're root in order
> to run with real-time prios) could give a clue as to what is consuming
> all your cpu-time.
>
> Or did the issue get sorted already?

Thank you for the hint.

I am on 2.6.35 now and all seems to be fine again.

Best
Zeno