From: Andi Kleen on
On Mon, Feb 15, 2010 at 09:41:35PM +1100, Nick Piggin wrote:
> On Mon, Feb 15, 2010 at 11:32:50AM +0100, Andi Kleen wrote:
> > On Mon, Feb 15, 2010 at 05:15:35PM +1100, Nick Piggin wrote:
> > > On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> > > >
> > > > cache_reap can run before the node is set up and then reference a NULL
> > > > l3 list. Check for this explicitly and just continue. The node
> > > > will eventually be set up.
> > >
> > > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > > run before start_cpu_timer.
> >
> > I'm not fully sure, but I have the oops to prove it :)
>
> Hmm, it would be nice to work out why it's happening. If it's completely
> reproducible then could I send you a debug patch to test?

Looking at it again I suspect it happened this way:

cpuup_prepare fails (e.g. kmalloc_node returns NULL). The later
patches might have cured that. Nothing stops the timer from
starting in this case anyways.

So given that, the first patches might not be needed, but they are
safer to have anyway.
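
For readers following along, the kind of guard being discussed can be
illustrated with a small userspace sketch. This is not the kernel code:
`struct cache`, `struct node_lists`, `MAX_NODES` and `reap_node` are
hypothetical names made up for the illustration, and the "reap" step is
reduced to clearing a counter. It only shows the idea of skipping a cache
whose per-node list pointer is still NULL.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4 /* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for the per-node list structure ("l3"). */
struct node_lists {
    int free_objects;
};

/* Hypothetical stand-in for a cache holding one list pointer per node. */
struct cache {
    struct node_lists *nodelists[MAX_NODES];
};

/*
 * Walk an array of caches and "reap" the lists belonging to one node.
 * A cache whose entry for that node is still NULL is skipped rather than
 * dereferenced. Returns the number of caches that were actually reaped.
 */
static int reap_node(struct cache *caches, size_t ncaches, int node)
{
    int reaped = 0;

    assert(node >= 0 && node < MAX_NODES);

    for (size_t i = 0; i < ncaches; i++) {
        struct node_lists *l3 = caches[i].nodelists[node];

        if (l3 == NULL)
            continue; /* node not set up yet: skip and try again later */

        l3->free_objects = 0; /* placeholder for the real reaping work */
        reaped++;
    }
    return reaped;
}
```

With two caches where only the second has node 0 populated, a call such
as `reap_node(caches, 2, 0)` touches one cache and leaves the other alone.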

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Nick Piggin on
On Mon, Feb 15, 2010 at 11:52:53AM +0100, Andi Kleen wrote:
> On Mon, Feb 15, 2010 at 09:41:35PM +1100, Nick Piggin wrote:
> > On Mon, Feb 15, 2010 at 11:32:50AM +0100, Andi Kleen wrote:
> > > On Mon, Feb 15, 2010 at 05:15:35PM +1100, Nick Piggin wrote:
> > > > On Thu, Feb 11, 2010 at 09:54:04PM +0100, Andi Kleen wrote:
> > > > >
> > > > > > cache_reap can run before the node is set up and then reference a NULL
> > > > > > l3 list. Check for this explicitly and just continue. The node
> > > > > > will eventually be set up.
> > > >
> > > > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > > > run before start_cpu_timer.
> > >
> > > I'm not fully sure, but I have the oops to prove it :)
> >
> > Hmm, it would be nice to work out why it's happening. If it's completely
> > reproducible then could I send you a debug patch to test?
>
> Looking at it again I suspect it happened this way:
>
> cpuup_prepare fails (e.g. kmalloc_node returns NULL). The later
> patches might have cured that. Nothing stops the timer from
> starting in this case anyways.

Hmm, but it should, because if cpuup_prepare fails then the
CPU_ONLINE notifiers should never be called I think.
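
The ordering argued for here can be sketched in plain C. This is a
userspace illustration only, not the hotplug implementation:
`cpu_up_sketch`, `prepare_ok`, `prepare_fail`, `start_timer` and
`timers_started` are hypothetical names. It assumes the bring-up path
calls a prepare step first and only proceeds to the online step when
prepare succeeds.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*prepare_fn)(int cpu);  /* returns 0 on success */
typedef void (*online_fn)(int cpu);

/* Counts how many times the (hypothetical) online step has run. */
static int timers_started;

static int prepare_ok(int cpu)   { (void)cpu; return 0; }
static int prepare_fail(int cpu) { (void)cpu; return -1; }
static void start_timer(int cpu) { (void)cpu; timers_started++; }

/*
 * Run the prepare step, and run the online step only if prepare
 * succeeded. On failure the error is returned and online never runs.
 */
static int cpu_up_sketch(int cpu, prepare_fn prepare, online_fn online)
{
    int err;

    assert(prepare != NULL && online != NULL);

    err = prepare(cpu);
    if (err)
        return err; /* prepare failed: skip the online step entirely */

    online(cpu);
    return 0;
}
```

Under that assumption, `cpu_up_sketch(1, prepare_fail, start_timer)`
returns an error and leaves `timers_started` unchanged, which is why a
failed prepare step alone would not explain a timer running on the CPU.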


> So given that, the first patches might not be needed, but they are
> safer to have anyway.

I'm just worried there is still an underlying problem here.

From: Andi Kleen on
Nick Piggin <npiggin(a)suse.de> writes:
>
> Hmm, but it should, because if cpuup_prepare fails then the
> CPU_ONLINE notifiers should never be called I think.

That's true.

-Andi
--
ak(a)linux.intel.com -- Speaking for myself only.
From: Christoph Lameter on
On Mon, 15 Feb 2010, Andi Kleen wrote:

> > How, may I ask? cpuup_prepare in the hotplug notifier should always
> > run before start_cpu_timer.
>
> I'm not fully sure, but I have the oops to prove it :)

I still suspect that this has something to do with Pekka's change to the
boot order for allocator bootstrap. Can we clarify why these problems
exist before we try a band-aid?

From: Christoph Lameter on
On Mon, 15 Feb 2010, Nick Piggin wrote:

> I'm just worried there is still an underlying problem here.

So am I. What caused the breakage that requires this patchset?
