From: David Rientjes on
On Tue, 2 Mar 2010, Andi Kleen wrote:

> The patch looks far more complicated than my simple fix.
>
> Is more complicated now better?
>

If you still believe these are "fixes," then perhaps you don't fully
understand the issue: slab completely lacked memory hotplug support when a
node is either being onlined or offlined that do not have hotadded or
hotremoved cpus. It's as simple as that.

To be fair, my patch may appear more complex because it implements full
memory hotplug support so that the nodelists are properly drained and
freed when the same memory regions you onlined for memory hot-add are now
offlined. Notice, also, how it touches no other slab code as implementing
new support for something shouldn't. There is no need for additional
hacks to be added in other slab code if you properly allocate and
initialize the nodelists for the memory being added before it is available
for use by the kernel.

If you'd test my patch out on your setup, that would be very helpful. I
can address any additional issues that you may undercover if you post the
oops while doing either memory online or offline.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Tue, 2 Mar 2010 14:20:06 -0600 (CST)
Christoph Lameter <cl(a)linux-foundation.org> wrote:

>
> Not sure how this would sync with slab use during node bootstrap and
> shutdown. Kame-san?
>
> Otherwise
>
> Acked-by: Christoph Lameter <cl(a)linux-foundation.org>
>

What this patch fixes ? Maybe I miss something...

At node hot-add

* pgdat is allocated from other node (because we have no memory for "nid")
* memmap for the first section (and possiby others) will be allocated from
other nodes.
* Once a section for the node is onlined, any memory can be allocated localy.

(Allocating memory from local node requires some new implementation as
bootmem allocater, we didn't that.)

Before this patch, slab's control layer is allocated by cpuhotplug.
So, at least keeping this order,
memory online -> cpu online
slab's control layer is allocated from local node.

When node-hotadd is done in this order
cpu online -> memory online
kmalloc_node() will allocate memory from other node via fallback.

After this patch, slab's control layer is allocated by memory hotplug.
Then, in any order, slab's control will be allocated via fallback routine.

If this patch is an alternative fix for Andi's this logic
==
Index: linux-2.6.32-memhotadd/mm/slab.c
===================================================================
--- linux-2.6.32-memhotadd.orig/mm/slab.c
+++ linux-2.6.32-memhotadd/mm/slab.c
@@ -4093,6 +4093,9 @@ static void cache_reap(struct work_struc
* we can do some work if the lock was obtained.
*/
l3 = searchp->nodelists[node];
+ /* Note node yet set up */
+ if (!l3)
+ break;
==
I'm not sure this really happens.

cache_reap() is for checking local node. The caller is set up by
CPU_ONLINE. searchp->nodelists[] is filled by CPU_PREPARE.

Then, cpu for the node should be onlined. (and it's done under proper mutex.)

I'm sorry if I miss something important. But how anyone cause this ?

Thanks,
-Kame



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Rientjes on
On Wed, 3 Mar 2010, KAMEZAWA Hiroyuki wrote:

> At node hot-add
>
> * pgdat is allocated from other node (because we have no memory for "nid")
> * memmap for the first section (and possiby others) will be allocated from
> other nodes.
> * Once a section for the node is onlined, any memory can be allocated localy.
>

Correct, and the struct kmem_list3 is also alloacted from other nodes with
my patch.

> (Allocating memory from local node requires some new implementation as
> bootmem allocater, we didn't that.)
>
> Before this patch, slab's control layer is allocated by cpuhotplug.
> So, at least keeping this order,
> memory online -> cpu online
> slab's control layer is allocated from local node.
>
> When node-hotadd is done in this order
> cpu online -> memory online
> kmalloc_node() will allocate memory from other node via fallback.
>
> After this patch, slab's control layer is allocated by memory hotplug.
> Then, in any order, slab's control will be allocated via fallback routine.
>

Again, this addresses memory hotplug that requires a new node to be
onlined that do not have corresponding cpus that are being onlined. On
x86, these represent ACPI_SRAT_MEM_HOT_PLUGGABLE regions that are onlined
either by the acpi hotplug or done manually with CONFIG_ARCH_MEMORY_PROBE.
On other architectures such as powerpc, this is done in different ways.

All of this is spelled out in the changelog for the patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Tue, 2 Mar 2010 18:39:20 -0800 (PST)
David Rientjes <rientjes(a)google.com> wrote:

> On Wed, 3 Mar 2010, KAMEZAWA Hiroyuki wrote:
>
> > At node hot-add
> >
> > * pgdat is allocated from other node (because we have no memory for "nid")
> > * memmap for the first section (and possiby others) will be allocated from
> > other nodes.
> > * Once a section for the node is onlined, any memory can be allocated localy.
> >
>
> Correct, and the struct kmem_list3 is also alloacted from other nodes with
> my patch.
>
> > (Allocating memory from local node requires some new implementation as
> > bootmem allocater, we didn't that.)
> >
> > Before this patch, slab's control layer is allocated by cpuhotplug.
> > So, at least keeping this order,
> > memory online -> cpu online
> > slab's control layer is allocated from local node.
> >
> > When node-hotadd is done in this order
> > cpu online -> memory online
> > kmalloc_node() will allocate memory from other node via fallback.
> >
> > After this patch, slab's control layer is allocated by memory hotplug.
> > Then, in any order, slab's control will be allocated via fallback routine.
> >
>
> Again, this addresses memory hotplug that requires a new node to be
> onlined that do not have corresponding cpus that are being onlined. On
> x86, these represent ACPI_SRAT_MEM_HOT_PLUGGABLE regions that are onlined
> either by the acpi hotplug or done manually with CONFIG_ARCH_MEMORY_PROBE.
> On other architectures such as powerpc, this is done in different ways.
>
> All of this is spelled out in the changelog for the patch.
>
Ah, ok. for cpu-less node and kmallco_node() against that node.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>

Thanks,
-Kame



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andi Kleen on
> > The patch looks far more complicated than my simple fix.
>
> I wouldn't exactly call the fallback_alloc() games "simple".

I have to disagree on that. It was the most simple fix I could
come up with, least intrusive to legacy like slab is.

> > Is more complicated now better?
>
> Heh, heh. You can't post the oops, you don't want to rework your

The missing oops was about the timer race, not about this one.

> patches as per review comments, and now you complain about David's
> patch without one bit of technical content. I'm sorry but I must

Well sorry I'm just a bit frustrated about the glacial progress on what
should be relatively straight forward fixes.

IMHO something like my patch should have gone into .33 and any more
complicated reworks like this into .34.

> But anyway, if you have real technical concerns over the patch, please
> make them known; otherwise I'd much appreciate a Tested-by tag from
> you for David's patch.

If it works it would be ok for me. The main concern would be to actually
get it fixed.

-Andi

--
ak(a)linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/