From: Nick Piggin on
On Mon, Mar 22, 2010 at 07:28:54PM +0200, Pekka Enberg wrote:
> Nick Piggin wrote:
> >On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> >>On Fri, 5 Mar 2010, Nick Piggin wrote:
> >>
> >>>>+#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> >>>>+/*
> >>>>+ * Drains and frees nodelists for a node on each slab cache, used for memory
> >>>>+ * hotplug. Returns -EBUSY if all objects cannot be drained on memory
> >>>>+ * hot-remove so that the node is not removed. When used because memory
> >>>>+ * hot-add is canceled, the only result is the freed kmem_list3.
> >>>>+ *
> >>>>+ * Must hold cache_chain_mutex.
> >>>>+ */
> >>>>+static int __meminit free_cache_nodelists_node(int node)
> >>>>+{
> >>>>+	struct kmem_cache *cachep;
> >>>>+	int ret = 0;
> >>>>+
> >>>>+	list_for_each_entry(cachep, &cache_chain, next) {
> >>>>+		struct array_cache *shared;
> >>>>+		struct array_cache **alien;
> >>>>+		struct kmem_list3 *l3;
> >>>>+
> >>>>+		l3 = cachep->nodelists[node];
> >>>>+		if (!l3)
> >>>>+			continue;
> >>>>+
> >>>>+		spin_lock_irq(&l3->list_lock);
> >>>>+		shared = l3->shared;
> >>>>+		if (shared) {
> >>>>+			free_block(cachep, shared->entry, shared->avail, node);
> >>>>+			l3->shared = NULL;
> >>>>+		}
> >>>>+		alien = l3->alien;
> >>>>+		l3->alien = NULL;
> >>>>+		spin_unlock_irq(&l3->list_lock);
> >>>>+
> >>>>+		if (alien) {
> >>>>+			drain_alien_cache(cachep, alien);
> >>>>+			free_alien_cache(alien);
> >>>>+		}
> >>>>+		kfree(shared);
> >>>>+
> >>>>+		drain_freelist(cachep, l3, l3->free_objects);
> >>>>+		if (!list_empty(&l3->slabs_full) ||
> >>>>+		    !list_empty(&l3->slabs_partial)) {
> >>>>+			/*
> >>>>+			 * Continue to iterate through each slab cache to free
> >>>>+			 * as many nodelists as possible even though the
> >>>>+			 * offline will be canceled.
> >>>>+			 */
> >>>>+			ret = -EBUSY;
> >>>>+			continue;
> >>>>+		}
> >>>>+		kfree(l3);
> >>>>+		cachep->nodelists[node] = NULL;
> >>>What's stopping races of other CPUs trying to access l3 and array
> >>>caches while they're being freed?
> >>>
> >>numa_node_id() will not return an offlined nodeid and
> >>cache_alloc_node() already does a fallback to other onlined
> >>nodes in case a nodeid is passed to kmalloc_node() that does not
> >>have a nodelist. l3->shared and l3->alien cannot be accessed
> >>without l3->list_lock (drain, cache_alloc_refill,
> >>cache_flusharray) or cache_chain_mutex (kmem_cache_destroy,
> >>cache_reap).
> >
> >Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
> >while it is being accessed by another CPU and concurrently being freed
> >on this one?
> >
> >
> >>>>+	}
> >>>>+	return ret;
> >>>>+}
> >>>>+
> >>>>+/*
> >>>>+ * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> >>>>+ */
> >>>>+static int __meminit slab_node_online(int nid)
> >>>>+{
> >>>>+	int ret;
> >>>>+	mutex_lock(&cache_chain_mutex);
> >>>>+	ret = init_cache_nodelists_node(nid);
> >>>>+	mutex_unlock(&cache_chain_mutex);
> >>>>+	return ret;
> >>>>+}
> >>>>+
> >>>>+/*
> >>>>+ * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> >>>>+ */
> >>>>+static int __meminit slab_node_offline(int nid)
> >>>>+{
> >>>>+	int ret;
> >>>>+	mutex_lock(&cache_chain_mutex);
> >>>>+	ret = free_cache_nodelists_node(nid);
> >>>>+	mutex_unlock(&cache_chain_mutex);
> >>>>+	return ret;
> >>>>+}
> >>>>+
> >>>>+static int __meminit slab_memory_callback(struct notifier_block *self,
> >>>>+					unsigned long action, void *arg)
> >>>>+{
> >>>>+	struct memory_notify *mnb = arg;
> >>>>+	int ret = 0;
> >>>>+	int nid;
> >>>>+
> >>>>+	nid = mnb->status_change_nid;
> >>>>+	if (nid < 0)
> >>>>+		goto out;
> >>>>+
> >>>>+	switch (action) {
> >>>>+	case MEM_GOING_ONLINE:
> >>>>+	case MEM_CANCEL_OFFLINE:
> >>>>+		ret = slab_node_online(nid);
> >>>>+		break;
> >>>This would explode if CANCEL_OFFLINE fails. Call it theoretical and
> >>>put a panic() in here and I don't mind. Otherwise you get corruption
> >>>somewhere in the slab code.
> >>>
> >>MEM_CANCEL_ONLINE would only fail here if a struct kmem_list3
> >>couldn't be allocated anywhere on the system and if that happens
> >>then the node simply couldn't be allocated from (numa_node_id()
> >>would never return it as the cpu's node, so it's possible to
> >>fallback in this scenario).
> >
> >Why would it never return the CPU's node? It's CANCEL_OFFLINE that is
> >the problem.
>
So I was thinking of pushing this towards Linus but I didn't see
anyone respond to Nick's concerns. I'm not that familiar with all
this hotplug stuff, so can someone also make Nick happy so we can
move forward?

I don't mind about the memory failure cases (just add a panic
there that should never really happen anyway, just to document
that a part is still missing).

I am more worried about the races. Maybe I just missed how they
are protected against.
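
To make it concrete, the interleaving I'm thinking of is roughly the
following (paraphrased from memory rather than quoted verbatim from
mm/slab.c, so treat it as a sketch):

	/* CPU A: allocation path, e.g. kmalloc_node(size, flags, nid) */
	l3 = cachep->nodelists[nid];	/* no cache_chain_mutex held here */
	spin_lock(&l3->list_lock);	/* l3 may already be freed by now */

	/* CPU B: hot-remove path above, holding only cache_chain_mutex */
	kfree(l3);
	cachep->nodelists[node] = NULL;

Nothing I can see orders the load of cachep->nodelists[nid] on CPU A
against the kfree() on CPU B.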

From: David Rientjes on
On Wed, 10 Mar 2010, Nick Piggin wrote:

> On Mon, Mar 08, 2010 at 03:19:48PM -0800, David Rientjes wrote:
> > On Fri, 5 Mar 2010, Nick Piggin wrote:
> >
> > > > +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> > > > +/*
> > > > + * Drains and frees nodelists for a node on each slab cache, used for memory
> > > > + * hotplug. Returns -EBUSY if all objects cannot be drained on memory
> > > > + * hot-remove so that the node is not removed. When used because memory
> > > > + * hot-add is canceled, the only result is the freed kmem_list3.
> > > > + *
> > > > + * Must hold cache_chain_mutex.
> > > > + */
> > > > +static int __meminit free_cache_nodelists_node(int node)
> > > > +{
> > > > +	struct kmem_cache *cachep;
> > > > +	int ret = 0;
> > > > +
> > > > +	list_for_each_entry(cachep, &cache_chain, next) {
> > > > +		struct array_cache *shared;
> > > > +		struct array_cache **alien;
> > > > +		struct kmem_list3 *l3;
> > > > +
> > > > +		l3 = cachep->nodelists[node];
> > > > +		if (!l3)
> > > > +			continue;
> > > > +
> > > > +		spin_lock_irq(&l3->list_lock);
> > > > +		shared = l3->shared;
> > > > +		if (shared) {
> > > > +			free_block(cachep, shared->entry, shared->avail, node);
> > > > +			l3->shared = NULL;
> > > > +		}
> > > > +		alien = l3->alien;
> > > > +		l3->alien = NULL;
> > > > +		spin_unlock_irq(&l3->list_lock);
> > > > +
> > > > +		if (alien) {
> > > > +			drain_alien_cache(cachep, alien);
> > > > +			free_alien_cache(alien);
> > > > +		}
> > > > +		kfree(shared);
> > > > +
> > > > +		drain_freelist(cachep, l3, l3->free_objects);
> > > > +		if (!list_empty(&l3->slabs_full) ||
> > > > +		    !list_empty(&l3->slabs_partial)) {
> > > > +			/*
> > > > +			 * Continue to iterate through each slab cache to free
> > > > +			 * as many nodelists as possible even though the
> > > > +			 * offline will be canceled.
> > > > +			 */
> > > > +			ret = -EBUSY;
> > > > +			continue;
> > > > +		}
> > > > +		kfree(l3);
> > > > +		cachep->nodelists[node] = NULL;
> > >
> > > What's stopping races of other CPUs trying to access l3 and array
> > > caches while they're being freed?
> > >
> >
> > numa_node_id() will not return an offlined nodeid and cache_alloc_node()
> > already does a fallback to other onlined nodes in case a nodeid is passed
> > to kmalloc_node() that does not have a nodelist. l3->shared and l3->alien
> > cannot be accessed without l3->list_lock (drain, cache_alloc_refill,
> > cache_flusharray) or cache_chain_mutex (kmem_cache_destroy, cache_reap).
>
> Yeah, but can't it _have_ a nodelist (ie. before it is set to NULL here)
> while it is being accessed by another CPU and concurrently being freed
> on this one?
>

You're right, we can't free cachep->nodelists[node] for any node that is
being hot-removed without racing with cache_alloc_node(). I thought we had
protection for this under cache_chain_mutex for most dereferences and
could disregard cache_alloc_refill() because numa_node_id() would never
return a node being removed under memory hotplug; that would be the
responsibility of cpu hotplug instead (offline the cpu first, then ensure
numa_node_id() can't return a node under hot-remove).

Thanks for pointing that out, it's definitely broken here.

As an alternative, I think we should do something like this on
MEM_GOING_OFFLINE:

int ret = 0;

mutex_lock(&cache_chain_mutex);
list_for_each_entry(cachep, &cache_chain, next) {
	struct kmem_list3 *l3;

	l3 = cachep->nodelists[node];
	if (!l3)
		continue;
	drain_freelist(cachep, l3, l3->free_objects);

	/* slabs still in use on this node: the offline must be aborted */
	ret = !list_empty(&l3->slabs_full) ||
	      !list_empty(&l3->slabs_partial);
	if (ret)
		break;
}
mutex_unlock(&cache_chain_mutex);
return ret ? NOTIFY_BAD : NOTIFY_OK;

to preempt hot-remove of a node where there are slabs on the full or
partial lists that can't be freed.

Then, for MEM_OFFLINE, we leave cachep->nodelists[node] valid in
case there are cache_alloc_node() racers or the node ever comes back
online; subsequent callers to kmalloc_node() for the offlined node would
actually return objects from fallback_alloc() since kmem_getpages() would
fail for a node without present pages.
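
Roughly, that's the tail of ____cache_alloc_node() (paraphrased here as a
sketch, not an exact quote of mm/slab.c):

	must_grow:
		spin_unlock(&l3->list_lock);
		x = cache_grow(cachep, flags | GFP_THISNODE, nodeid, NULL);
		if (x)
			goto retry;
		/* cache_grow() -> kmem_getpages() fails for a node with no
		 * present pages, so the allocation is serviced elsewhere: */
		return fallback_alloc(cachep, flags);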

If slab is allocated after the drain_freelist() above, we'll never
actually get MEM_OFFLINE since not all pages can be isolated for memory
hot-remove, and thus the node will never be offlined. kmem_getpages() can't
allocate isolated pages, so this race must happen after drain_freelist()
and prior to the pageblock being isolated.

So the MEM_GOING_OFFLINE check above is really more of a convenience to
short-circuit the hot-remove when we already know we can't free all slab
on that node, avoiding all the subsequent work that would happen only to
run into isolation failure later.

We don't need to do anything for MEM_CANCEL_OFFLINE since the only effect
of MEM_GOING_OFFLINE is to drain the freelist.
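
In sketch form, then, the memory callback reduces to something like this
(using the names from the version of the patch posted later in this
thread):

	switch (action) {
	case MEM_GOING_ONLINE:
		ret = init_cache_nodelists_node(nid);	/* under cache_chain_mutex */
		break;
	case MEM_GOING_OFFLINE:
		ret = drain_cache_nodelists_node(nid);	/* under cache_chain_mutex */
		break;
	default:
		/* MEM_ONLINE, MEM_OFFLINE, MEM_CANCEL_*: nothing to do */
		break;
	}
	return ret ? notifier_from_errno(ret) : NOTIFY_OK;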
From: Pekka Enberg on
On Sun, Mar 28, 2010 at 5:40 AM, David Rientjes <rientjes(a)google.com> wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged.  This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node.  It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
>
> When a node is hotadded, a nodelist for that node is allocated and
> initialized for each slab cache.  If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.
>
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
>
> When an entire node would be offlined, its nodelists are subsequently
> drained.  If slab objects still exist and cannot be freed, the offline is
> aborted.  It is possible that objects will be allocated between this
> drain and page isolation, so it's still possible that the offline will
> still fail, however.
>
> Signed-off-by: David Rientjes <rientjes(a)google.com>

Nick, Christoph, let's make a deal: you ACK, I merge. How does that
sound to you?

> ---
>  mm/slab.c |  157 ++++++++++++++++++++++++++++++++++++++++++++++++------------
>  1 files changed, 125 insertions(+), 32 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -115,6 +115,7 @@
>  #include	<linux/reciprocal_div.h>
>  #include	<linux/debugobjects.h>
>  #include	<linux/kmemcheck.h>
> +#include	<linux/memory.h>
>
>  #include	<asm/cacheflush.h>
>  #include	<asm/tlbflush.h>
> @@ -1102,6 +1103,52 @@ static inline int cache_free_alien(struct kmem_cache *cachep, void *objp)
>  }
>  #endif
>
> +/*
> + * Allocates and initializes nodelists for a node on each slab cache, used for
> + * either memory or cpu hotplug.  If memory is being hot-added, the kmem_list3
> + * will be allocated off-node since memory is not yet online for the new node.
> + * When hotplugging memory or a cpu, existing nodelists are not replaced if
> + * already in use.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int init_cache_nodelists_node(int node)
> +{
> +	struct kmem_cache *cachep;
> +	struct kmem_list3 *l3;
> +	const int memsize = sizeof(struct kmem_list3);
> +
> +	list_for_each_entry(cachep, &cache_chain, next) {
> +		/*
> +		 * Set up the size64 kmemlist for cpu before we can
> +		 * begin anything. Make sure some other cpu on this
> +		 * node has not already allocated this
> +		 */
> +		if (!cachep->nodelists[node]) {
> +			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
> +			if (!l3)
> +				return -ENOMEM;
> +			kmem_list3_init(l3);
> +			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
> +			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> +
> +			/*
> +			 * The l3s don't come and go as CPUs come and
> +			 * go.  cache_chain_mutex is sufficient
> +			 * protection here.
> +			 */
> +			cachep->nodelists[node] = l3;
> +		}
> +
> +		spin_lock_irq(&cachep->nodelists[node]->list_lock);
> +		cachep->nodelists[node]->free_limit =
> +			(1 + nr_cpus_node(node)) *
> +			cachep->batchcount + cachep->num;
> +		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
> +	}
> +	return 0;
> +}
> +
>  static void __cpuinit cpuup_canceled(long cpu)
>  {
>  	struct kmem_cache *cachep;
> @@ -1172,7 +1219,7 @@ static int __cpuinit cpuup_prepare(long cpu)
>  	struct kmem_cache *cachep;
>  	struct kmem_list3 *l3 = NULL;
>  	int node = cpu_to_node(cpu);
> -	const int memsize = sizeof(struct kmem_list3);
> +	int err;
>
>  	/*
>  	 * We need to do this right in the beginning since
> @@ -1180,35 +1227,9 @@ static int __cpuinit cpuup_prepare(long cpu)
>  	 * kmalloc_node allows us to add the slab to the right
>  	 * kmem_list3 and not this cpu's kmem_list3
>  	 */
> -
> -	list_for_each_entry(cachep, &cache_chain, next) {
> -		/*
> -		 * Set up the size64 kmemlist for cpu before we can
> -		 * begin anything. Make sure some other cpu on this
> -		 * node has not already allocated this
> -		 */
> -		if (!cachep->nodelists[node]) {
> -			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
> -			if (!l3)
> -				goto bad;
> -			kmem_list3_init(l3);
> -			l3->next_reap = jiffies + REAPTIMEOUT_LIST3 +
> -			    ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> -
> -			/*
> -			 * The l3s don't come and go as CPUs come and
> -			 * go.  cache_chain_mutex is sufficient
> -			 * protection here.
> -			 */
> -			cachep->nodelists[node] = l3;
> -		}
> -
> -		spin_lock_irq(&cachep->nodelists[node]->list_lock);
> -		cachep->nodelists[node]->free_limit =
> -			(1 + nr_cpus_node(node)) *
> -			cachep->batchcount + cachep->num;
> -		spin_unlock_irq(&cachep->nodelists[node]->list_lock);
> -	}
> +	err = init_cache_nodelists_node(node);
> +	if (err < 0)
> +		goto bad;
>
>  	/*
>  	 * Now we can go ahead with allocating the shared arrays and
> @@ -1331,11 +1352,75 @@ static struct notifier_block __cpuinitdata cpucache_notifier = {
>  	&cpuup_callback, NULL, 0
>  };
>
> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> +/*
> + * Drains freelist for a node on each slab cache, used for memory hot-remove.
> + * Returns -EBUSY if all objects cannot be drained so that the node is not
> + * removed.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int __meminit drain_cache_nodelists_node(int node)
> +{
> +	struct kmem_cache *cachep;
> +	int ret = 0;
> +
> +	list_for_each_entry(cachep, &cache_chain, next) {
> +		struct kmem_list3 *l3;
> +
> +		l3 = cachep->nodelists[node];
> +		if (!l3)
> +			continue;
> +
> +		drain_freelist(cachep, l3, l3->free_objects);
> +
> +		if (!list_empty(&l3->slabs_full) ||
> +		    !list_empty(&l3->slabs_partial)) {
> +			ret = -EBUSY;
> +			break;
> +		}
> +	}
> +	return ret;
> +}
> +
> +static int __meminit slab_memory_callback(struct notifier_block *self,
> +					unsigned long action, void *arg)
> +{
> +	struct memory_notify *mnb = arg;
> +	int ret = 0;
> +	int nid;
> +
> +	nid = mnb->status_change_nid;
> +	if (nid < 0)
> +		goto out;
> +
> +	switch (action) {
> +	case MEM_GOING_ONLINE:
> +		mutex_lock(&cache_chain_mutex);
> +		ret = init_cache_nodelists_node(nid);
> +		mutex_unlock(&cache_chain_mutex);
> +		break;
> +	case MEM_GOING_OFFLINE:
> +		mutex_lock(&cache_chain_mutex);
> +		ret = drain_cache_nodelists_node(nid);
> +		mutex_unlock(&cache_chain_mutex);
> +		break;
> +	case MEM_ONLINE:
> +	case MEM_OFFLINE:
> +	case MEM_CANCEL_ONLINE:
> +	case MEM_CANCEL_OFFLINE:
> +		break;
> +	}
> +out:
> +	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
> +}
> +#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
> +
>  /*
>   * swap the static kmem_list3 with kmalloced memory
>   */
> -static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> -			int nodeid)
> +static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> +				int nodeid)
>  {
>  	struct kmem_list3 *ptr;
>
> @@ -1580,6 +1665,14 @@ void __init kmem_cache_init_late(void)
>  	 */
>  	register_cpu_notifier(&cpucache_notifier);
>
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Register a memory hotplug callback that initializes and frees
> +	 * nodelists.
> +	 */
> +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +#endif
> +
>  	/*
>  	 * The reap timers are started later, with a module init call: That part
>  	 * of the kernel is not yet operational.
From: Christoph Lameter on
On Tue, 30 Mar 2010, Pekka Enberg wrote:

> Nick, Christoph, let's make a deal: you ACK, I merge. How does that
> sound to you?

I looked through the patch before and slabwise this seems to be ok but I am
still not very sure how this interacts with the node and cpu bootstrap.
You can have the ack with this caveat.

Acked-by: Christoph Lameter <cl(a)linux-foundation.org>

From: David Rientjes on
On Tue, 30 Mar 2010, Christoph Lameter wrote:

> > Nick, Christoph, let's make a deal: you ACK, I merge. How does that
> > sound to you?
>
> I looked through the patch before and slabwise this seems to be ok but I am
> still not very sure how this interacts with the node and cpu bootstrap.
> You can have the ack with this caveat.
>
> Acked-by: Christoph Lameter <cl(a)linux-foundation.org>
>

Thanks.

I tested this for node hotplug by setting ACPI_SRAT_MEM_HOT_PLUGGABLE
regions and then setting up a new memory section with
/sys/devices/system/memory/probe. I onlined the new memory section, which
mapped to an offline node, and verified that the new nodelists were
initialized correctly. This is done before the MEM_ONLINE notifier and
the bit being set in node_states[N_HIGH_MEMORY]. So, for node hot-add, it
works.

MEM_GOING_OFFLINE is more interesting, but there's nothing harmful about
draining the freelist and reporting back to the memory hotplug layer
whether full or partial slabs still exist, so that a hot-remove can be
preempted when those slabs cannot be freed. I don't consider that to be a
risky change.

As far as the interactions between memory and cpu hotplug go, they are
really different things with many of the same implications for the slab
layer. Both have the possibility of bringing new nodes online or offline
and must be dealt with accordingly. We lack support for offlining an
entire node at a time since we must first hotplug memory by adding a new
memory section, so these notifiers won't be called simultaneously. Even if
they were, draining the freelist and checking whether a nodelist needs to
be initialized is not going to be harmful since both notifiers have the
same check for existing nodelists (which is necessary not only if we _did_
have simultaneous cpu and memory hot-add, but also if a node transitioned
from online to offline and back to online).
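
That shared check is just the guard at the top of the loop in
init_cache_nodelists_node() (excerpted from the patch above):

	list_for_each_entry(cachep, &cache_chain, next) {
		/* another hotplug path may have set this node up already */
		if (!cachep->nodelists[node]) {
			l3 = kmalloc_node(memsize, GFP_KERNEL, node);
			if (!l3)
				return -ENOMEM;
			...
			cachep->nodelists[node] = l3;
		}
		...
	}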

I hope this patch is merged because it obviously fixed a problem on my box
where a memory section could be added and a node onlined with no slab
metadata ever being initialized for that memory.