slab: add memory hotplug support [Kernel]

From: Christoph Lameter on 3 Mar 2010 11:00

On Wed, 3 Mar 2010, Andi Kleen wrote:

> > But anyway, if you have real technical concerns over the patch, please
> > make them known; otherwise I'd much appreciate a Tested-by tag from
> > you for David's patch.
>
> If it works it would be ok for me. The main concern would be to actually
> get it fixed.

You do not have a testcase? This is a result of code review?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From: Nick Piggin on 5 Mar 2010 01:30

On Mon, Mar 01, 2010 at 02:24:43AM -0800, David Rientjes wrote:
> Slab lacks any memory hotplug support for nodes that are hotplugged
> without cpus being hotplugged. This is possible at least on x86
> CONFIG_MEMORY_HOTPLUG_SPARSE kernels where SRAT entries are marked
> ACPI_SRAT_MEM_HOT_PLUGGABLE and the regions of RAM represent a separate
> node. It can also be done manually by writing the start address to
> /sys/devices/system/memory/probe for kernels that have
> CONFIG_ARCH_MEMORY_PROBE set, which is how this patch was tested, and
> then onlining the new memory region.
>
> When a node is hotadded, a nodelist for that node is allocated and
> initialized for each slab cache. If this isn't completed due to a lack
> of memory, the hotadd is aborted: we have a reasonable expectation that
> kmalloc_node(nid) will work for all caches if nid is online and memory is
> available.
>
> Since nodelists must be allocated and initialized prior to the new node's
> memory actually being online, the struct kmem_list3 is allocated off-node
> due to kmalloc_node()'s fallback.
>
> When an entire node is offlined (or an online is aborted), these
> nodelists are subsequently drained and freed. If objects still exist
> either on the partial or full lists for those nodes, the offline is
> aborted. This scenario will not occur for an aborted online, however,
> since objects can never be allocated from those nodelists until the
> online has completed.
>
> Signed-off-by: David Rientjes <rientjes(a)google.com>

This looks OK to me in general. Couple of questions though:

> +#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
> +/*
> + * Drains and frees nodelists for a node on each slab cache, used for memory
> + * hotplug. Returns -EBUSY if all objects cannot be drained on memory
> + * hot-remove so that the node is not removed. When used because memory
> + * hot-add is canceled, the only result is the freed kmem_list3.
> + *
> + * Must hold cache_chain_mutex.
> + */
> +static int __meminit free_cache_nodelists_node(int node)
> +{
> +	struct kmem_cache *cachep;
> +	int ret = 0;
> +
> +	list_for_each_entry(cachep, &cache_chain, next) {
> +		struct array_cache *shared;
> +		struct array_cache **alien;
> +		struct kmem_list3 *l3;
> +
> +		l3 = cachep->nodelists[node];
> +		if (!l3)
> +			continue;
> +
> +		spin_lock_irq(&l3->list_lock);
> +		shared = l3->shared;
> +		if (shared) {
> +			free_block(cachep, shared->entry, shared->avail, node);
> +			l3->shared = NULL;
> +		}
> +		alien = l3->alien;
> +		l3->alien = NULL;
> +		spin_unlock_irq(&l3->list_lock);
> +
> +		if (alien) {
> +			drain_alien_cache(cachep, alien);
> +			free_alien_cache(alien);
> +		}
> +		kfree(shared);
> +
> +		drain_freelist(cachep, l3, l3->free_objects);
> +		if (!list_empty(&l3->slabs_full) ||
> +		    !list_empty(&l3->slabs_partial)) {
> +			/*
> +			 * Continue to iterate through each slab cache to free
> +			 * as many nodelists as possible even though the
> +			 * offline will be canceled.
> +			 */
> +			ret = -EBUSY;
> +			continue;
> +		}
> +		kfree(l3);
> +		cachep->nodelists[node] = NULL;

What's stopping races of other CPUs trying to access l3 and array
caches while they're being freed?

> +	}
> +	return ret;
> +}
> +
> +/*
> + * Onlines nid either as the result of memory hot-add or canceled hot-remove.
> + */
> +static int __meminit slab_node_online(int nid)
> +{
> +	int ret;
> +	mutex_lock(&cache_chain_mutex);
> +	ret = init_cache_nodelists_node(nid);
> +	mutex_unlock(&cache_chain_mutex);
> +	return ret;
> +}
> +
> +/*
> + * Offlines nid either as the result of memory hot-remove or canceled hot-add.
> + */
> +static int __meminit slab_node_offline(int nid)
> +{
> +	int ret;
> +	mutex_lock(&cache_chain_mutex);
> +	ret = free_cache_nodelists_node(nid);
> +	mutex_unlock(&cache_chain_mutex);
> +	return ret;
> +}
> +
> +static int __meminit slab_memory_callback(struct notifier_block *self,
> +					unsigned long action, void *arg)
> +{
> +	struct memory_notify *mnb = arg;
> +	int ret = 0;
> +	int nid;
> +
> +	nid = mnb->status_change_nid;
> +	if (nid < 0)
> +		goto out;
> +
> +	switch (action) {
> +	case MEM_GOING_ONLINE:
> +	case MEM_CANCEL_OFFLINE:
> +		ret = slab_node_online(nid);
> +		break;

This would explode if CANCEL_OFFLINE fails. Call it theoretical and
put a panic() in here and I don't mind. Otherwise you get corruption
somewhere in the slab code.
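
To make the MEM_CANCEL_OFFLINE concern concrete, here is a minimal userspace sketch, not kernel code. It assumes the notifier constants and the errno encoding match include/linux/notifier.h; the model_* names, the fatal flag and the stub node operations are hypothetical. A cancel is itself a rollback, so if re-creating the nodelists fails there is nothing left to fall back to, which is where a panic() would go.

```c
#include <errno.h>

/* Assumed to mirror include/linux/notifier.h; userspace model only. */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Action names follow the patch; the numeric values here are arbitrary. */
enum mem_action {
	MEM_GOING_ONLINE, MEM_CANCEL_OFFLINE,
	MEM_GOING_OFFLINE, MEM_CANCEL_ONLINE,
	MEM_ONLINE, MEM_OFFLINE
};

typedef int (*node_op)(int nid);

/* Hypothetical stubs standing in for slab_node_online()/slab_node_offline(). */
static int op_ok(int nid)     { (void)nid; return 0; }
static int op_enomem(int nid) { (void)nid; return -ENOMEM; }

/* Encode a negative errno into a notifier return value. */
static int model_notifier_from_errno(int err)
{
	return err ? (NOTIFY_STOP_MASK | (NOTIFY_OK - err)) : NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int model_notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/*
 * Same dispatch as the quoted callback, except that MEM_CANCEL_OFFLINE is
 * split out: a failure there sets *fatal, marking the spot for a panic().
 */
static int model_slab_memory_callback(enum mem_action action, int nid,
				      node_op online, node_op offline,
				      int *fatal)
{
	int ret = 0;

	*fatal = 0;
	if (nid < 0)
		return NOTIFY_OK;

	switch (action) {
	case MEM_GOING_ONLINE:
		ret = online(nid);	/* failure simply vetoes the hot-add */
		break;
	case MEM_CANCEL_OFFLINE:
		ret = online(nid);
		if (ret)
			*fatal = 1;	/* rollback failed: no safe state left */
		break;
	case MEM_GOING_OFFLINE:
	case MEM_CANCEL_ONLINE:
		ret = offline(nid);
		break;
	case MEM_ONLINE:
	case MEM_OFFLINE:
		break;
	}
	return model_notifier_from_errno(ret);
}
```

With op_enomem as the online operation, a MEM_CANCEL_OFFLINE event decodes back to -ENOMEM and sets fatal, while the same failure on MEM_GOING_ONLINE is an ordinary veto of the hot-add.
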

> +	case MEM_GOING_OFFLINE:
> +	case MEM_CANCEL_ONLINE:
> +		ret = slab_node_offline(nid);
> +		break;
> +	case MEM_ONLINE:
> +	case MEM_OFFLINE:
> +		break;
> +	}
> +out:
> +	return ret ? notifier_from_errno(ret) : NOTIFY_OK;
> +}
> +#endif /* CONFIG_NUMA && CONFIG_MEMORY_HOTPLUG */
> +
>  /*
>   * swap the static kmem_list3 with kmalloced memory
>   */
> -static void init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> -			int nodeid)
> +static void __init init_list(struct kmem_cache *cachep, struct kmem_list3 *list,
> +				int nodeid)
>  {
>  	struct kmem_list3 *ptr;
>
> @@ -1583,6 +1713,14 @@ void __init kmem_cache_init_late(void)
>  	 */
>  	register_cpu_notifier(&cpucache_notifier);
>
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Register a memory hotplug callback that initializes and frees
> +	 * nodelists.
> +	 */
> +	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +#endif
> +
>  	/*
>  	 * The reap timers are started later, with a module init call: That part
>  	 * of the kernel is not yet operational.
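
For reference, the offline path described in the changelog (drain what can be drained, free the nodelist if nothing is left, otherwise record -EBUSY and keep going) can be modeled in userspace roughly as follows. This is a sketch only: the model_* types are hypothetical stand-ins for kmem_cache and kmem_list3, slab lists are reduced to counters, and locking is omitted entirely.

```c
#include <errno.h>
#include <stdlib.h>

#define MODEL_MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3: lists reduced to counters. */
struct model_nodelist {
	int free_slabs;		/* slabs holding no live objects */
	int partial_slabs;	/* slabs with some live objects */
	int full_slabs;		/* slabs with only live objects */
};

/* Hypothetical stand-in for struct kmem_cache. */
struct model_cache {
	struct model_nodelist *nodelists[MODEL_MAX_NODES];
};

/* Allocate a nodelist with the given counters; returns NULL on failure. */
static struct model_nodelist *model_nodelist_new(int free_slabs,
						 int partial_slabs,
						 int full_slabs)
{
	struct model_nodelist *l3 = calloc(1, sizeof(*l3));

	if (l3) {
		l3->free_slabs = free_slabs;
		l3->partial_slabs = partial_slabs;
		l3->full_slabs = full_slabs;
	}
	return l3;
}

/*
 * Drain and free the nodelist for 'node' in every cache. Returns -EBUSY if
 * any cache still has objects on that node, but keeps iterating so that as
 * many nodelists as possible are released.
 */
static int model_free_nodelists(struct model_cache *caches, size_t ncaches,
				int node)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < ncaches; i++) {
		struct model_nodelist *l3 = caches[i].nodelists[node];

		if (!l3)
			continue;

		l3->free_slabs = 0;	/* stand-in for drain_freelist() */

		if (l3->full_slabs || l3->partial_slabs) {
			ret = -EBUSY;	/* live objects: keep this nodelist */
			continue;
		}
		free(l3);
		caches[i].nodelists[node] = NULL;
	}
	return ret;
}
```

A cache whose nodelist only holds free slabs loses that nodelist on the first pass; a cache with a partial slab keeps it and makes the whole call return -EBUSY until those objects are gone.
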

From: Anca Emanuel on 5 Mar 2010 07:50

Dumb question: is it possible to hot-remove the (bad) memory and add a
good one?
Where is the detection code for the bad module?

From: Anca Emanuel on 5 Mar 2010 09:00

You can contact Samuel Demeulemeester for help, memtest(a)memtest.org

From: Christoph Lameter on 5 Mar 2010 09:20

On Fri, 5 Mar 2010, Anca Emanuel wrote:

> Dumb question: is it possible to hot-remove the (bad) memory and add a
> good one?

Under certain conditions this is possible. If the bad memory was modified
then you have a condition that requires termination of all processes that
are using the memory. If it's the kernel then you need to reboot.

If the memory contains a page from disk then the memory can be moved
elsewhere.

If you can clean up a whole range like that then it's possible to replace
the memory.
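
One way to read the cases above is as a per-page classification, with a range being replaceable only if every page in it can be dealt with short of a reboot. A small sketch of that reading; the enum names and the severity ordering are assumptions for illustration, not kernel interfaces:

```c
#include <stddef.h>

/* Hypothetical classification of what a bad page currently holds. */
enum model_page_kind {
	MODEL_PAGE_CLEAN_FILE,		/* unmodified copy of data on disk */
	MODEL_PAGE_MODIFIED_USER,	/* modified data used by processes */
	MODEL_PAGE_KERNEL		/* modified data used by the kernel */
};

/* Recovery actions, ordered from least to most disruptive. */
enum model_recovery {
	MODEL_RECOVER_MOVE = 0,		/* move the page elsewhere */
	MODEL_RECOVER_KILL_USERS = 1,	/* terminate processes using it */
	MODEL_RECOVER_REBOOT = 2	/* nothing short of a reboot helps */
};

/* Map one page to the action the message above describes for it. */
static enum model_recovery model_page_recovery(enum model_page_kind kind)
{
	switch (kind) {
	case MODEL_PAGE_CLEAN_FILE:
		return MODEL_RECOVER_MOVE;
	case MODEL_PAGE_MODIFIED_USER:
		return MODEL_RECOVER_KILL_USERS;
	case MODEL_PAGE_KERNEL:
		return MODEL_RECOVER_REBOOT;
	}
	return MODEL_RECOVER_REBOOT;	/* unknown kind: assume the worst */
}

/* The most disruptive action needed by any page in the range. */
static enum model_recovery model_range_recovery(const enum model_page_kind *pages,
						size_t npages)
{
	enum model_recovery worst = MODEL_RECOVER_MOVE;
	size_t i;

	for (i = 0; i < npages; i++) {
		enum model_recovery r = model_page_recovery(pages[i]);

		if (r > worst)
			worst = r;
	}
	return worst;
}

/* Under this reading, a range can be hot-replaced unless a reboot is needed. */
static int model_range_replaceable(const enum model_page_kind *pages,
				   size_t npages)
{
	return model_range_recovery(pages, npages) != MODEL_RECOVER_REBOOT;
}
```

A single kernel-owned bad page is enough to make the whole range non-replaceable in this model, which is why the range has to be cleaned up as a whole.
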

