From: Jonathan Corbet on
On Tue, 27 Jul 2010 16:53:03 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> This virt-array allocates a virtally contiguous array via get_vm_area()
> and allows object allocation per an element of array.

Quick question: this looks a lot like the "flexible array" mechanism
which went in around a year ago, and which is documented in
Documentation/flexible-arrays.txt. I'm not sure we need two of
these... That said, it appears that there are still no users of
flexible arrays. If your virtually-indexed arrays provide something
that flexible arrays don't, perhaps your implementation should replace
flexible arrays?

Thanks,

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Tue, 27 Jul 2010 12:29:49 -0600
Jonathan Corbet <corbet(a)lwn.net> wrote:

> On Tue, 27 Jul 2010 16:53:03 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
> > This virt-array allocates a virtally contiguous array via get_vm_area()
> > and allows object allocation per an element of array.
>
> Quick question: this looks a lot like the "flexible array" mechanism
> which went in around a year ago, and which is documented in
> Documentation/flexible-arrays.txt. I'm not sure we need two of
> these... That said, it appears that there are still no users of
> flexible arrays. If your virtually-indexed arrays provide something
> that flexible arrays don't, perhaps your implementation should replace
> flexible arrays?

Hmm. As Documentatin/flexible-arrays.txt says,

"The down sides are that the arrays cannot be indexed directly, individual object
size cannot exceed the system page size, and putting data into a flexible array
requires a copy operation. "

This virtually-indexed array is

- the arrays can be indexed directly.
- individual object size can be defined arbitrary.
- puttind data into virt-array requires memory allocation via alloc_varray_item().

But, virtyally-indexed array has its own down side, too.

- It uses vmalloc() area. This can be very limited in 32bit archs.
- It cannot be used in !MMU archs.
- It may consume much TLBs because vmalloc area tends not to be backed by hugepage.

Especially, I think !MMU case is much different. So, there are functional
difference. I implemented this to do quick direct access to objects by indexes.
Otherwise, flex-array may be able to provide more generic frameworks.

Then, I myself don't think virt-array is a replacemento for flex-array.

A discussion "flex-array should be dropped or not" is out of my scope, sorry.
I think you can ask to drop it just because it's almost dead without mentioning
virt-array.

Thanks,
-Kame



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Andrew Morton on
On Tue, 27 Jul 2010 16:53:03 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
>
> This virt-array allocates a virtally contiguous array via get_vm_area()
> and allows object allocation per an element of array.
> Physical pages are used only for used items in the array.
>
> - At first, the user has to create an array by create_virt_array().
> - At using an element, virt_array_alloc_index(index) should be called.
> - At freeing an element, virt_array_free_index(index) should be called.
> - At destroying, destroy_virt_array() should be called.
>
> Item used/unused status is controlled by bitmap and back-end physical
> pages are automatically allocated/freed. This is useful when you
> want to access objects by index in light weight. For example,
>
> create_virt_array(va);
> struct your_struct *objmap = va->address;
> Then, you can access your objects by objmap[i].
>
> In usual case, holding reference by index rather than pointer can save memory.
> But index -> object lookup cost cannot be negligible. In such case,
> this virt-array may be helpful. Ah yes, if lookup performance is not important,
> using radix-tree will be better (from TLB point of view). This virty-array
> may consume VMALLOC area too much. and alloc/free routine is very slow.
>
> Changelog:
> - fixed bugs in bitmap ops.
> - add offset for find_free_index.
>

My gut reaction to this sort of thing is "run away in terror". It
encourages kernel developers to operate like lackadaisical userspace
developers and to assume that underlying code can perform heroic and
immortal feats. But it can't. This is the kernel and the kernel is a
tough and hostile place and callers should be careful and defensive and
take great efforts to minimise the strain they put upon other systems.

IOW, can we avoid doing this?

>
> ...
>
> +void free_varray_item(struct virt_array *v, int idx)
> +{
> + mutex_lock(&v->mutex);
> + __free_unmap_entry(v, idx);
> + mutex_unlock(&v->mutex);
> +}

It's generally a bad idea for library code to perform its own locking.
In this case we've just made this whole facility inaccessible to code
which runs from interrupt or atomic contexts.

> + pg[0] = alloc_page(GFP_KERNEL);

And hard-wiring GFP_KERNEL makes this facility inaccessible to GFP_NOIO
and GFP_NOFS contexts as well.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Wed, 28 Jul 2010 12:45:13 -0700
Andrew Morton <akpm(a)linux-foundation.org> wrote:

> On Tue, 27 Jul 2010 16:53:03 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:
>
> > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com>
> >
> > This virt-array allocates a virtally contiguous array via get_vm_area()
> > and allows object allocation per an element of array.
> > Physical pages are used only for used items in the array.
> >
> > - At first, the user has to create an array by create_virt_array().
> > - At using an element, virt_array_alloc_index(index) should be called.
> > - At freeing an element, virt_array_free_index(index) should be called.
> > - At destroying, destroy_virt_array() should be called.
> >
> > Item used/unused status is controlled by bitmap and back-end physical
> > pages are automatically allocated/freed. This is useful when you
> > want to access objects by index in light weight. For example,
> >
> > create_virt_array(va);
> > struct your_struct *objmap = va->address;
> > Then, you can access your objects by objmap[i].
> >
> > In usual case, holding reference by index rather than pointer can save memory.
> > But index -> object lookup cost cannot be negligible. In such case,
> > this virt-array may be helpful. Ah yes, if lookup performance is not important,
> > using radix-tree will be better (from TLB point of view). This virty-array
> > may consume VMALLOC area too much. and alloc/free routine is very slow.
> >
> > Changelog:
> > - fixed bugs in bitmap ops.
> > - add offset for find_free_index.
> >
>
> My gut reaction to this sort of thing is "run away in terror". It
> encourages kernel developers to operate like lackadaisical userspace
> developers and to assume that underlying code can perform heroic and
> immortal feats. But it can't. This is the kernel and the kernel is a
> tough and hostile place and callers should be careful and defensive and
> take great efforts to minimise the strain they put upon other systems.
>
> IOW, can we avoid doing this?
>

Hmm. To pack more information into page_cgroup, I'd like reduce the size of it.
One candidate is pc->mem_cgroup, a pointer. mem_cgroup has its own ID already.

If we replace pc->mem_cgroup from a pointer to ID,
- struct mem_cgroup *mem = pc->mem_cgroup;
+ struct mem_cgroup *mem = id_to_mem_cgroup(pc->mem_cgroup);

call will be added.

The problem is that we have to call id_to_mem_cgroup routine even in
add_to_lru() and delete_from_lru(). Any kind of "lookup" routines aren't
enough fast. So, I'd like to use an array[].
(mem_cgroup_add_to_lru(), delete_from_lru() has been slow codes already...
andI hear there is a brave guy who uses 2000+ memory cgroup on a host.

With this, we just pay pointer calculation cost but have no more memory access.

The basic idea of this virt-array[] implemenation is vmemmap(virtual memmap).
But it's highly hard-coded per architecture and use very-very big space.
So, I wrote a generic library routine.

One another idea I thought of a technique making this kind of lookup
is to prepare per-cpu lookup cache. But it adds much more complicated
controls and can't garantee "works well always".

IOW, I love this virt-array[]. But it's okay to move this routine to
memcontrol.c and don't show any interface to others if you have concerns.

If you recommened more 16bytes cost per page rather than implementing a
thing like this, okay, it's much easier.



> >
> > ...
> >
> > +void free_varray_item(struct virt_array *v, int idx)
> > +{
> > + mutex_lock(&v->mutex);
> > + __free_unmap_entry(v, idx);
> > + mutex_unlock(&v->mutex);
> > +}
>
> It's generally a bad idea for library code to perform its own locking.
> In this case we've just made this whole facility inaccessible to code
> which runs from interrupt or atomic contexts.
>
hmm, IIUC, because this use codes like vmalloc, this can't be used in atomic
context. But ok, I'll move this lock and adds comment.


> > + pg[0] = alloc_page(GFP_KERNEL);
>
> And hard-wiring GFP_KERNEL makes this facility inaccessible to GFP_NOIO
> and GFP_NOFS contexts as well.
>
ok, I'll pass gfp mask.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: KAMEZAWA Hiroyuki on
On Thu, 29 Jul 2010 09:32:26 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu(a)jp.fujitsu.com> wrote:

> On Wed, 28 Jul 2010 12:45:13 -0700
> Andrew Morton <akpm(a)linux-foundation.org> wrote:

> > My gut reaction to this sort of thing is "run away in terror". It
> > encourages kernel developers to operate like lackadaisical userspace
> > developers and to assume that underlying code can perform heroic and
> > immortal feats. But it can't. This is the kernel and the kernel is a
> > tough and hostile place and callers should be careful and defensive and
> > take great efforts to minimise the strain they put upon other systems.
> >
> > IOW, can we avoid doing this?
> >
>

I'll use pre-allocated pointer array in the next version. It's simple even
if a bit slow.

==
struct mem_cgroup *mem_cgroups[CONFIG_MAX_MEM_CGROUPS] __read_mostly;
#define id_to_memcg(id) mem_cgroups[id];
==
But this can use hugepage-mapping-for-kernel...so, can be quick.

Thanks,
-Kame



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/