From: Pekka Enberg on
On Fri, Jul 30, 2010 at 12:47 AM, John Johansen
<john.johansen(a)canonical.com> wrote:
> +/**
> + * kvmalloc - do allocation preferring kmalloc but falling back to vmalloc
> + * @size: size of allocation
> + *
> + * Return: allocated buffer or NULL if failed
> + *
> + * It is possible that policy being loaded from the user is larger than
> + * what can be allocated by kmalloc, in those cases fall back to vmalloc.
> + */
> +void *kvmalloc(size_t size)
> +{
> + � � � void *buffer = NULL;
> +
> + � � � if (size == 0)
> + � � � � � � � return NULL;
> +
> + � � � /* do not attempt kmalloc if we need more than 16 pages at once */
> + � � � if (size <= (16*PAGE_SIZE))
> + � � � � � � � buffer = kmalloc(size, GFP_NOIO | __GFP_NOWARN);

16 pages is a lot of memory for 64 K pages. What's the purpose of
GFP_NOIO here? vmalloc() will do GFP_KERNEL allocations anyway.

> + � � � if (!buffer) {
> + � � � � � � � /* see kvfree for why size must be at least work_struct size
> + � � � � � � � �* when allocated via vmalloc
> + � � � � � � � �*/
> + � � � � � � � if (size < sizeof(struct work_struct))
> + � � � � � � � � � � � size = sizeof(struct work_struct);
> + � � � � � � � buffer = vmalloc(size);
> + � � � }
> + � � � return buffer;
> +}

Please don't hide this into apparmor internals. People have invented
this function in the past so maybe it's time to put it in mm/util.c?

> +
> +/**
> + * do_vfree - workqueue routine for freeing vmalloced memory
> + * @work: data to be freed
> + *
> + * The work_struct is overlaid to the data being freed, as at the point
> + * the work is scheduled the data is no longer valid, be its freeing
> + * needs to be delayed until safe.
> + */
> +static void do_vfree(struct work_struct *work)
> +{
> + � � � vfree(work);
> +}
> +
> +/**
> + * kvfree - free an allocation do by kvmalloc
> + * @buffer: buffer to free (MAYBE_NULL)
> + *
> + * Free a buffer allocated by kvmalloc
> + */
> +void kvfree(void *buffer)
> +{
> + � � � if (is_vmalloc_addr(buffer)) {
> + � � � � � � � /* Data is no longer valid so just use the allocated space
> + � � � � � � � �* as the work_struct
> + � � � � � � � �*/
> + � � � � � � � struct work_struct *work = (struct work_struct *) buffer;
> + � � � � � � � INIT_WORK(work, do_vfree);
> + � � � � � � � schedule_work(work);

I don't understand this part here. Is it needed for interrupt contexts
or does vfree() sleep somewhere? If it's for the former, I think we
can just add a comment saying that kvmalloc/kvfree is not safe from
interrupt context and remove the schedule_work() parts here.

> + � � � } else
> + � � � � � � � kfree(buffer);
> +}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: John Johansen on
On 07/30/2010 02:20 AM, Pekka Enberg wrote:
> On Fri, Jul 30, 2010 at 12:47 AM, John Johansen
> <john.johansen(a)canonical.com> wrote:
>> +/**
>> + * kvmalloc - do allocation preferring kmalloc but falling back to vmalloc
>> + * @size: size of allocation
>> + *
>> + * Return: allocated buffer or NULL if failed
>> + *
>> + * It is possible that policy being loaded from the user is larger than
>> + * what can be allocated by kmalloc, in those cases fall back to vmalloc.
>> + */
>> +void *kvmalloc(size_t size)
>> +{
>> + void *buffer = NULL;
>> +
>> + if (size == 0)
>> + return NULL;
>> +
>> + /* do not attempt kmalloc if we need more than 16 pages at once */
>> + if (size <= (16*PAGE_SIZE))
>> + buffer = kmalloc(size, GFP_NOIO | __GFP_NOWARN);
>
> 16 pages is a lot of memory for 64 K pages. What's the purpose of
yes it is, and I don't expect it will every allocate that much, though it
will occassionally with large policies do allocations larger than 16*4K.
The figure here is some what arbitrary, and I would certainly be willing
to shrink it. Basically it is there to put a clamp on allocating precious
physically contiguous memory.

> GFP_NOIO here? vmalloc() will do GFP_KERNEL allocations anyway.
>
yep, and it used to be GFP_KERNEL too, looking back GFP_NOIO happend when
poking at a bug where apparmor was trigger a IO when it was allocating its
memory. Turned out the bug wasn't apparmor related just being triggered
while apparmor was loading policy, but the GFP_NOIO flag stuck here.
I am more than willing to flip it back.

>> + if (!buffer) {
>> + /* see kvfree for why size must be at least work_struct size
>> + * when allocated via vmalloc
>> + */
>> + if (size < sizeof(struct work_struct))
>> + size = sizeof(struct work_struct);
>> + buffer = vmalloc(size);
>> + }
>> + return buffer;
>> +}
>
> Please don't hide this into apparmor internals. People have invented
> this function in the past so maybe it's time to put it in mm/util.c?
>
sure, I would be more than willing to replace this with a generic
system fn. The last attempt I saw at adding generic routines of this
nature was here
http://www.spinics.net/lists/linux-fsdevel/msg31407.html

>> +
>> +/**
>> + * do_vfree - workqueue routine for freeing vmalloced memory
>> + * @work: data to be freed
>> + *
>> + * The work_struct is overlaid to the data being freed, as at the point
>> + * the work is scheduled the data is no longer valid, be its freeing
>> + * needs to be delayed until safe.
>> + */
>> +static void do_vfree(struct work_struct *work)
>> +{
>> + vfree(work);
>> +}
>> +
>> +/**
>> + * kvfree - free an allocation do by kvmalloc
>> + * @buffer: buffer to free (MAYBE_NULL)
>> + *
>> + * Free a buffer allocated by kvmalloc
>> + */
>> +void kvfree(void *buffer)
>> +{
>> + if (is_vmalloc_addr(buffer)) {
>> + /* Data is no longer valid so just use the allocated space
>> + * as the work_struct
>> + */
>> + struct work_struct *work = (struct work_struct *) buffer;
>> + INIT_WORK(work, do_vfree);
>> + schedule_work(work);
>
> I don't understand this part here. Is it needed for interrupt contexts
> or does vfree() sleep somewhere? If it's for the former, I think we
> can just add a comment saying that kvmalloc/kvfree is not safe from
> interrupt context and remove the schedule_work() parts here.
>
vfree can sleep, and skipping the schedule_work parts won't work for
apparmor as many of these allocations are being freed via rcu callbacks
as most of our object life cycles are dependent on cred refcounting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pekka Enberg on
On Fri, Jul 30, 2010 at 1:01 PM, John Johansen
<john.johansen(a)canonical.com> wrote:
>>> +/**
>>> + * kvfree - free an allocation do by kvmalloc
>>> + * @buffer: buffer to free (MAYBE_NULL)
>>> + *
>>> + * Free a buffer allocated by kvmalloc
>>> + */
>>> +void kvfree(void *buffer)
>>> +{
>>> + � � � if (is_vmalloc_addr(buffer)) {
>>> + � � � � � � � /* Data is no longer valid so just use the allocated space
>>> + � � � � � � � �* as the work_struct
>>> + � � � � � � � �*/
>>> + � � � � � � � struct work_struct *work = (struct work_struct *) buffer;
>>> + � � � � � � � INIT_WORK(work, do_vfree);
>>> + � � � � � � � schedule_work(work);
>>
>> I don't understand this part here. Is it needed for interrupt contexts
>> or does vfree() sleep somewhere? If it's for the former, I think we
>> can just add a comment saying that kvmalloc/kvfree is not safe from
>> interrupt context and remove the schedule_work() parts here.
>
> vfree can sleep, and skipping the schedule_work parts won't work for
> apparmor as many of these allocations are being freed via rcu callbacks
> as most of our object life cycles are dependent on cred refcounting.

Can someone point me to where vfree() actually sleeps? I'm unable to
find the exact spot.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Changli Gao on
On Fri, Jul 30, 2010 at 6:53 PM, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:
> On Fri, Jul 30, 2010 at 1:01 PM, John Johansen
>>
>> vfree can sleep, and skipping the schedule_work parts won't work for
>> apparmor as many of these allocations are being freed via rcu callbacks
>> as most of our object life cycles are dependent on cred refcounting.
>
> Can someone point me to where vfree() actually sleeps? I'm unable to
> find the exact spot.
>

http://lxr.linux.no/linux+v2.6.34.1/mm/vmalloc.c#L1405 . vfree ->
__vunmap, vunmap -> __vunmap, and there is a might_sleep() function in
vunmap.

BTW: I'll respin the kvmalloc patch later.

Thanks.

--
Regards,
Changli Gao(xiaosuo(a)gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Pekka Enberg on
On Fri, Jul 30, 2010 at 5:24 PM, Changli Gao <xiaosuo(a)gmail.com> wrote:
> On Fri, Jul 30, 2010 at 6:53 PM, Pekka Enberg <penberg(a)cs.helsinki.fi> wrote:
>> On Fri, Jul 30, 2010 at 1:01 PM, John Johansen
>>>
>>> vfree can sleep, and skipping the schedule_work parts won't work for
>>> apparmor as many of these allocations are being freed via rcu callbacks
>>> as most of our object life cycles are dependent on cred refcounting.
>>
>> Can someone point me to where vfree() actually sleeps? I'm unable to
>> find the exact spot.
>
> http://lxr.linux.no/linux+v2.6.34.1/mm/vmalloc.c#L1405 . vfree ->
> __vunmap, vunmap -> __vunmap, and there is a might_sleep() function in
> vunmap.

Yes, but that doesn't answer my question. Where's the actual call-site
that can sleep? I can't find it!

> BTW: I'll respin the kvmalloc patch later.

Great!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/