From: Peter Crosthwaite on
[Resend of earlier email due to first email bouncing]

Hi,

I'm currently experiencing a kernel bug when munmap'ing a UIO memory
region. The uio memory region is a large (up to 48MB) buffer allocated
by a UIO driver at boot time using alloc_bootmem_low_pages(). The idea
is once the large buffer is allocated, devices can DMA directly to the
buffer which is user space accessible. The system is tested as
working, with the DMA device being able to fill the buffer and user
space being able to see the correct data, except that it throws a bug
once user space munmaps the UIO region. The bug is a "bad page state".
I have summarized the kernel space driver, the user space program
and the bug below. My first question is - is there anything
fundamentally incorrect with this approach / is there a better way?

The kernel version is 2.6.31.11 and the architecture is MicroBlaze.

What happens in the kernel space driver:

   -The buffer is allocated at boot time using alloc_bootmem_low_pages()

       unsigned buf_size = 0x00010000; /* size of 64k */
       b_virt = alloc_bootmem_low_pages(PAGE_ALIGN(buf_size));

   -The address returned is set as the base address for a UIO memory
region and the UIO device is created:

       struct uio_info *usdma_uio_info;
       ... // name, version and IRQ are set
       usdma_uio_info->mem[0].addr = b_virt; // address returned by alloc_bootmem_low_pages()
       usdma_uio_info->mem[0].size = buf_size;
       usdma_uio_info->mem[0].memtype = UIO_MEM_LOGICAL;
       usdma_uio_info->mem[0].internal_addr = b_virt;
       uio_register_device(dev, usdma_uio_info);

What happens in the user space program:

   -The UIO device is opened and mmap'ed (to in_ptr)

       in_fd = open("/dev/uio0", O_RDWR);
       char *in_ptr = mmap(NULL, size, PROT_READ, MAP_SHARED, in_fd, 0);
       if (in_ptr == MAP_FAILED) { /* mmap returns MAP_FAILED on error, not NULL */
           perror("mmap");
           return -1;
       }

   -Write the buffer out to some random file (out_fd)

       for (bytes_written = 0; bytes_written < size;) {
           bytes_written += write(out_fd, in_ptr + bytes_written, size - bytes_written);
       }

   -The UIO memory region is unmapped (this is when the error occurs)

       munmap(in_ptr, size);

The bug:

The output from dmesg (after the user space program is run) is below.
This output happens multiple times, i.e. the bug is replicated for all
the mapped pages. Curiously, the bug only happens when the pages are
touched by the user space program, e.g. if the example user space
program given above does not write() the buffer contents out to file,
the bug does not occur (and the munmap completes successfully).

Further investigation revealed that the reason the bad_page function
was being called is that free_hot_cold_page (mm/page_alloc.c) does
not like pages with either the PG_slab or PG_buddy flags set. The bug
will always show one of these flags being set (PG_slab = 0x00000080 in
the case below) for the page that is being freed. Which flag is set
depends on the size of the buffer: for small buffers it is PG_slab,
for large buffers it is PG_buddy.

My second question is: should the kernel be trying to free these pages
(using free_hot_cold_page) at all, considering my kernel space driver
still has them mapped locally?

BUG: Bad page state in process mmunmap_bug_hun  pfn:4ee0f
page:c09ff1e0 flags:00000084 count:0 mapcount:0 mapping:(null) index:0

Stack:
 c0044150 c023f330 c6e85d9c 00002095 44591000 00002095 c6e85db8 c0044958
 c01e0c10 c09ff1e0 00000084 00000000 00000000 00000000 00000000 c09ff1e0
 c0044b6c 00010000 00000000 c0048f08 c7468cf4 c6e85e60 00000000 c09ff1e0
Call Trace:

[<c0044150>] bad_page+0x12c/0x160
[<c0044958>] free_hot_cold_page+0x94/0x224
[<c0044b6c>] free_hot_page+0x8/0x1c
[<c0048f08>] ____pagevec_lru_add+0x194/0x1cc
[<c004935c>] put_page+0x164/0x178
[<c00fda5c>] process_output+0x40/0x74
[<c00fda6c>] process_output+0x50/0x74
[<c00511e8>] unmap_vmas+0x31c/0x5ac
[<c0051230>] unmap_vmas+0x364/0x5ac
[<c005577c>] unmap_region+0xb0/0x168
[<c0049164>] lru_add_drain+0x34/0x84
[<c005661c>] do_munmap+0x200/0x298
[<c00566f0>] sys_munmap+0x3c/0x74
[<c01d3054>] down_write+0xc/0x20
[<c00635e8>] sys_write+0x54/0xa4
[<c000568c>] sys_mmap2+0x108/0x13c
[<c00076e8>] _user_exception+0x228/0x230
[<c0008550>] irq_call+0x0/0x8

Thanks in Advance,
Peter Crosthwaite
Petalogix
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Peter Crosthwaite on
Hi Greg,

Thanks for your reply on the munmap issue. Sorry about the delay on
this correspondence.

I have looked into this bug in more detail. The
alloc_bootmem_low_pages() call is falling back to a call to kzalloc(),
so the address passed to UIO when used in UIO_MEM_LOGICAL is a return
from kmalloc(). So my first question is, is kmalloc'ed memory
supported by UIO?

With regards to the copying the data from the buffer to file, yes it
is showing the correct data.

I have since resolved the BUG() by manually modifying the usage
counters for the buffer pages from kernel space. i.e. Once the memory
is kmalloc'ed the driver will iterate through all the pages and
increment the _count field of the struct page. This will cause the
pages to have a user count of 2 when mmaped (by user space) which
reverts to 1 when unmapped. Now this fixes the bug, but should this
manual increment be necessary? Is there a cleaner way in the kernel
API for kernel space to mark itself as a user of a memory range or
user space VMA?
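
For the record, the workaround described above looks roughly like this
(a kernel-side sketch only, untested; it uses get_page() rather than
poking the _count field directly, and usdma_alloc_buffer is a made-up
name):

```c
#include <linux/mm.h>
#include <linux/slab.h>

static void *b_virt;

static int usdma_alloc_buffer(size_t buf_size)
{
	unsigned long addr;

	b_virt = kzalloc(PAGE_ALIGN(buf_size), GFP_KERNEL);
	if (!b_virt)
		return -ENOMEM;

	/* Take an extra reference on every page backing the buffer so
	 * that the put_page() done at munmap time cannot push the
	 * count to zero and trigger free_hot_cold_page(). */
	for (addr = (unsigned long)b_virt;
	     addr < (unsigned long)b_virt + PAGE_ALIGN(buf_size);
	     addr += PAGE_SIZE)
		get_page(virt_to_page(addr));

	return 0;
}
```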

You asked for a pointer to the source code for my driver. I'm
trying to put together a minimal driver that replicates this bug, but
it seems UIO enforces the need for a parent device when initialised.
Considering this bug requires no actual hardware to replicate, is
there a way to get a UIO device without a physical device to be able
to test this behaviour in isolation?

Regards
Peter Crosthwaite


On Fri, Jul 9, 2010 at 9:39 AM, Greg KH <gregkh(a)suse.de> wrote:
> On Wed, Jul 07, 2010 at 04:36:02PM +1000, Peter Crosthwaite wrote:
>> Hi,
>>
>> I'm currently experiencing a kernel bug when munmap'ing a UIO memory region.
>> The uio memory region is a large (up to 48MB) buffer allocated by a UIO
>> driver at boot time using alloc_bootmem_low_pages(). The idea is once the
>> large buffer is allocated, devices can DMA directly to the buffer which is
>> user space accessible. The system is tested as working, with the DMA device
>> being able to fill the buffer and user space being able to see the correct
>> data, except that it throws a bug once user space munmaps the UIO region.
>> The bug is a "bad page state". I have summarized the kernel space
>> driver, the user space program and the bug below. My first question is - is
>> there anything fundamentally incorrect with this approach / is there a
>> better way?
>>
>> The kernel version is (2.6.31.11) and architecture is MicroBlaze.
>>
>> What happens in the kernel space driver:
>>
>>     -The buffer is allocated at boot time using alloc_bootmem_low_pages()
>>
>>         unsigned buf_size = 0x00010000; /* size of 64k */
>>         b_virt = alloc_bootmem_low_pages(PAGE_ALIGN(buf_size));
>>
>>     -The address returned is set as the base address for a UIO memory region
>> and the UIO device is created:
>>
>>         struct uio_info *usdma_uio_info;
>>         ... // name, version and IRQ are set
>>         usdma_uio_info->mem[0].addr = b_virt; // address returned by alloc_bootmem_low_pages()
>
> Yeah, but is this a valid address that userspace has access to?  Or is
> this a "virtual" address?  I thought you had to "remap" this memory to
> properly access it but I don't know this architecture well enough to be
> sure about that.
>
> Have a pointer to your whole kernel driver anywhere?
>
>>         usdma_uio_info->mem[0].size = buf_size;
>>         usdma_uio_info->mem[0].memtype = UIO_MEM_LOGICAL;
>>         usdma_uio_info->mem[0].internal_addr = b_virt;
>>         uio_register_device(dev, usdma_uio_info);
>>
>> What happens in the user space program:
>>
>>     -The UIO device is opened and mmap'ed (to in_ptr)
>>
>>         in_fd = open("/dev/uio0", O_RDWR);
>>         char *in_ptr = mmap(NULL, size, PROT_READ, MAP_SHARED, in_fd, 0);
>>         if (in_ptr == MAP_FAILED) {
>>             perror("mmap");
>>             return -1;
>>         }
>>
>>     -Write the buffer out to some random file (out_fd)
>>
>>         for (bytes_written = 0; bytes_written < size;) {
>>             bytes_written += write(out_fd, in_ptr + bytes_written, size - bytes_written);
>>         }
>
> Is this showing the correct data?
>
>>     -The UIO memory region is unmapped (this is when the error occurs)
>>
>>         munmap(in_ptr, size);
>>
>> The bug:
>>
>> The output from dmesg (after the user space program is run) is below. This
>> output happens multiple times, i.e. the bug is replicated for all the mapped
>> pages. Curiously, the bug only happens when the pages are touched by the
>> user space program, e.g. if the example user space program given above does
>> not write() the buffer contents out to file, the bug does not occur (and the
>> munmap completes successfully).
>>
>> Further investigation revealed that the reason the bad_page function was
>> being called is that free_hot_cold_page (mm/page_alloc.c) does not like
>> pages with either the PG_slab or PG_buddy flags set. The bug will always
>> show one of these flags being set (PG_slab = 0x00000080 in the case below)
>> for the page that is being freed. Which flag is set depends on the size of
>> the buffer: for small buffers it is PG_slab, for large buffers PG_buddy.
>>
>> My second question is: should the kernel be trying to free these pages
>> (using free_hot_cold_page) at all, considering my kernel space driver still
>> has them mapped locally?
>
> Good question, who is trying to free them?
>
> weird.
>
> greg k-h
>
From: Hans J. Koch on
On Wed, Jul 21, 2010 at 05:17:17PM +1000, Peter Crosthwaite wrote:
> Hi Greg,
>
> Thanks for your reply on the munmap issue. Sorry about the delay on
> this correspondence.
>
> I have looked into this bug in more detail. The
> alloc_bootmem_low_pages() call is falling back to a call to kzalloc(),
> so the address passed to UIO when used in UIO_MEM_LOGICAL is a return
> from kmalloc(). So my first question is, is kmalloc'ed memory
> supported by UIO?

Yes, of course. UIO_MEM_LOGICAL is the correct memtype for that. But
that applies only to memory you get _directly_ from kmalloc().
For example, dma_alloc_coherent() on ARM internally gets its memory from
kmalloc, too, but it needs a completely different mapping routine;
trying to map it using UIO_MEM_LOGICAL will fail.
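
To illustrate the distinction (a hedged fragment only, reusing the
info->mem[] fields from the driver snippets earlier in the thread;
kzalloc_buf and phys_addr are made-up names):

```c
/* Memory obtained directly from kmalloc()/kzalloc() is described
 * by its kernel logical address: */
info->mem[0].addr = (unsigned long)kzalloc_buf;
info->mem[0].memtype = UIO_MEM_LOGICAL;

/* A physical region (device registers, or the bus address from
 * dma_alloc_coherent()) needs a different memtype, e.g.: */
info->mem[1].addr = phys_addr;
info->mem[1].memtype = UIO_MEM_PHYS;
```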

>
> With regards to the copying the data from the buffer to file, yes it
> is showing the correct data.
>
> I have since resolved the BUG() by manually modifying the usage
> counters for the buffer pages from kernel space. i.e. Once the memory
> is kmalloc'ed the driver will iterate through all the pages and
> increment the _count field of the struct page. This will cause the
> pages to have a user count of 2 when mmaped (by user space) which
> reverts to 1 when unmapped. Now this fixes the bug, but should this
> manual increment be necessary? Is there a cleaner way in the kernel
> API for kernel space to mark itself as a user of a memory range or
> user space VMA?
>
> You asked for a pointer to the source code for my driver. I'm
> trying to put together a minimal driver that replicates this bug, but
> it seems UIO enforces the need for a parent device when initialised.
> Considering this bug requires no actual hardware to replicate, is
> there a way to get a UIO device without a physical device to be able
> to test this behaviour in isolation?

There once was the uio_dummy driver. It maps some kmalloc'ed memory
to userspace and uses a timer to simulate interrupts. Just google
for "uio_dummy" and you'll find it. It's quite old (2.6.23), so it
will need some fixing. But it should give you the idea.

Thanks,
Hans