From: Matt Turner on
Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
booting with `radeon.test=1` and found this, which I think is related:

> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
[snip]
> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
> [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
> pci_map_single failed: could not allocate dma page tables
> [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
> [TTM] Couldn't bind backend.
> radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
> [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
> Error while testing BO move.

From what I can see, the call chain is
radeon_test_moves
 (radeon_ttm_backend_bind called through callback function)
 - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
  - radeon_gart.c:radeon_gart_bind calls pci_map_page
   - pci_map_page is alpha_pci_map_page, which calls...
    - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
     - pci_map_single_1 calls iommu_arena_alloc
      - iommu_arena_alloc calls iommu_arena_find_pages
       - iommu_arena_find_pages returns non-0
      - iommu_arena_alloc returns non-0
     - pci_map_single_1 returns 0 after printing
       "could not allocate dma page tables" error
    - alpha_pci_map_page returns 0 from pci_map_single_1
  - radeon_gart_bind returns non-0, error path prints
    "*ERROR* failed to bind 128 pages at 0x0FF02000"
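
The numbers in the log can be cross-checked with a little arithmetic.
The sketch below assumes 8 KiB pages (the base page size on Alpha), that
the first logged offset 0x202000 belongs to test object 0, and that the
1 MiB test objects are placed back to back. The helper names are made up
for illustration and are not taken from the radeon driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB pages, the base page size on Alpha. */
#define ALPHA_PAGE_SIZE 8192u

/* Size of one test object, from "object_init failed for (1048576, ...)". */
#define TEST_BO_SIZE 1048576u

/* Assumption: the first logged GTT offset belongs to object 0. */
#define FIRST_GTT_OFFSET 0x202000u

/* Number of pages needed to back a buffer object of bo_size bytes. */
static inline uint32_t bo_pages(uint32_t bo_size)
{
	assert(bo_size != 0);
	return (bo_size + ALPHA_PAGE_SIZE - 1) / ALPHA_PAGE_SIZE;
}

/* GTT offset of the index-th test object if objects are packed contiguously. */
static inline uint32_t bo_gtt_offset(uint32_t index)
{
	return FIRST_GTT_OFFSET + index * TEST_BO_SIZE;
}
```

Under those assumptions bo_pages(TEST_BO_SIZE) is 128 and
bo_gtt_offset(253) is 0x0FF02000, which matches "failed to bind 128 pages
at 0x0FF02000" and "Failed to create GTT object 253". In other words,
about 253 MiB of test objects were already bound when the allocation
failed.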

Is this the cause of the bug we're seeing in the report [1]?

Anyone know what's going wrong here?

Thanks!
Matt Turner

[1] https://bugs.freedesktop.org/show_bug.cgi?id=26403
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: FUJITA Tomonori on
On Mon, 21 Jun 2010 17:19:43 -0400
Matt Turner <mattst88(a)gmail.com> wrote:

> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
> booting with `radeon.test=1` and found this, which I think is related:
>
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
> [snip]
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
> > pci_map_single failed: could not allocate dma page tables
> > [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
> > [TTM] Couldn't bind backend.
> > radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
> > [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
> > Error while testing BO move.
>
> From what I can see, the call chain is
> radeon_test_moves
>  (radeon_ttm_backend_bind called through callback function)
>  - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
>   - radeon_gart.c:radeon_gart_bind calls pci_map_page
>    - pci_map_page is alpha_pci_map_page, which calls...
>     - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
>      - pci_map_single_1 calls iommu_arena_alloc
>       - iommu_arena_alloc calls iommu_arena_find_pages
>        - iommu_arena_find_pages returns non-0
>       - iommu_arena_alloc returns non-0
>      - pci_map_single_1 returns 0 after printing
>        "could not allocate dma page tables" error
>     - alpha_pci_map_page returns 0 from pci_map_single_1
>   - radeon_gart_bind returns non-0, error path prints
>     "*ERROR* failed to bind 128 pages at 0x0FF02000"

This happens in the latest git, right?

Is this a regression (what kernel version worked)?


It seems that the IOMMU can't find 128 free pages. This is likely
because either:

- the IOMMU space is exhausted (possibly someone doesn't free the
IOMMU space),

or

- the mapping parameters (such as align) aren't appropriate, so the
IOMMU can't find space.
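
Both failure modes can be reproduced with a toy model of an arena
allocator. The sketch below is a simplified user-space illustration, not
the code in arch/alpha/kernel/pci_iommu.c, which does more than this. The
toy_* names, the 64-entry arena and the power-of-two alignment
requirement are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define ARENA_ENTRIES 64

/* Toy arena: one flag per page-table entry, true means "in use". */
struct toy_arena {
	bool used[ARENA_ENTRIES];
};

/*
 * Find n contiguous free entries whose start index is a multiple of
 * (mask + 1), where mask + 1 is a power of two. Returns the start
 * index, or -1 if no such run exists.
 */
static long toy_arena_find_pages(const struct toy_arena *a, long n, long mask)
{
	long p = 0;

	assert(n > 0);
	while (p + n <= ARENA_ENTRIES) {
		long i = 0;

		while (i < n && !a->used[p + i])
			i++;
		if (i == n)
			return p;
		/* Skip the busy entry, then round up to the next aligned slot. */
		p = (p + i + 1 + mask) & ~mask;
	}
	return -1;
}

/* Reserve n entries aligned to align entries; returns start index or -1. */
static long toy_arena_alloc(struct toy_arena *a, long n, long align)
{
	long mask = (align > 1 ? align : 1) - 1;
	long p = toy_arena_find_pages(a, n, mask);

	if (p < 0)
		return -1;
	for (long i = 0; i < n; i++)
		a->used[p + i] = true;
	return p;
}
```

Four 16-entry allocations fill the toy arena, after which even a
one-entry request fails (exhaustion). Separately, with only entries 1 and
33 busy, a 16-entry request aligned to 32 entries fails although 62
entries are free, while the same request with no alignment constraint
succeeds at index 2.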


> Is this the cause of the bug we're seeing in the report [1]?
>
> Anyone know what's going wrong here?


I've attached a patch to print the debug info about the mapping
parameters.


diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index d1dbd9a..17cf0d8 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
 	/* Search for N empty ptes */
 	ptes = arena->ptes;
 	mask = max(align, arena->align_entry) - 1;
+
+	printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
+	       n, mask, align);
+
 	p = iommu_arena_find_pages(dev, arena, n, mask);
 	if (p < 0) {
 		spin_unlock_irqrestore(&arena->lock, flags);

From: Dave Airlie on
On Tue, Jun 22, 2010 at 3:59 PM, FUJITA Tomonori
<fujita.tomonori(a)lab.ntt.co.jp> wrote:
> On Mon, 21 Jun 2010 17:19:43 -0400
> Matt Turner <mattst88(a)gmail.com> wrote:
>
>> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
>> booting with `radeon.test=1` and found this, which I think is related:
>>
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
>> [snip]
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
>> > pci_map_single failed: could not allocate dma page tables
>> > [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
>> > [TTM] Couldn't bind backend.
>> > radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
>> > [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
>> > Error while testing BO move.
>>
>> From what I can see, the call chain is
>> radeon_test_moves
>>  (radeon_ttm_backend_bind called through callback function)
>>  - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
>>   - radeon_gart.c:radeon_gart_bind calls pci_map_page
>>    - pci_map_page is alpha_pci_map_page, which calls...
>>     - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
>>      - pci_map_single_1 calls iommu_arena_alloc
>>       - iommu_arena_alloc calls iommu_arena_find_pages
>>        - iommu_arena_find_pages returns non-0
>>       - iommu_arena_alloc returns non-0
>>      - pci_map_single_1 returns 0 after printing
>>        "could not allocate dma page tables" error
>>     - alpha_pci_map_page returns 0 from pci_map_single_1
>>   - radeon_gart_bind returns non-0, error path prints
>>     "*ERROR* failed to bind 128 pages at 0x0FF02000"
>
> This happens in the latest git, right?
>
> Is this a regression (what kernel version worked)?
>
>
> Seems that the IOMMU can't find 128 pages. It's likely due to:
>
> - out of the IOMMU space (possibly someone doesn't free the IOMMU
>  space).
>
> or
>
> - the mapping parameters (such as align) aren't appropriate so the
>  IOMMU can't find space.

I don't think KMS drivers have ever worked on alpha, so it's not a
regression. They are working fine on x86 and powerpc, and sparc has been
run at least once.

I suspect we are simply hitting the limits of the IOMMU, since graphics
drivers generally try to bind a lot of things to the GART. How big an
address space does it handle?

It might be worth limiting the PCIGART in radeon to 32MB to see if the
lower limit helps.

Dave.

>
>
>> Is this the cause of the bug we're seeing in the report [1]?
>>
>> Anyone know what's going wrong here?
>
>
> I've attached a patch to print the debug info about the mapping
> parameters.
>
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index d1dbd9a..17cf0d8 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
>        /* Search for N empty ptes */
>        ptes = arena->ptes;
>        mask = max(align, arena->align_entry) - 1;
> +
> +       printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
> +              n, mask, align);
> +
>        p = iommu_arena_find_pages(dev, arena, n, mask);
>        if (p < 0) {
>                spin_unlock_irqrestore(&arena->lock, flags);
>
>
From: Michael Cree on
On 22/06/10 20:32, Dave Airlie wrote:
> On Tue, Jun 22, 2010 at 3:59 PM, FUJITA Tomonori
> <fujita.tomonori(a)lab.ntt.co.jp> wrote:
>> On Mon, 21 Jun 2010 17:19:43 -0400
>> Matt Turner <mattst88(a)gmail.com> wrote:
>>
>>> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
>>> booting with `radeon.test=1` and found this, which I think is related:

Note that my radeon card is PCI whereas I think Matt may be using an AGP
card.

My logs are very similar to Matt's except I don't see the following line:

>>>> pci_map_single failed: could not allocate dma page tables


>> This happens in the latest git, right?

Indeed, I am testing 2.6.35-rc3 (plus a couple of extra patches to fix
unrelated compile errors).

>> Is this a regression (what kernel version worked)?
>>
>> Seems that the IOMMU can't find 128 pages. It's likely due to:
>>
>> - out of the IOMMU space (possibly someone doesn't free the IOMMU
>> space).
>>
>> or
>>
>> - the mapping parameters (such as align) aren't appropriate so the
>> IOMMU can't find space.
>
> I don't think KMS drivers have ever worked on alpha, so it's not a
> regression. They are working fine on x86 and powerpc, and sparc has been
> run at least once.

KMS on the console at boot has worked since about 2.6.32, but starting
the X server has always failed and, in my case, the system becomes
unstable and eventually oopses.

> I suspect we are simply hitting the limits of the IOMMU, since graphics
> drivers generally try to bind a lot of things to the GART. How big an
> address space does it handle?

I have no idea what the address space limit is. I applied Fujita's patch,
which logs all IOMMU allocations, and also inserted some extra printks in
the ttm kernel code so that I could see which routines failed and the
error code each returned. Running the radeon test on boot produces the
following:

[ 238.712768] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a312000
[ 239.281127] [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x1a412000
[ 239.281127] ttm_tt_bind belched -12
[ 239.282104] ttm_bo_handle_move_mem belched -12
[ 239.282104] ttm_bo_move_buffer belched -12
[ 239.282104] ttm_bo_validate belched -12
[ 239.282104] radeon 0000:01:00.0: object_init failed for (1048576, 0x00000002) err=-12
[ 239.282104] [drm:radeon_test_moves] *ERROR* Failed to create GTT object 419
[ 239.399291] Error while testing BO move.

Note that no IOMMU allocations are printed while radeon_test_moves is
running, so iommu_arena_alloc doesn't appear to be called. Also, the
error code returned up to radeon_test_moves is -12, which is ENOMEM. So
there does appear to be some memory limit.
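
The -12 in those lines is a negated errno value, which is the usual
kernel convention for error returns. A minimal sketch of decoding such a
value in user space, using only standard C library calls; the helper name
kernel_err_to_string is made up for illustration:

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel-style return values are negative errno codes. Accept either
 * sign and return the C library's description of the error.
 */
static const char *kernel_err_to_string(int err)
{
	return strerror(err < 0 ? -err : err);
}
```

On Linux ENOMEM is defined as 12, so kernel_err_to_string(-12) yields the
same text as strerror(ENOMEM).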

> It might be worth limiting the PCIGART in radeon to 32MB to see if the
> lower limit helps.

So, how does one do that?

Cheers
Michael.
From: Matt Turner on
On Tue, Jun 22, 2010 at 1:59 AM, FUJITA Tomonori
<fujita.tomonori(a)lab.ntt.co.jp> wrote:
> On Mon, 21 Jun 2010 17:19:43 -0400
> Matt Turner <mattst88(a)gmail.com> wrote:
>
>> Michael Cree and I have been debugging FDO bug 26403 [1]. I tried
>> booting with `radeon.test=1` and found this, which I think is related:
>>
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x202000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0x302000
>> [snip]
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfd02000
>> > [drm] Tested GTT->VRAM and VRAM->GTT copy for GTT offset 0xfe02000
>> > pci_map_single failed: could not allocate dma page tables
>> > [drm:radeon_ttm_backend_bind] *ERROR* failed to bind 128 pages at 0x0FF02000
>> > [TTM] Couldn't bind backend.
>> > radeon 0000:00:07.0: object_init failed for (1048576, 0x00000002)
>> > [drm:radeon_test_moves] *ERROR* Failed to create GTT object 253
>> > Error while testing BO move.
>>
>> From what I can see, the call chain is
>> radeon_test_moves
>>  (radeon_ttm_backend_bind called through callback function)
>>  - radeon_ttm.c:radeon_ttm_backend_bind calls radeon_gart_bind
>>   - radeon_gart.c:radeon_gart_bind calls pci_map_page
>>    - pci_map_page is alpha_pci_map_page, which calls...
>>     - alpha_pci_map_page calls pci_iommu.c:pci_map_single_1
>>      - pci_map_single_1 calls iommu_arena_alloc
>>       - iommu_arena_alloc calls iommu_arena_find_pages
>>        - iommu_arena_find_pages returns non-0
>>       - iommu_arena_alloc returns non-0
>>      - pci_map_single_1 returns 0 after printing
>>        "could not allocate dma page tables" error
>>     - alpha_pci_map_page returns 0 from pci_map_single_1
>>   - radeon_gart_bind returns non-0, error path prints
>>     "*ERROR* failed to bind 128 pages at 0x0FF02000"
>
> This happens in the latest git, right?

I'm using 2.6.35-rc2, but I could try rc3 if you think it would make a
difference.

> Is this a regression (what kernel version worked)?

The framebuffer console has always worked, but I've never known X on
KMS to work. The radeon.test parameter hasn't existed the entire time,
but I could still try earlier kernels.

> Seems that the IOMMU can't find 128 pages. It's likely due to:
>
> - out of the IOMMU space (possibly someone doesn't free the IOMMU
>  space).
>
> or
>
> - the mapping parameters (such as align) aren't appropriate so the
>  IOMMU can't find space.
>
>
>> Is this the cause of the bug we're seeing in the report [1]?
>>
>> Anyone know what's going wrong here?
>
>
> I've attached a patch to print the debug info about the mapping
> parameters.
>
>
> diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
> index d1dbd9a..17cf0d8 100644
> --- a/arch/alpha/kernel/pci_iommu.c
> +++ b/arch/alpha/kernel/pci_iommu.c
> @@ -187,6 +187,10 @@ iommu_arena_alloc(struct device *dev, struct pci_iommu_arena *arena, long n,
>        /* Search for N empty ptes */
>        ptes = arena->ptes;
>        mask = max(align, arena->align_entry) - 1;
> +
> +       printk("%s: %p, %p, %d, %ld, %lx, %u\n", __func__, dev, arena, arena->size,
> +              n, mask, align);
> +
>        p = iommu_arena_find_pages(dev, arena, n, mask);
>        if (p < 0) {
>                spin_unlock_irqrestore(&arena->lock, flags);

With this patch applied, I get the attached output.

Thanks for your help so far. :)

Matt