From: Jaswinder Singh Rajput on
Hello,

With the latest git kernel, I am getting the following DRM error and
X Windows does not start:

[ 45.269075] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 45.269111] ------------[ cut here ]------------
[ 45.269139] WARNING: at mm/highmem.c:453 debug_kmap_atomic+0xa9/0x11e()
[ 45.269150] Hardware name: Aspire one
[ 45.269158] Modules linked in: nf_conntrack_ftp ath9k ath9k_common battery ath9k_hw [last unloaded: scsi_wait_scan]
[ 45.269198] Pid: 0, comm: swapper Not tainted 2.6.34-rc7-netbook #6
[ 45.269208] Call Trace:
[ 45.269231] [<c1030ecb>] warn_slowpath_common+0x65/0x7c
[ 45.269249] [<c108ce5d>] ? debug_kmap_atomic+0xa9/0x11e
[ 45.269267] [<c1030eef>] warn_slowpath_null+0xd/0x10
[ 45.269284] [<c108ce5d>] debug_kmap_atomic+0xa9/0x11e
[ 45.269304] [<c10207c9>] kmap_atomic_prot+0x4d/0xb2
[ 45.269321] [<c102083c>] kmap_atomic+0xe/0x10
[ 45.269341] [<c11f7d64>] i915_error_object_create+0xea/0x14f
[ 45.269359] [<c11f8132>] i915_handle_error+0x369/0x868
[ 45.269380] [<c11f86d0>] i915_hangcheck_elapsed+0x9f/0xdf
[ 45.269399] [<c103ab6e>] run_timer_softirq+0x1c9/0x269
[ 45.269417] [<c11f8631>] ? i915_hangcheck_elapsed+0x0/0xdf
[ 45.269435] [<c1035b7b>] __do_softirq+0xc6/0x186
[ 45.269451] [<c1035c61>] do_softirq+0x26/0x2b
[ 45.269466] [<c1035dd2>] irq_exit+0x29/0x66
[ 45.269484] [<c101681f>] smp_apic_timer_interrupt+0x6e/0x7c
[ 45.269504] [<c141f826>] apic_timer_interrupt+0x2a/0x30
[ 45.269524] [<c104007b>] ? ftrace_raw_event_signal_generate+0x6d/0xd4
[ 45.269542] [<c11bed9d>] ? acpi_idle_enter_simple+0x13b/0x168
[ 45.269563] [<c12dd2b9>] cpuidle_idle_call+0x6b/0xda
[ 45.269580] [<c1001a3c>] cpu_idle+0x44/0x74
[ 45.269598] [<c141a041>] start_secondary+0x1b2/0x1b7
[ 45.269612] ---[ end trace ce01d7ca0ae214f4 ]---
[ 45.269631] ------------[ cut here ]------------
[ 45.269647] WARNING: at mm/highmem.c:453 debug_kmap_atomic+0xa9/0x11e()
[ 45.269657] Hardware name: Aspire one
[ 45.269665] Modules linked in: nf_conntrack_ftp ath9k ath9k_common battery ath9k_hw [last unloaded: scsi_wait_scan]
[ 45.269700] Pid: 0, comm: swapper Tainted: G W 2.6.34-rc7-netbook #6
[ 45.269710] Call Trace:
[ 45.269726] [<c1030ecb>] warn_slowpath_common+0x65/0x7c
[ 45.269743] [<c108ce5d>] ? debug_kmap_atomic+0xa9/0x11e
[ 45.269760] [<c1030eef>] warn_slowpath_null+0xd/0x10
[ 45.269777] [<c108ce5d>] debug_kmap_atomic+0xa9/0x11e
[ 45.269795] [<c10207c9>] kmap_atomic_prot+0x4d/0xb2
[ 45.269812] [<c102083c>] kmap_atomic+0xe/0x10
[ 45.269829] [<c11f7d64>] i915_error_object_create+0xea/0x14f
[ 45.269848] [<c11f8132>] i915_handle_error+0x369/0x868
[ 45.269868] [<c11f86d0>] i915_hangcheck_elapsed+0x9f/0xdf
[ 45.269885] [<c103ab6e>] run_timer_softirq+0x1c9/0x269
[ 45.269903] [<c11f8631>] ? i915_hangcheck_elapsed+0x0/0xdf
[ 45.269920] [<c1035b7b>] __do_softirq+0xc6/0x186
[ 45.269937] [<c1035c61>] do_softirq+0x26/0x2b
[ 45.269952] [<c1035dd2>] irq_exit+0x29/0x66
[ 45.269968] [<c101681f>] smp_apic_timer_interrupt+0x6e/0x7c
[ 45.269985] [<c141f826>] apic_timer_interrupt+0x2a/0x30
[ 45.270004] [<c104007b>] ? ftrace_raw_event_signal_generate+0x6d/0xd4
[ 45.270051] [<c11bed9d>] ? acpi_idle_enter_simple+0x13b/0x168
[ 45.270071] [<c12dd2b9>] cpuidle_idle_call+0x6b/0xda
[ 45.270087] [<c1001a3c>] cpu_idle+0x44/0x74
[ 45.270104] [<c141a041>] start_secondary+0x1b2/0x1b7
[ 45.270117] ---[ end trace ce01d7ca0ae214f5 ]---
[ 45.270135] ------------[ cut here ]------------

dmesg : http://userweb.kernel.org/~jaswinder/acer_netbook/dmesg_2634-rc7.txt
.config : http://userweb.kernel.org/~jaswinder/acer_netbook/config_2634-rc7.txt

How can I fix these errors?

Thanks,
--
Jaswinder Singh.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Chris Wilson on
On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux(a)gmail.com> wrote:
> Hello,
>
> With the latest git kernel, I am getting the following DRM error and
> X Windows does not start:

[snip]

Hmm, there are still patches for capturing error state that haven't gone
upstream, shame on me.

That error is a secondary issue to the GPU hang that is being reported. If
it is a regression caused by a kernel update it would be very useful if
you could bisect to the erroneous commit.

--
Chris Wilson, Intel Open Source Technology Centre
From: Jaswinder Singh Rajput on
Hello Chris,

On Tue, May 11, 2010 at 9:40 PM, Chris Wilson <chris(a)chris-wilson.co.uk> wrote:
> On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux(a)gmail.com> wrote:
>> Hello,
>>
>> With the latest git kernel, I am getting the following DRM error and
>> X Windows does not start:
>
> [snip]
>
> Hmm, there are still patches for capturing error state that haven't gone
> upstream, shame on me.
>
> That error is a secondary issue to the GPU hang that is being reported. If
> it is a regression caused by a kernel update it would be very useful if
> you could bisect to the erroneous commit.
>

Earlier I was using Moblin; after switching to Fedora I started getting
this error. I have also tested different kernel versions and get the
same error, so I do not think this is a regression.

moblin dmesg : http://userweb.kernel.org/~jaswinder/moblin/dmesg-moblin_2633rc5.txt

Thanks,
--
Jaswinder Singh.
From: Andrew Morton on
On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson <chris(a)chris-wilson.co.uk> wrote:

> On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux(a)gmail.com> wrote:
> > Hello,
> >
> > With the latest git kernel, I am getting the following DRM error and
> > X Windows does not start:
>
> [snip]
>
> Hmm, there are still patches for capturing error state that haven't gone
> upstream, shame on me.
>
> That error is a secondary issue to the GPU hang that is being reported. If
> it is a regression caused by a kernel update it would be very useful if
> you could bisect to the erroneous commit.

It helps if one reads the code and the trace...

i915_error_object_create() is using KM_USER0 from softirq context.
That's a bug, and a pretty serious one. If some innocent civilian is
writing highmem data to disk and this timer interrupt fires and trashes
his KM_USER0 slot, the disk contents will be corrupted.

Something like this...

--- a/drivers/gpu/drm/i915/i915_irq.c~a
+++ a/drivers/gpu/drm/i915/i915_irq.c
@@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi

 	for (page = 0; page < page_count; page++) {
 		void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+		unsigned long flags;
+
 		if (d == NULL)
 			goto unwind;
-		s = kmap_atomic(src_priv->pages[page], KM_USER0);
+		local_irq_save(flags);
+		s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
 		memcpy(d, s, PAGE_SIZE);
-		kunmap_atomic(s, KM_USER0);
+		kunmap_atomic(s, KM_IRQ0);
+		local_irq_restore(flags);
 		dst->pages[page] = d;
 	}
 	dst->page_count = page_count;
_

Please let's get a tested fix for this into 2.6.34.
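
For readers unfamiliar with atomic kmap slots, the clash described above can be sketched in plain userspace C. This is only a toy model, not kernel code: every `sim_` name, the tiny page size, and the page contents are made up for illustration, and the model assumes a stale mapping stays readable after unmap, so the failure shows up as wrong data rather than a crash. It also does not model the local_irq_save()/local_irq_restore() pair from the patch.

```c
#include <assert.h>
#include <string.h>

#define SIM_PAGE_SIZE 8	/* tiny "pages" keep the model readable */

/* Hypothetical stand-ins for two per-CPU atomic kmap slots. */
enum sim_km_slot { SIM_KM_USER0, SIM_KM_IRQ0, SIM_KM_SLOTS };

/* Each slot is one fixed window; mapping a page changes what it shows. */
static const char *sim_slot[SIM_KM_SLOTS];

static void sim_kmap_atomic(const char *page, enum sim_km_slot slot)
{
	assert(slot < SIM_KM_SLOTS);
	sim_slot[slot] = page;	/* silently replaces any earlier mapping */
}

static void sim_kunmap_atomic(enum sim_km_slot slot)
{
	assert(slot < SIM_KM_SLOTS);
	(void)slot;	/* model assumption: the stale mapping stays visible */
}

/* Softirq-side work: copy one GPU page out through the given slot. */
static void sim_error_capture(const char *gpu_page, char *dst,
			      enum sim_km_slot slot)
{
	sim_kmap_atomic(gpu_page, slot);
	memcpy(dst, sim_slot[slot], SIM_PAGE_SIZE);
	sim_kunmap_atomic(slot);
}

/*
 * Process-context writer that gets "interrupted" after mapping its page
 * but before copying it. Returns 1 if the data that reaches "disk" is
 * the file data, 0 if it was replaced by whatever the interrupt mapped.
 */
static int sim_write_survives(enum sim_km_slot irq_slot)
{
	static const char file_page[SIM_PAGE_SIZE] = "FILEDAT";
	static const char gpu_page[SIM_PAGE_SIZE] = "GPUDUMP";
	char disk[SIM_PAGE_SIZE], capture[SIM_PAGE_SIZE];

	sim_kmap_atomic(file_page, SIM_KM_USER0);
	sim_error_capture(gpu_page, capture, irq_slot);	/* timer fires here */
	memcpy(disk, sim_slot[SIM_KM_USER0], SIM_PAGE_SIZE);
	sim_kunmap_atomic(SIM_KM_USER0);

	return memcmp(disk, file_page, SIM_PAGE_SIZE) == 0;
}
```

In this model sim_write_survives(SIM_KM_USER0) returns 0, because the interrupted writer copies the GPU page to "disk", while sim_write_survives(SIM_KM_IRQ0) returns 1, because the capture path uses a slot of its own.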
From: Jaswinder Singh Rajput on
Hello Andrew,

On Tue, May 11, 2010 at 8:18 PM, Andrew Morton
<akpm(a)linux-foundation.org> wrote:
> On Tue, 11 May 2010 17:10:53 +0100 Chris Wilson <chris(a)chris-wilson.co.uk> wrote:
>
>> On Tue, 11 May 2010 20:30:07 +0530, Jaswinder Singh Rajput <jaswinderlinux(a)gmail.com> wrote:
>> > Hello,
>> >
>> > With the latest git kernel, I am getting the following DRM error and
>> > X Windows does not start:
>>
>> [snip]
>>
>> Hmm, there are still patches for capturing error state that haven't gone
>> upstream, shame on me.
>>
>> That error is a secondary issue to the GPU hang that is being reported. If
>> it is a regression caused by a kernel update it would be very useful if
>> you could bisect to the erroneous commit.
>
> It helps if one reads the code and the trace...
>
> i915_error_object_create() is using KM_USER0 from softirq context.
> That's a bug, and a pretty serious one. If some innocent civilian is
> writing highmem data to disk and this timer interrupt fires and trashes
> his KM_USER0 slot, the disk contents will be corrupted.
>
> Something like this...
>
> --- a/drivers/gpu/drm/i915/i915_irq.c~a
> +++ a/drivers/gpu/drm/i915/i915_irq.c
> @@ -456,11 +456,15 @@ i915_error_object_create(struct drm_devi
>
>  	for (page = 0; page < page_count; page++) {
>  		void *s, *d = kmalloc(PAGE_SIZE, GFP_ATOMIC);
> +		unsigned long flags;
> +
>  		if (d == NULL)
>  			goto unwind;
> -		s = kmap_atomic(src_priv->pages[page], KM_USER0);
> +		local_irq_save(flags);
> +		s = kmap_atomic(src_priv->pages[page], KM_IRQ0);
>  		memcpy(d, s, PAGE_SIZE);
> -		kunmap_atomic(s, KM_USER0);
> +		kunmap_atomic(s, KM_IRQ0);
> +		local_irq_restore(flags);
>  		dst->pages[page] = d;
>  	}
>  	dst->page_count = page_count;
> _
>
> Please let's get a tested fix for this into 2.6.34.
>

I tested your patch with the latest Linus git tree and it works; it
fixes the softirq error.

Now I am only getting DRM errors:

[ 42.276059] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 42.276398] render error detected, EIR: 0x00000000
[ 42.276460] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 18 at 17)

Thanks,
--
Jaswinder Singh.