From: Phillip Pi on
Hello,

Over the last few weeks, my old Debian Linux box (kernel 2.6.30-2) has
been hitting high CPU usage from Xorg at random, though rarely, and it
sometimes crashes. The box felt slow even over SSH2. I checked the
processes and saw:

$ w
11:53:37 up 6 days, 4:19, 3 users, load average: 6.26, 6.04, 6.19
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
ant tty1 Wed03 5days 9.79s 0.00s /bin/bash /usr/bin/start
ant pts/3 [deleted IP addy]10:37 0.00s 0.10s 0.00s w
ant pts/4 foobar:S.0 05Jan10 10:30 16.00s 16.00s BitchX Ant...

$ top
top - 11:55:08 up 6 days, 4:20, 3 users, load average: 6.13, 5.91, 6.12
Tasks: 132 total, 3 running, 129 sleeping, 0 stopped, 0 zombie
Cpu0 : 6.9%us, 2.4%sy, 1.3%ni, 88.1%id, 0.6%wa, 0.1%hi, 0.6%si, 0.0%st
Mem: 2594748k total, 2168336k used, 426412k free, 64348k buffers
Swap: 2361512k total, 6452k used, 2355060k free, 1847020k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15529 root 20 0 101m 73m 2992 R 99.7 2.9 225:08.76 Xorg
20840 ant 20 0 2468 1180 892 R 0.2 0.0 0:00.01 top
1 root 20 0 2036 348 324 S 0.0 0.0 0:02.62 init
....

I tried to kill the startx and Xorg processes, and my box froze (still
pingable, but the existing remote SSH2 connection froze, new connections
failed, and IRC connections were lost). I have tried recompiling the
latest stable NVIDIA driver (from nvidia.com) for the GeForce FX 5200
(AGP), redoing my /etc/X11/xorg.conf with the help of NVIDIA's script,
disabling Compiz, etc.

I checked logs. In /var/log/X11, I saw a bunch of:
(EE) NVIDIA(0): Error recovery failed.
(EE) NVIDIA(0): *** Aborting ***
(II) NVIDIA(0): Initialized AGP GART.

This sounds bad? What does that mean? End of dmesg showed these lines:
....
[72619.360521] NVRM: loading NVIDIA UNIX x86 Kernel Module 173.14.22 Sun Nov 8 20:26:31 PST 2009
....
[72833.815914] NVRM: loading NVIDIA UNIX x86 Kernel Module 173.14.22 Sun Nov 8 20:26:31 PST 2009
[72833.947202] agpgart-amd64 0000:00:00.0: AGP 3.5 bridge
[72833.947218] agpgart-amd64 0000:00:00.0: putting AGP V3 device into 8x mode
[72833.947284] nvidia 0000:01:00.0: putting AGP V3 device into 8x mode
....
[99432.775115] NVRM: Xid (0001:00): 6, PE0002 06bc 3f800000 0008fd14 00000000 3f800000
[99469.794940] NVRM: Xid (0001:00): 6, PE0002 06bc 3f800000 0008fd14 00000000 3f800000
[99469.836150] NVRM: Xid (0001:00): 7, Ch 00000002 M 00000a64 D 00000000 intr 00010000
[224756.205022] NVRM: Xid (0001:00): 6, PE0002 06bc 3f800000 0008fd14 00000000 3f800000
[224756.251066] NVRM: Xid (0001:00): 7, Ch 00000002 M 0000069c D 471229dd intr 00010000
[225085.201829] NVRM: Xid (0001:00): 6, PE0002 0000 40000000 0010a7bc c0000000 3f800000
[225085.246217] NVRM: Xid (0001:00): 7, Ch 00000002 M 00001d7c D ffff0000 intr 00010000
....
[526347.572029] NVRM: Xid (0001:00): 8, Channel 00000000
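
For anyone who wants to summarize NVRM lines like these rather than
eyeball them, here is a minimal Python sketch that counts how often each
Xid code shows up in a dmesg dump. The names tally_xids and XID_LINE are
made up for this illustration, and the sample lines are copied from the
excerpt above. It only counts the codes; it does not try to interpret
them.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   [99432.775115] NVRM: Xid (0001:00): 6, PE0002 06bc ...
# The number right after "Xid (...):" is treated as the Xid code.
XID_LINE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] NVRM: Xid \([0-9a-fA-F:]+\): (?P<code>\d+),"
)


def tally_xids(dmesg_text):
    """Count occurrences of each Xid code in the given dmesg text."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = XID_LINE.match(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts


# Sample taken from the dmesg excerpt in this post (wrapped lines joined).
SAMPLE = """\
[72833.947202] agpgart-amd64 0000:00:00.0: AGP 3.5 bridge
[99432.775115] NVRM: Xid (0001:00): 6, PE0002 06bc 3f800000 0008fd14 00000000 3f800000
[99469.794940] NVRM: Xid (0001:00): 6, PE0002 06bc 3f800000 0008fd14 00000000 3f800000
[99469.836150] NVRM: Xid (0001:00): 7, Ch 00000002 M 00000a64 D 00000000 intr 00010000
[526347.572029] NVRM: Xid (0001:00): 8, Channel 00000000
"""

if __name__ == "__main__":
    for code, count in sorted(tally_xids(SAMPLE).items()):
        print(f"Xid {code}: {count} occurrence(s)")
```

To run it on a real log, save the output of dmesg to a file and pass the
file's contents to tally_xids instead of SAMPLE.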

I posted more complete logs, including sensors -f output, at
http://pastie.org/774029 ... My old Debian machine's specifications can
be found at http://alpha.zimage.com/~ant/antfarm/about/computers.txt
(Secondary/Backup Computer section).

Any ideas? I do keep my Debian updated daily with apt-get update and
upgrade commands. I do not recall any recent X changes.

Thank you in advance. :)
--
Phillip Pi
Senior Software Quality Assurance Analyst
Partner Engineering/Internet Service Provider/Symantec Online Services,
Consumer Business Unit
Symantec Corporation
www.symantec.com
-----------------------------------------------------
Email: phillip_pi(a)symantec.comSYMC (remove SYMC to reply by e-mail)
-----------------------------------------------------
Please do NOT e-mail me for technical support. DISCLAIMER: The views
expressed in this posting are mine, and do not necessarily reflect the
views of my employer. Thank you.
From: Phillip Pi on
http://pastebin.ca/1747442 (whole dmesg) if needed.

From: thunder8 on
From: Phillip Pi <phillip_pi(a)symantec.comSYMC>
Date: Mon, 11 Jan 2010 13:44:25 -0800
> http://pastebin.ca/1747442 (whole dmesg) if needed.
>
> On 1/11/2010 12:33 PM PT, Phillip Pi typed:
>
>> The last few weeks, I noticed my old Linux/Debian box (2.6.30-2) keeps
>> getting random and rare high CPU due to Xorg and sometimes crashes. My
>> box, even via SSH2, felt slow. I checked the processes and saw:
>>
>> I checked logs. In /var/log/X11, I saw a bunch of:
>> (EE) NVIDIA(0): Error recovery failed.
>> (EE) NVIDIA(0): *** Aborting ***
>> (II) NVIDIA(0): Initialized AGP GART.
>>
>> This sounds bad? What does that mean? End of dmesg showed these lines:
>> ...
The problem is, nobody knows what's going on inside the binary nvidia
module except Nvidia. That's why it's called 'closed source'. So the
best option is to either use the open-source driver, or take your
problems to Nvidia.

I realize this may sound harsh, but that is one of the problems of
closed source, after all.

Kind regards,
Jurriaan
--
beautiful gifts, exclusive presents: handmade wooden bowls

http://www.houtenschalen.nl
From: Phillip Pi on
On 1/12/2010 1:52 AM PT, thunder8 typed:

>>> The last few weeks, I noticed my old Linux/Debian box (2.6.30-2) keeps
>>> getting random and rare high CPU due to Xorg and sometimes crashes. My
>>> box, even via SSH2, felt slow. I checked the processes and saw:
>>>
>>> I checked logs. In /var/log/X11, I saw a bunch of:
>>> (EE) NVIDIA(0): Error recovery failed.
>>> (EE) NVIDIA(0): *** Aborting ***
>>> (II) NVIDIA(0): Initialized AGP GART.
>>>
>>> This sounds bad? What does that mean? End of dmesg showed these lines:
>>> ...

>> http://pastebin.ca/1747442 (whole dmesg) if needed.

> The problem is, nobody knows what's going on inside the binary nvidia
> module except Nvidia. That's why it's called 'closed source'. So the
> best option is to either use the opensource driver, or take your
> problems to Nvidia.
>
> I realize this may sound harsh, but that is one of the problems of
> closed source, after all.

Even the NVIDIA folks don't seem to know so far, judging by the lack of replies:
http://www.nvnews.net/vbulletin/showthread.php?p=2162526 ... :(
From: Ant on
It happened again about 30 minutes ago while I was using the machine. I
tried disabling AMD's Cool'n'Quiet in the CMOS setup and powernow in
Debian/Linux. Neither fixed it.

I noticed a pattern that I did not mention before. If I am using the
computer when the issue comes up, the screen blinks, then the CPU usage
goes up and X stops responding. Some log excerpts:

dmesg:
....
[526179.772020] NVRM: Xid (0001:00): 8, Channel 00000000
[526187.923984] Clocksource tsc unstable (delta = 4686847433 ns)
[526195.932026] NVRM: Xid (0001:00): 8, Channel 00000000
[526207.944026] NVRM: Xid (0001:00): 8, Channel 00000000
[526219.956030] NVRM: Xid (0001:00): 8, Channel 00000000
[526231.972025] NVRM: Xid (0001:00): 8, Channel 00000000
[526243.984030] NVRM: Xid (0001:00): 8, Channel 00000000
[526255.996025] NVRM: Xid (0001:00): 8, Channel 00000000
[526268.008028] NVRM: Xid (0001:00): 8, Channel 00000000
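
The timestamps on those Xid 8 lines look evenly spaced. A minimal Python
sketch for measuring the gaps is below, in case it helps anyone
comparing their own logs. The names xid8_intervals and XID8_LINE are
invented for this illustration, and the sample lines are copied from the
excerpt above.

```python
import re

# Matches only Xid 8 lines, e.g.:
#   [526195.932026] NVRM: Xid (0001:00): 8, Channel 00000000
# The captured group is the kernel timestamp in seconds since boot.
XID8_LINE = re.compile(r"^\[\s*(\d+\.\d+)\] NVRM: Xid \([0-9a-fA-F:]+\): 8,")


def xid8_intervals(dmesg_text):
    """Return the gaps in seconds between consecutive Xid 8 messages."""
    stamps = []
    for line in dmesg_text.splitlines():
        match = XID8_LINE.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Sample taken from the dmesg excerpt in this post.
SAMPLE = """\
[526187.923984] Clocksource tsc unstable (delta = 4686847433 ns)
[526195.932026] NVRM: Xid (0001:00): 8, Channel 00000000
[526207.944026] NVRM: Xid (0001:00): 8, Channel 00000000
[526219.956030] NVRM: Xid (0001:00): 8, Channel 00000000
"""

if __name__ == "__main__":
    for gap in xid8_intervals(SAMPLE):
        print(f"{gap:.3f} s since previous Xid 8")
```

On the sample lines the gaps come out at roughly 12 seconds each. A
regular spacing like that looks more like a periodic retry than a burst
of independent errors, though that is only a guess from the timestamps.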

$ sensors -f
k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp: +134.6°F

GKrellM, in its frozen state, showed:
Vcor1 = 1.50
+3.3V = 3.33
+12V = 11.3
-12V = 2.11
-5V = 5.10
V5SB = 5.54
VBat = 3.17

I was able to use Terminal very slowly via an existing SSH2 connection:
$ sensors -f
w83697hf-isa-0290
Adapter: ISA adapter
in0: +1.50 V (min = +0.19 V, max = +0.13 V) ALARM
in2: +3.33 V (min = +0.43 V, max = +0.02 V) ALARM
in3: +3.01 V (min = +0.02 V, max = +0.13 V) ALARM
in4: +2.96 V (min = +0.06 V, max = +2.86 V) ALARM
in5: +3.30 V (min = +0.06 V, max = +2.24 V) ALARM
in6: +4.08 V (min = +2.56 V, max = +0.00 V) ALARM
in7: +3.30 V (min = +0.08 V, max = +0.03 V) ALARM
in8: +3.17 V (min = +0.00 V, max = +1.28 V) ALARM
fan1: 0 RPM (min = 73 RPM, div = 128) ALARM
fan2: 2410 RPM (min = 2109 RPM, div = 4)
temp1: +91.4°F (high = +172.4°F, hyst = +105.8°F) sensor = thermistor
temp2: +128.3°F (high = +176.0°F, hyst = +167.0°F) sensor = thermistor
beep_enable:enabled

Do those voltage readings look correct? The machine has a new Antec
Basiq BP550 Plus 550W Continuous Power ATX12V V2.2 Modular Active PFC
power supply, unless that unit is defective. Or maybe the GeForce FX has
gone bad?
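
A note on the ALARM flags in the w83697hf output above: in several of
those lines the min limit is higher than the max limit. That suggests
the limits were never configured, not that the rails are necessarily out
of range. Below is a minimal Python sketch that sorts the voltage lines
on that basis. The names check_limits and VOLT_LINE are hypothetical,
and the sample lines are copied from the sensors output above. It says
nothing about which inN input maps to which rail, since that depends on
the board.

```python
import re

# Matches lm-sensors voltage lines such as:
#   in0: +1.50 V (min = +0.19 V, max = +0.13 V) ALARM
VOLT_LINE = re.compile(
    r"^(?P<name>in\d+):\s*(?P<value>[+-]?\d+\.\d+) V\s*"
    r"\(min = (?P<min>[+-]?\d+\.\d+) V, max = (?P<max>[+-]?\d+\.\d+) V\)"
)


def check_limits(sensors_text):
    """Return {input name: verdict} for each voltage line in sensors output."""
    verdicts = {}
    for line in sensors_text.splitlines():
        match = VOLT_LINE.match(line.strip())
        if not match:
            continue
        value = float(match.group("value"))
        low = float(match.group("min"))
        high = float(match.group("max"))
        if low > high:
            # A minimum above the maximum cannot be a meaningful range.
            verdicts[match.group("name")] = "limits inconsistent (min > max)"
        elif not low <= value <= high:
            verdicts[match.group("name")] = "outside configured limits"
        else:
            verdicts[match.group("name")] = "within limits"
    return verdicts


# Sample taken from the sensors output in this post.
SAMPLE = """\
in0: +1.50 V (min = +0.19 V, max = +0.13 V) ALARM
in3: +3.01 V (min = +0.02 V, max = +0.13 V) ALARM
fan2: 2410 RPM (min = 2109 RPM, div = 4)
"""

if __name__ == "__main__":
    for name, verdict in check_limits(SAMPLE).items():
        print(f"{name}: {verdict}")
```

Lines flagged as inconsistent are probably best ignored until sane
limits are set. The actual voltage values are the part worth comparing
against the power supply's rated rails.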

$ top
top - 22:44:31 up 6 days, 2:14, 3 users, load average: 7.72, 4.93, 2.32
Tasks: 142 total, 3 running, 139 sleeping, 0 stopped, 0 zombie
Cpu0 : 4.8%us, 1.4%sy, 0.7%ni, 92.2%id, 0.8%wa, 0.1%hi, 0.1%si, 0.0%st
Mem: 2594748k total, 1474872k used, 1119876k free, 227992k buffers
Swap: 2361512k total, 5820k used, 2355692k free, 761668k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
565 root 20 0 101m 66m 5420 R 99.9 2.6 59:05.45 Xorg
14954 ant 20 0 2468 1180 892 R 0.3 0.0 0:00.02 top
1 root 20 0 2036 368 316 S 0.0 0.0 0:02.06 init
....

I looked at my ~/.xsession-errors file to the end:
Xsession: X session started for ant at Wed Jan 13 06:24:06 PST 2010
startkde: Starting up...
kbuildsycoca running...
/tmp/kde-ant/kcminitizrDCa.tmp:1:2: error: invalid preprocessing directive #http
Gtk-Message: Failed to load module "canberra-gtk-module": libcanberra-gtk-module.so: cannot open shared object file: No such file or directory
....
(seamonkey-bin:26879): Gdk-WARNING **: XID collision, trouble ahead
....
X Error: BadWindow (invalid Window parameter) 3
Major opcode: 19
Minor opcode: 0
Resource id: 0x1a6a36e
X Error: BadWindow (invalid Window parameter) 3
Major opcode: 19
Minor opcode: 0
Resource id: 0x3400008
X Error: BadWindow (invalid Window parameter) 3
Major opcode: 19
Minor opcode: 0
Resource id: 0x3017201
X Error: BadWindow (invalid Window parameter) 3
Major opcode: 19
Minor opcode: 0
Resource id: 0x3000024
kwin: X_SetInputFocus(0x282d216): BadMatch (invalid parameter attributes)
X Error: BadWindow (invalid Window parameter) 3
Major opcode: 19
Minor opcode: 0
Resource id: 0x3200008
I saw a bunch of "(seamonkey-bin:#): Gdk-WARNING **: XID collision,
trouble ahead" lines. I did a quick search in Google and saw Firefox
users having them too, so I assume this is unrelated to my crashes?

http://pastie.org/782807 for /var/log/X11/Xorg.0.log since the forum
said my reply was too long. :P

I just tried another idea: uninstalling the NVIDIA drivers with
/usr/bin/nvidia-uninstall (which I had never done before), then
recompiling, reinstalling, and restarting X. I wonder if that will fix
my issue.


--
"All the best work is done the way that ants do things -- by tiny but
untiring and regular additions." --Lafcadio Hearn
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: philpi(a)earthlink.netANT
( ) or ANTant(a)zimage.com