From: Yousuf Khan on 12 Mar 2010 11:43

ANTant(a)zimage.com wrote:
> So far no errors (no TLB errors and crashes within eight hours. I will
> keep it running for another 3-4 hours and then I am going to killall
> those processes so I can use the machine.
>
> It seems like the issue only comes up if my box is not idled? What the
> frak?

Just a guess here, but what about the version of Linux you are running? I don't remember what the version numbers were, but are they somewhat out of date? Have you considered updating the kernel?

Yousuf Khan
From: ANTant on 12 Mar 2010 15:32

>> So far no errors (no TLB errors and crashes within eight hours. I will
>> keep it running for another 3-4 hours and then I am going to killall
>> those processes so I can use the machine.
>>
>> It seems like the issue only comes up if my box is not idled? What the
>> frak?
>
> Just a guess here, the version of Linux you are running. I don't
> remember what the version numbers were, but are they somewhat out of
> date? Have you considered an update to the kernel?

I had Debian's kernel 2.6.30 and am currently using 2.6.32. Neither made a difference. :( I just need to reproduce this problem outside of my Debian install via a LiveCD, memtest86, something!

--
"We are anthill men upon an anthill world." --Ray Bradbury
   /\___/\
  / /\ /\ \    Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
 | |o   o| |   Ant's Quality Foraged Links (AQFL): http://aqfl.net
    \ _ /      Please remove ANT if replying by e-mail.
     ( )
From: Robert Redelmeier on 12 Mar 2010 16:46

ANTant(a)zimage.com wrote in part:
> Too much trouble as in what? Swap partition going crazy
> like when I ran 40 processes? :D

Thrashing (continuous swapping in/out) is an obvious sign of an overloaded system. Useful to see if you haven't seen it before.

> BTW, so far no crashes and errors with 33 processes. I
> think it has been over six hours. I can abort them now if
> you think this is enough. :P

Probably is.

-- Robert
From: ANTant on 12 Mar 2010 17:13

>> Too much trouble as in what? Swap partition going crazy
>> like when I ran 40 processes? :D
>
> Thrashing (continuous swapping in/out) is an obvious
> sign of an overloaded system. Useful to see if you
> haven't seen it before.

Yeah, never seen it with 40 cpuburn processes. ;)

>> BTW, so far no crashes and errors with 33 processes. I
>> think it has been over six hours. I can abort them now if
>> you think this is enough. :P
>
> Probably is.

OK, I will abort it after this post. FYI, here is the state of this 33-process test after almost 8.25 hours:

$ top
top - 14:10:29 up 8:25, 1 user, load average: 32.89, 32.48, 32.54
Tasks: 176 total, 34 running, 142 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3%us, 0.7%sy, 99.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 75.1%ni, 24.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2595064k total, 2347444k used, 247620k free, 3380k buffers
Swap: 2361512k total, 181152k used, 2180360k free, 19972k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4135 ant   39  19 65632  64m    4 R   18  2.5  29:46.00 burnMMX
 4170 ant   39  19 65632  64m    4 R   16  2.5  29:41.27 burnMMX
 4171 ant   39  19 65632  64m    4 R    8  2.5  29:46.18 burnMMX
 4148 ant   39  19 65632  64m    4 R    8  2.5  29:44.35 burnMMX
 4139 ant   39  19 65632  64m    4 R    8  2.5  29:59.29 burnMMX
 4149 ant   39  19 65632  64m    4 R    7  2.5  29:47.52 burnMMX
 4152 ant   39  19 65632  64m    4 R    7  2.5  29:41.53 burnMMX
 4165 ant   39  19 65632  64m    4 R    7  2.5  29:39.02 burnMMX
 4166 ant   39  19 65632  64m    4 R    7  2.5  29:47.40 burnMMX
 4167 ant   39  19 65632  64m    4 R    7  2.5  29:56.93 burnMMX
 4186 ant   39  19 65632  64m    4 R    7  2.5  29:52.83 burnMMX
 4192 ant   39  19 65632  64m    4 R    7  2.5  29:51.10 burnMMX
 4145 ant   39  19 65632  64m    4 R    7  2.5  29:46.08 burnMMX
 4153 ant   39  19 65632  64m    4 R    7  2.5  29:37.84 burnMMX
 4187 ant   39  19 65632  64m    4 R    7  2.5  29:24.71 burnMMX
 4164 ant   39  19 65632  64m    4 R    5  2.5  29:25.25 burnMMX
 4169 ant   39  19 65632  64m    4 R    5  2.5  29:27.45 burnMMX
 4196 ant   39  19 65632  64m    4 R    5  2.5  29:28.98 burnMMX
 4162 ant   39  19 65632  64m    4 R    5  2.5  29:29.86 burnMMX
 4163 ant   39  19 65632  64m    4 R    5  2.5  29:40.07 burnMMX
 4191 ant   39  19 65632  64m    4 R    5  2.5  29:43.16 burnMMX
 4193 ant   39  19 65632  64m    4 R    5  2.5  29:30.78 burnMMX
 4195 ant   39  19 65632  64m    4 R    4  2.5  29:47.66 burnMMX
 4151 ant   39  19 65632  64m    4 R    3  2.5  29:29.78 burnMMX
 4194 ant   39  19 65632  64m    4 R    3  2.5  29:48.43 burnMMX
 4142 ant   39  19 65632  64m    4 R    3  2.5  29:45.24 burnMMX
 4150 ant   39  19 65632  64m    4 R    3  2.5  29:53.71 burnMMX
 4168 ant   39  19 65632  64m    4 R    3  2.5  29:36.53 burnMMX
 4188 ant   39  19 65632  64m    4 R    3  2.5  29:32.41 burnMMX
 4189 ant   39  19 65632  64m    4 R    3  2.5  29:11.58 burnMMX
 4190 ant   39  19 65632  64m    4 R    3  2.5  29:24.99 burnMMX
 4197 ant   39  19 65632  64m    4 R    3  2.5  29:39.87 burnMMX
 4198 ant   39  19 65632  64m    4 R    3  2.5  29:37.67 burnMMX
 5108 ant   40   0  2464 1204  888 R    0  0.0   0:00.04 top
    1 root  40   0  2036  152  132 S    0  0.0   0:00.82 init
    2 root  40   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root  RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root  20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    5 root  RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root  RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    7 root  20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    8 root  RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root  20   0     0    0    0 S    0  0.0   0:00.00 events/0
   10 root  20   0     0    0    0 S    0  0.0   0:00.01 events/1
   11 root  20   0     0    0    0 S    0  0.0   0:00.00 cpuset
   12 root  20   0     0    0    0 S    0  0.0   0:00.00 khelper
   13 root  20   0     0    0    0 S    0  0.0   0:00.00 netns
   14 root  20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   15 root  20   0     0    0    0 S    0  0.0   0:00.00 pm

$ sensors -f
acpitz-virtual-0
Adapter: Virtual device
temp1:       +71.2°F  (crit = +206.2°F)

k8temp-pci-00c3
Adapter: PCI adapter
Core0 Temp:  +120.2°F
Core1 Temp:  +91.4°F

I noticed that swap usage has grown since early this morning. :)
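
Editorial aside: the difference between the 40-process run (thrashing) and the 33-process run (no thrashing) can be sanity-checked with the figures from the top output above. The following is only a rough sketch. It assumes every burnMMX process really touches all of its 65632 KiB, and it ignores the kernel and every other process, so treat the result as a plausibility check rather than a measurement. The function names are made up for this illustration.

```python
# Figures copied from the top output above (top's "k" is KiB).
MEM_TOTAL_KIB = 2595064   # "Mem: 2595064k total"
BURN_VIRT_KIB = 65632     # VIRT column of each burnMMX process


def burn_footprint_kib(n_procs: int) -> int:
    """Worst-case demand, assuming each process touches all of its VIRT."""
    return n_procs * BURN_VIRT_KIB


def fits_in_ram(n_procs: int) -> bool:
    """True if the combined footprint stays below physical RAM.

    Assumption: nothing else on the box needs memory, which is optimistic.
    """
    return burn_footprint_kib(n_procs) < MEM_TOTAL_KIB


if __name__ == "__main__":
    for n in (33, 40):
        need = burn_footprint_kib(n)
        print(f"{n} processes: {need} KiB needed, fits in RAM: {fits_in_ram(n)}")
```

Under these assumptions 33 processes need 2165856 KiB, which is below the 2595064 KiB of RAM, while 40 processes need 2625280 KiB, which is above it. That is at least consistent with swap "going crazy" at 40 processes and staying modest at 33.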
From: ANTant on 12 Mar 2010 18:37

> OK, I will abort it after this post. FYI for almost this 33 processes'
> 8.25 hours test:
>
> I noticed that swap partition usage grew from this early morning. :)

Bah, the error came back again after my tests:

dmesg:
[32399.988020] Machine check events logged

From /var/log/messages:
Mar 12 14:45:16 foobar kernel: [32399.988020] Machine check events logged
Mar 12 14:45:16 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 12 14:45:16 foobar mcelog: Please contact your hardware vendor
Mar 12 14:45:16 foobar mcelog: MCE 0
Mar 12 14:45:16 foobar mcelog: CPU 1 1 instruction cache
Mar 12 14:45:16 foobar mcelog: ADDR c11b6ff0
Mar 12 14:45:16 foobar mcelog: TIME 1268433916 Fri Mar 12 14:45:16 2010
Mar 12 14:45:16 foobar mcelog: TLB parity error in virtual array
Mar 12 14:45:16 foobar mcelog: TLB error 'instruction transaction, level 1'
Mar 12 14:45:16 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 12 14:45:16 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 12 14:45:16 foobar mcelog: CPUID Vendor AMD Family 15 Model 43

:(
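
Editorial aside: the STATUS value in the mcelog output above can be unpacked by hand, which helps when reading what mcelog prints. The sketch below uses only the architectural layout of a machine-check status register (flag bits 57 to 63, a model-specific code in bits 16 to 31, and the compound error code in bits 0 to 15, where a TLB error has the form 0000 0000 0001 TTLL). It is not mcelog's own code, the helper names are invented for illustration, and the vendor-specific text for the model-specific code is deliberately left undecoded.

```python
STATUS = 0x9400000000010011  # "STATUS 9400000000010011" from the log above

# Architectural flag bits at the top of a machine-check status register.
FLAG_BITS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}

TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic level"}


def set_flags(status: int) -> list:
    """Names of the flag bits that are set, highest bit first."""
    return [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]


def model_specific_code(status: int) -> int:
    """Bits 16-31; the meaning depends on the CPU vendor and model."""
    return (status >> 16) & 0xFFFF


def decode_tlb_error(status: int):
    """Return (transaction, level) for a TLB error code, else None."""
    code = status & 0xFFFF
    if code & 0xFFF0 != 0x0010:   # not of the form 0000 0000 0001 TTLL
        return None
    return TRANSACTION.get((code >> 2) & 0b11, "reserved"), LEVEL[code & 0b11]


if __name__ == "__main__":
    print("flags:", set_flags(STATUS))
    print("model-specific code:", model_specific_code(STATUS))
    print("TLB error:", decode_tlb_error(STATUS))
```

For this value the set flags are VAL, EN and ADDRV, the model-specific code is 1, and the low 16 bits decode to an instruction transaction at level 1, which matches the "TLB error 'instruction transaction, level 1'" line. ADDRV being set agrees with an ADDR line appearing in the log, and UC being clear means the event was not logged as an uncorrected error.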