From: Ant on 13 Mar 2010 16:34

On 3/13/2010 10:51 AM PT, Ant typed:

>>>> OTOH, I would rmmod cpufreq* because they get loaded in kernel space.
>>
>> Just make sure they're not running. Look at the
>> module dependences with depmod or moddep, then
>> rmmod them in the correct order.
>
> # lsmod |grep cpufreq
> cpufreq_powersave        602  0
> cpufreq_userspace       1444  0
> cpufreq_stats           1940  0
> cpufreq_conservative    4018  0
>
> For kicks, I removed these four cpufreq modules in that order to see if
> I still get errors and/or kernel panics.

NOPE! Still happening:

Mar 13 12:36:53 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
Mar 13 12:36:53 foobar mcelog: Please contact your hardware vendor
Mar 13 12:36:53 foobar mcelog: MCE 0
Mar 13 12:36:53 foobar mcelog: CPU 1 1 instruction cache
Mar 13 12:36:53 foobar mcelog: ADDR c11b6ff0
Mar 13 12:36:53 foobar mcelog: TIME 1268512613 Sat Mar 13 12:36:53 2010
Mar 13 12:36:53 foobar mcelog: TLB parity error in virtual array
Mar 13 12:36:53 foobar mcelog: TLB error 'instruction transaction, level 1'
Mar 13 12:36:53 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
Mar 13 12:36:53 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
Mar 13 12:36:53 foobar mcelog: CPUID Vendor AMD Family 15 Model 43
Mar 13 12:36:53 foobar kernel: [45599.988029] Machine check events logged

:(
-- 
"Don't be no Ant-Man. An Ant-Man has very low horizons." --Forrest Gump
   /\___/\
  / /\ /\ \      Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
 | |o   o| |     Ant's Quality Foraged Links: http://aqfl.net
    \ _ /        Nuke ANT from e-mail address: philpi(a)earthlink.netANT
     ( )         or ANTant(a)zimage.com
Ant is currently not listening to any songs on his home computer.

From: Ant on 13 Mar 2010 16:41

On 3/13/2010 1:32 PM PT, Robert Redelmeier typed:

>> # lsmod |grep cpufreq
>> cpufreq_powersave        602  0
>> cpufreq_userspace       1444  0
>> cpufreq_stats           1940  0
>> cpufreq_conservative    4018  0
>
> The 0 is good because these modules are independent

OK cool.

>> Is Kernel autoloading these modules even if I don't use
>> AMD's Cool'n'Quiet and powernow?
>
> I don't think so -- such dependencies should show on the right
> (iso 0) for some other mod. But daemons do strange things,
> and acpid is one of the strangest.

Hmm, lsmod |grep acpid showed nothing.

-- 
"The antics begin!" --SimAnt Game

From: Robert Redelmeier on 14 Mar 2010 00:18

Ant <ant(a)zimage.comant> wrote in part:
> Hmm, lsmod |grep acpid showed nothing.

It won't -- acpid is a daemon, not a module. It shows up in the
process task list (ps/top), not in lsmod. But it is unlikely
to be the cause if the cpufreq modules aren't loaded.

-- Robert

From: Robert Redelmeier on 14 Mar 2010 00:28

Robert Redelmeier <redelm(a)ev1.net.invalid> wrote in part:
>>>> Bah, the error came back again after my tests:
>>>>
>>>> dmesg:
>>>> [32399.988020] Machine check events logged
>>>>
>>>> From /var/log/messages:
>>>> Mar 12 14:45:16 foobar kernel: [32399.988020] Machine check events logged
>>>> Mar 12 14:45:16 foobar mcelog: HARDWARE ERROR. This is *NOT* a software problem!
>>>> Mar 12 14:45:16 foobar mcelog: Please contact your hardware vendor
>>>> Mar 12 14:45:16 foobar mcelog: MCE 0
>>>> Mar 12 14:45:16 foobar mcelog: CPU 1 1 instruction cache
>>>> Mar 12 14:45:16 foobar mcelog: ADDR c11b6ff0
>>>> Mar 12 14:45:16 foobar mcelog: TIME 1268433916 Fri Mar 12 14:45:16 2010
>>>> Mar 12 14:45:16 foobar mcelog: TLB parity error in virtual array
>>>> Mar 12 14:45:16 foobar mcelog: TLB error 'instruction transaction, level 1'
>>>> Mar 12 14:45:16 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
>>>> Mar 12 14:45:16 foobar mcelog: MCGCAP 105 APICID 1 SOCKETID 0
>>>> Mar 12 14:45:16 foobar mcelog: CPUID Vendor AMD Family 15 Model 43
>>>
>>> Noting the addr is in kernel space and the instruction cache,
>>> this is going to take much ingenuity to replicate :(
>>
>> You just gave me an idea: # cat /var/log/messages |grep MCGSTATUS
>> Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
>> Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
>> Mar 6 08:52:09 foobar mcelog: STATUS 9400000000010011 MCGSTATUS 0
>
> [snip] no, this is the status word where the bits have meanings.
> The ADDR line tells you where the error occurred. 0xC+ is kernel space
> on most kernels.

Having a better look through your logs, I see this addr is
very common (almost all errs are at this addr). Aren't
you curious about the instruction that produced the errors?
/boot/System.map should contain the addr of all kernel fns,
and there should be some way to lookup modules.

-- Robert

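[Editor's aside] Since the STATUS word "has bits with meanings", here is a rough Python sketch of how the logged value 9400000000010011 could be picked apart. It assumes the generic x86 machine-check status layout (flag bits 57 to 63, error code in the low 16 bits) and only handles the TLB error-code pattern. The names decode_status, TLB_TT and TLB_LL are made up for this example and are not part of mcelog. The model-specific bits in between are deliberately left undecoded.

```python
# Sketch only: decode the generic fields of an x86 machine-check status word.
# Assumption: architectural MCi_STATUS layout (flags in bits 57..63,
# MCA error code in bits 0..15). Helper names are hypothetical.

TLB_TT = {0: "instruction", 1: "data", 2: "generic"}              # transaction type
TLB_LL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "generic"}  # cache level


def decode_status(status):
    """Return a dict with the flag bits and a text form of the error code."""
    result = {
        "valid": bool((status >> 63) & 1),            # VAL: entry holds an error
        "overflow": bool((status >> 62) & 1),         # OVER: an earlier error was lost
        "uncorrected": bool((status >> 61) & 1),      # UC: hardware did not correct it
        "enabled": bool((status >> 60) & 1),          # EN: reporting was enabled
        "misc_valid": bool((status >> 59) & 1),       # MISCV
        "addr_valid": bool((status >> 58) & 1),       # ADDRV: the ADDR line is meaningful
        "context_corrupt": bool((status >> 57) & 1),  # PCC
    }
    code = status & 0xFFFF
    # TLB errors use the bit pattern 0000 0000 0001 TTLL.
    if (code & 0xFFF0) == 0x0010:
        tt = (code >> 2) & 0x3
        ll = code & 0x3
        result["error"] = "TLB error: %s transaction, %s" % (
            TLB_TT.get(tt, "reserved"), TLB_LL[ll])
    else:
        result["error"] = "error code 0x%04x (not decoded by this sketch)" % code
    return result


decoded = decode_status(0x9400000000010011)
for key, value in decoded.items():
    print(key, "=", value)
```

For the value in the logs this reports a valid, enabled entry with a valid address and the uncorrected bit clear, and the low bits decode to the same "instruction transaction, level 1" TLB error that mcelog printed.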
From: Ant on 14 Mar 2010 04:40
On 3/13/2010 9:28 PM PT, Robert Redelmeier typed:

> Having a better look through your logs, I see this addr is
> very common (almost all errs are at this addr). Aren't
> you curious about the instruction that produced the errors?
> /boot/System.map should contain the addr of all kernel fns,
> and there should be some way to lookup modules.

I did a "cat /var/log/messages |grep ADDR" and found these addresses:

c104e3f0
c106e8c0
c11b6ff0 (most common)

But none of them exactly matched an entry in
/boot/System.map-2.6.32-trunk-686. Here are the close addresses
around each one:

c104e2f9 T tick_handle_periodic
c104e360 T tick_get_broadcast_device

c1063e1b t stop_cpu
c1063ec6 T stop_machine_destroy

c11b6fb8 T acpi_pm_read_verified
c11b6ffc t acpi_pm_read

For the common one, it is ACPI. Hmm!

# locate acpi_pm
/usr/src/linux-headers-2.6.30-2-common/include/linux/acpi_pmtmr.h
/usr/src/linux-headers-2.6.32-trunk-common/include/linux/acpi_pmtmr.h

# more /usr/src/linux-headers-2.6.32-trunk-common/include/linux/acpi_pmtmr.h
#ifndef _ACPI_PMTMR_H_
#define _ACPI_PMTMR_H_

#include <linux/clocksource.h>

/* Number of PMTMR ticks expected during calibration run */
#define PMTMR_TICKS_PER_SEC 3579545

/* limit it to 24 bits */
#define ACPI_PM_MASK CLOCKSOURCE_MASK(24)

/* Overrun value */
#define ACPI_PM_OVRRUN (1<<24)

#ifdef CONFIG_X86_PM_TIMER

extern u32 acpi_pm_read_verified(void);
extern u32 pmtmr_ioport;

static inline u32 acpi_pm_read_early(void)
{
        if (!pmtmr_ioport)
                return 0;
        /* mask the output to 24 bits */
        return acpi_pm_read_verified() & ACPI_PM_MASK;
}

extern void pmtimer_wait(unsigned);

#else

static inline u32 acpi_pm_read_early(void)
{
        return 0;
}

#endif

#endif

Hmm, what is using ACPI then?

# lsof |grep acpi
kacpid 22 root cwd DIR 3,1 1024 2 /
kacpid 22 root rtd DIR 3,1 1024 2 /
kacpid 22 root txt unknown /proc/22/exe
kacpi_not 23 root cwd DIR 3,1 1024 2 /
kacpi_not 23 root rtd DIR 3,1 1024 2 /
kacpi_not 23 root txt unknown /proc/23/exe
kacpi_hot 24 root cwd DIR 3,1 1024 2 /
kacpi_hot 24 root rtd DIR 3,1 1024 2 /
kacpi_hot 24 root txt unknown /proc/24/exe
acpid 1986 root cwd DIR 3,1 1024 2 /
acpid 1986 root rtd DIR 3,1 1024 2 /
acpid 1986 root txt REG 3,6 34684 353719 /usr/sbin/acpid
acpid 1986 root mem REG 3,1 1331496 14245 /lib/libc-2.10.2.so
acpid 1986 root mem REG 3,1 117416 14243 /lib/ld-2.10.2.so
acpid 1986 root 0u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 1u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 2u CHR 1,3 0t0 1344 /dev/null
acpid 1986 root 3r CHR 13,64 0t0 4005 /dev/input/event0
acpid 1986 root 4r CHR 13,65 0t0 4012 /dev/input/event1
acpid 1986 root 5r CHR 13,66 0t0 4016 /dev/input/event2
acpid 1986 root 6r DIR 0,10 0 1 inotify
acpid 1986 root 7u sock 0,6 0t0 5680 can't identify protocol
acpid 1986 root 8u unix 0xf5749c00 0t0 5681 /var/run/acpid.socket
acpid 1986 root 9u unix 0xf52ad400 0t0 7044 /var/run/acpid.socket
acpid 1986 root 10u unix 0xf6fef800 0t0 5683 socket
acpid 1986 root 11u unix 0xf5eb1200 0t0 1585927 /var/run/acpid.socket
acpid 1986 root 12u unix 0xf543a000 0t0 1585931 /var/run/acpid.socket
hald-addo 2632 haldaemon txt REG 3,6 11604 401855 /usr/lib/hal/hald-addon-acpi

I looked around on my Debian's installation, and found an acpid package
so I uninstalled it to see what happens... FYI:

# apt-get remove acpi
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package acpi is not installed, so not removed
0 upgraded, 0 newly installed, 0 to remove and 126 not upgraded.

foobar:/home/ant/download# apt-cache show acpid
Package: acpid
Priority: optional
Section: admin
Installed-Size: 196
Maintainer: Debian Acpi Team <pkg-acpi-devel(a)lists.alioth.debian.org>
Architecture: i386
Version: 1:2.0.2-1
Depends: libc6 (>= 2.4), lsb-base (>= 3.2-14), module-init-tools (>> 3.1-rel-2)
Recommends: acpi-support-base (>= 0.114-1)
Filename: pool/main/a/acpid/acpid_2.0.2-1_i386.deb
Size: 48204
MD5sum: f7a607fe746c5503f364ef82cd47cbd8
SHA1: 7fac7cedade5d17f6644da1cff1bdafc10d798b3
SHA256: 852fe7a6ac15d4c11a0d9df2739b34dab3307a3b96ffb9a96029a1b0e23cca81
Description: Advanced Configuration and Power Interface event daemon
 Modern computers support the Advanced Configuration and Power Interface
 (ACPI) to allow intelligent power management on your system and to query
 battery and configuration status.
 .
 ACPID is a completely flexible, totally extensible daemon for delivering
 ACPI events. It listens on netlink interface (or on the deprecated file
 /proc/acpi/event), and when an event occurs, executes programs to handle
 the event. The programs it executes are configured through a set of
 configuration files, which can be dropped into place by packages or by
 the admin.
Homepage: http://acpid.sourceforge.net/
Tag: admin::power-management, hardware::power, hardware::power:acpi, interface::daemon, role::program
Task: laptop

I am not sure why that is installed since this is a desktop. :P

So, now we wait again... At least we're getting more clues. :)

-- 
"To the gods I am an ant, but to the ants, I am a god." --unknown
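
[Editor's aside] The by-hand System.map search above can be automated. System.map lists the start address of each symbol, so an address that matches no entry exactly would normally belong to the last symbol that starts at or below it. A minimal Python sketch of that nearest-preceding-symbol lookup follows. It assumes the usual "address type name" line format, and the function names load_symbols and lookup are invented for this example. The sample data is just the two lines quoted in the post.

```python
# Sketch only: find which System.map symbol an address falls into.
# Assumption: each line looks like "c11b6fb8 T acpi_pm_read_verified".
import bisect


def load_symbols(lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, addr):
    """Return (name, offset) for the last symbol starting at or below addr."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, addr) - 1
    if index < 0:
        return None  # address is below the first known symbol
    start, name = symbols[index]
    return name, addr - start


# The two entries quoted in the post, used here as sample input.
sample = [
    "c11b6fb8 T acpi_pm_read_verified",
    "c11b6ffc t acpi_pm_read",
]

symbols = load_symbols(sample)
name, offset = lookup(symbols, 0xC11B6FF0)
print("%s+%#x" % (name, offset))  # prints acpi_pm_read_verified+0x38
```

To try it on a real map, the sample list could be replaced with the lines of /boot/System.map-2.6.32-trunk-686 (for example, list(open(path))). Note that this only covers symbols built into the kernel image. Addresses inside loadable modules would need a different source, as Robert hinted.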