From: Ant on
>> I was using the same Kernel 2.6.30 before and after the PSU
>> incident. I never had problems before, but started having
>> problems after. Unless something else like related kernel
>> updates (modules or whatever) started them.
>
> This really points towards a hardware failure. As a general
> rule, the modules are only updated when the kernel changes.
> I suppose someone could try the MS approach of "device drivers"
> on a more-or-less static kernel, but that historically has not been
> the Linux approach. New kernels come out relatively frequently, so
> it is not a big deal to wait and upgrade everything. Note this does
> not apply for foreign modules (like nvidia), but you did not mention
> upgrading -- or did you do something when you changed vidcard?

Oops, I didn't answer your other question about foreign modules. I always
use the latest NVIDIA drivers (beta and stable) from NVIDIA.com and
compile them myself. I never had problems with them before these issues
came up, and I have also seen kernel panics and errors without running X.

There's something else I noticed the last few days (this week so far)
that might be related?
Mar 14 21:11:53
Mar 16 05:41:16

/var/log/messages showed only two machine errors for this week so far.
Also, I haven't had kernel panics for a while either, but that is
probably because I manually rebooted a lot. I currently have almost
three days of uptime, and the panics usually come after about a week or
so.

The only thing different is the weather: temperatures are much
higher. My room has been about 80 degrees F lately (yeah, too warm)
with the windows closed and the fan off.

Before this week, ever since the issues started, it was much cooler (mid
60-70F degrees in my room). Remember how I said my issues usually come
up during idle times and not under load? I wonder if there is a
relationship with temperature. I checked weather.com's calendar of
past temperatures for my city, and they seem to match. It doesn't look
like the weather will be cold again for a while either, since spring is
here. I am going to keep watching this pattern.
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
From: Ant on
Weird. I just noticed this in my dmesg and have no idea if this is bad
or not:

[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)
I checked previous logs, and none of them have it, so it might have been a
hiccup?
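
For scale, the delta in that message is about -62.5 ms (62500120 ns), and the kernel timestamp corresponds to a little under three days of uptime. A minimal Python sketch for pulling those numbers out of such a line is below. The regex is an assumption based only on the single dmesg line quoted above, so other kernel versions may word it differently, and `parse_unstable` is a hypothetical helper name.

```python
import re

# Assumed pattern, modelled on the one dmesg line quoted in this post.
UNSTABLE_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Clocksource (?P<name>\w+) unstable "
    r"\(delta = (?P<delta_ns>-?\d+) ns\)"
)


def parse_unstable(line):
    """Return (uptime_seconds, clocksource_name, delta_ms), or None if no match."""
    m = UNSTABLE_RE.search(line)
    if m is None:
        return None
    uptime = float(m.group("uptime"))
    delta_ms = int(m.group("delta_ns")) / 1e6  # nanoseconds to milliseconds
    return uptime, m.group("name"), delta_ms


if __name__ == "__main__":
    line = "[246348.660025] Clocksource tsc unstable (delta = -62500120 ns)"
    uptime, name, delta_ms = parse_unstable(line)
    days = uptime / 86400  # seconds per day
    print(f"{name}: delta {delta_ms:.2f} ms at {days:.2f} days of uptime")
```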

# cat /var/log/messages* |grep clocksource
(all the way to 2/28/2010 6:47:02 AM PST)
Mar 5 06:41:19 foobar kernel: [ 0.241186] Switching to clocksource jiffies
Mar 5 06:41:19 foobar kernel: [ 0.281777] Switching to clocksource acpi_pm
Mar 5 21:05:19 foobar kernel: [ 0.241193] Switching to clocksource jiffies
Mar 5 21:05:19 foobar kernel: [ 0.281790] Switching to clocksource acpi_pm
Mar 7 07:30:45 foobar kernel: [ 0.241186] Switching to clocksource jiffies
Mar 7 07:30:45 foobar kernel: [ 0.281778] Switching to clocksource acpi_pm
Mar 8 07:43:15 foobar kernel: [ 0.241194] Switching to clocksource jiffies
Mar 8 07:43:15 foobar kernel: [ 0.281782] Switching to clocksource acpi_pm
Mar 11 00:29:19 foobar kernel: [ 0.240922] Switching to clocksource jiffies
Mar 11 00:29:19 foobar kernel: [ 0.281516] Switching to clocksource acpi_pm
Mar 12 05:45:36 foobar kernel: [ 0.237194] Switching to clocksource jiffies
Mar 12 05:45:36 foobar kernel: [ 0.277790] Switching to clocksource acpi_pm
Mar 12 23:57:13 foobar kernel: [ 0.241187] Switching to clocksource jiffies
Mar 12 23:57:13 foobar kernel: [ 0.281779] Switching to clocksource acpi_pm
Mar 15 00:32:48 foobar kernel: [ 0.237192] Switching to clocksource jiffies
Mar 15 00:32:48 foobar kernel: [ 0.277782] Switching to clocksource acpi_pm
Mar 15 01:16:00 foobar kernel: [ 0.237290] Switching to clocksource jiffies
Mar 15 01:16:00 foobar kernel: [ 0.277886] Switching to clocksource acpi_pm
Mar 15 08:25:09 foobar kernel: [ 0.242800] Switching to clocksource jiffies
Mar 15 08:25:09 foobar kernel: [ 0.283406] Switching to clocksource acpi_pm
Mar 15 08:31:58 foobar kernel: [ 0.242802] Switching to clocksource jiffies
Mar 15 08:31:58 foobar kernel: [ 0.283405] Switching to clocksource acpi_pm
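
One caveat about the grep above: it is case-sensitive, so it matches the lowercase "Switching to clocksource" lines but would not match a line that starts with a capitalised "Clocksource", such as the tsc message from dmesg. Below is a rough Python sketch that searches case-insensitively and turns the same kind of output into a per-day boot count. The helper names are hypothetical. It assumes each early switch to jiffies marks one boot, which the sub-second kernel timestamps suggest but the logs do not prove.

```python
import re
from collections import Counter

# Assumed syslog layout, modelled on the grep output in this post.
SWITCH_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\d\d:\d\d:\d\d\s+\S+\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+Switching to clocksource\s+(?P<name>\w+)",
    re.IGNORECASE,
)


def mentions_clocksource(line):
    """Case-insensitive equivalent of the grep, so 'Clocksource ...' also matches."""
    return "clocksource" in line.lower()


def boots_per_day(lines):
    """Count early 'Switching to clocksource jiffies' lines per calendar day."""
    counts = Counter()
    for line in lines:
        m = SWITCH_RE.match(line)
        # Assumption: a switch to jiffies within the first 5 s of uptime is a boot.
        if m and m.group("name") == "jiffies" and float(m.group("uptime")) < 5.0:
            counts[f"{m.group('month')} {int(m.group('day'))}"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mar 15 08:25:09 foobar kernel: [ 0.242800] Switching to clocksource jiffies",
        "Mar 15 08:25:09 foobar kernel: [ 0.283406] Switching to clocksource acpi_pm",
        "Mar 15 08:31:58 foobar kernel: [ 0.242802] Switching to clocksource jiffies",
        "Mar 15 08:31:58 foobar kernel: [ 0.283405] Switching to clocksource acpi_pm",
    ]
    for day, n in boots_per_day(sample).items():
        print(f"{day}: {n} boot(s)")
```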

I did a quick Google search and found
https://lists.ubuntu.com/archives/ubuntu-users/2009-February/175828.html
with commands:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
# cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm

I don't know if this is something to worry about or a new clue.
--
"An anthill increases by accumulation. / Medicine is consumed by
distribution. / That which is feared lessens by association. / This is
the thing to understand." --Siddha Nagarjuna
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: philpi(a)earthlink.netANT
( ) or ANTant(a)zimage.com
Ant is currently not listening to any songs on his home computer.
From: Yousuf Khan on
Robert Redelmeier wrote:
> Jerry Peters <jerry(a)example.invalid> wrote in part:
>> Wrong, Linux implements the configuration features also. Some
>> machines, probably newer laptops, can't be configured without ACPI.
>
> While I cannot say that _none_ of the 1000s of device modules use ACPI,
> I can say that most do not need it. Not to say BIOS didn't use it.
> I've compiled lots of kernels and never needed CONFIG_ACPI_*. Nor did
> it help when I couldn't get a device working -- something fairly
> frequent under Linux, especially for wireless. Very frustrating when
> `lspci` shows it. I presume some sort of device code IPL is required.
>
> I have no problem squirting arbitrary bytes at known PCI addr[s], nor
> do I imagine Linus does either, although Stallman might. But giving
> execution over to foreign code in ring0 is a recipe for insecurity.
> You wanna get Theo de Raadt even hotter under the collar? :)


I've found that ACPI has its tentacles into nearly everything these
days, not just power management. It's responsible for assigning IRQ's,
for example. Modern PC's have more than the traditional 15 IRQ's of the
older PC-AT's, and those extra IRQ's are only available to you if you
use the ACPI API. There are actually 100's of IRQ channels these days,
so you should never need to share IRQ's.

In fact, it was OS stupidity about ACPI that hastened my
exit from XP: I was suffering from constant system panics under XP,
because it was sharing IRQ's between devices that had nothing to do with
each other. For example, it was sharing the same IRQ channel between my
1Gbps Ethernet, and my video card, as well as five other minor system
board functions. The same machine has had a dual-boot to Ubuntu Linux
for a long time, and I could see how Linux was able to properly assign
several dozen IRQ's using its implementation of ACPI, with barely any
sharing at all. Similarly, after I upgraded to Windows 7, all of those
panics went away, and you can see that it's using several dozen IRQ's
just like Linux does. So it looks like Windows XP's ACPI implementation
is fundamentally flawed, at least when it comes to IRQ assignments.

So I found that Linux was much better at using ACPI than Windows XP, and
I don't think Linux is necessarily having any problems with ACPI here.
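
On the Linux side, the amount of sharing can be checked from /proc/interrupts, where handlers that share a line are listed comma-separated on the same row. Below is a rough Python sketch with a hypothetical helper name and invented sample rows (not taken from the machine described above). If the first handler name on a shared row contains spaces, it is cut down to its last word.

```python
# Invented rows in the usual /proc/interrupts layout, for illustration only.
SAMPLE = (
    "           CPU0       CPU1\n"
    "  0:        126          0   IO-APIC-edge      timer\n"
    " 16:      12345          0   IO-APIC-fasteoi   uhci_hcd:usb3, nvidia\n"
    "NMI:          0          0   Non-maskable interrupts\n"
)


def shared_irqs(text):
    """Map IRQ number -> handler names, for IRQs with more than one handler."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row has one CPUn column per CPU
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        fields = rest.split(None, ncpus)  # per-CPU counts, then the remainder
        if len(fields) <= ncpus:
            continue  # no handler registered on this line
        parts = [p.strip() for p in fields[ncpus].split(",")]
        if len(parts) > 1:
            # The first part still carries the controller column; keep its last word.
            parts[0] = (parts[0].split() or [""])[-1]
            shared[int(label)] = parts
    return shared


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the invented rows on non-Linux systems
    for irq, handlers in sorted(shared_irqs(text).items()):
        print(f"IRQ {irq}: {', '.join(handlers)}")
```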

Yousuf Khan
From: Darren Salt on
I demand that Ant may or may not have written...

> Weird. I just noticed this in my dmesg and have no idea if this is bad
> or not:

> [246348.660025] Clocksource tsc unstable (delta = -62500120 ns)
> I checked previous logs, and none of them have it so it might have been a
> hiccup?

That's harmless.

--
| Darren Salt | linux at youmustbejoking | nr. Ashington, | Doon
| using Debian GNU/Linux | or ds ,demon,co,uk | Northumberland | Army
| + http://www.youmustbejoking.demon.co.uk/ & http://tartarus.org/ds/

Locutus 1-2-3 - a Borg spreadsheet program.
From: Ant on
>> Weird. I just noticed this in my dmesg and have no idea if this is bad
>> or not:
>
>> [246348.660025] Clocksource tsc unstable (delta = -62500120 ns)
>> I checked previous logs, and none of them have it so it might have been a
>> hiccup?
>
> That's harmless.

OK. :) Weird, still no new machine errors for the last two days and no
kernel panics. I am not going to try other tests (e.g., Ubuntu liveCD)
unless kernel panics occur again. It seems to be temperature related now?
:/
--
"We are anthill men upon an anthill world." --Ray Bradbury
/\___/\
/ /\ /\ \ Phillip (Ant) @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )