From: Kai Harrekilde-Petersen on
Yousuf Khan <bbbl67(a)spammenot.yahoo.com> writes:

> Mark Hobley wrote:
>>
>> Is this a new system?
>
> No, it's a pretty mature system now. I built it and upgrade it
> myself. It's an AMD A64X2-4200+ w/ 4GB RAM, and it runs in either
> 32-bit WinXP SP3 or 64-bit Ubuntu 9.10.

Are you using ECC-RAM? I've seen 'unexplainable' crashes on an old
non-ECC machine that was caused by memory corruption. The problem
increased over time until I replaced the system with an ECC-enabled
system.

If you don't use ECC, try memtest86 and/or unplugging some of the RAM
modules.


Kai
--
Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>
From: Jose on
On Jan 10, 11:48 pm, Yousuf Khan <bbb...(a)spammenot.yahoo.com> wrote:
> Jose wrote:
> > If you are using the small memory dump you will have that message.
> >
> > You need to adjust your Startup and Recovery Debugging information to
> > do a complete memory dump and try again with a new dump file.
>
> Ah, I see, okay, then I'll go change that then.
>
> > Did you get nothing useful from !analyze -v
>
> Well yes, I found out that NTOSKRNL is involved in all of them. :-)
>
>         Yousuf Khan

The ntoskrnl.exe will show up as the "Probably caused by" frequently
but that in itself is generally not the problem.

If you suspect ntoskrnl.exe, replace it; then you will know what it is
not. If you suspect your other files, replace them too.

I would be looking more in the Bugcheck Analysis STACK TEXT section.
From: Jose on
On Jan 11, 12:19 am, Kai Harrekilde-Petersen <k...(a)harrekilde.dk>
wrote:
> Yousuf Khan <bbb...(a)spammenot.yahoo.com> writes:
> > Mark Hobley wrote:
>
> >> Is this a new system?
>
> > No, it's a pretty mature system now. I built it and upgrade it
> > myself. It's an AMD A64X2-4200+ w/ 4GB RAM, and it runs in either
> > 32-bit WinXP SP3 or 64-bit Ubuntu 9.10.
>
> Are you using ECC-RAM? I've seen 'unexplainable' crashes on an old
> non-ECC machine that was caused by memory corruption.  The problem
> increased over time until I replaced the system with an ECC-enabled
> system.
>
> If you don't use ECC, try memtest86 and/or unplugging some of the RAM
> modules.
>
> Kai
> --
> Kai Harrekilde-Petersen <khp(at)harrekilde(dot)dk>

Hopefully you mean memtest86+, which certainly won't hurt to run!

If someone says to run memtest86, you can say that you know memtest86+
supersedes memtest86 and here's why:

http://en.wikipedia.org/wiki/Memtest86

The file and instructions are here:

http://www.memtest.org/
From: Yousuf Khan on
Jose wrote:
> The ntoskrnl.exe will show up as the "Probably caused by" frequently
> but that in itself is generally not the problem.

I agree. Actually, my main purpose in finding the root cause of this
is to find out whether it is caused by hardware rather than software.

I recently added an external USB hard drive to my system, and the
problem started a few days afterward. But there is nothing special about
this external drive, it is just a bog standard drive using the bog
standard Microsoft Mass Storage drivers. And there was a previous bog
standard external drive that is also running on the system which was not
causing a problem.

I'm also looking at the possibility that the problem is caused by the
chipset, an Nvidia nForce model, which has had nothing but weird issues
with USB devices since I got this motherboard. Some devices get
recognized as USB 2.0, while others that should be recognized as USB 2.0
get recognized as USB 1.1.
I've tried the same peripherals on another computer of mine, using an
ATI chipset, and they get recognized properly. So I think the chipset
itself has a faulty implementation of the USB specs.

> If you suspect ntoskrnl.exe, replace it then you will know what it is
> not. If you suspect your other files, replace them too.

In the past when I've had BSODs, it was relatively easy to narrow the
source of the problem down to some third party driver, and update that
driver. But now these are the actual core Windows kernel and related
files, so I am having to do more in-depth analysis than I normally would.

> I would be looking more in the Bugcheck Analysis STACK TEXT section.

I previously posted a message on one of these newsgroups with the
summaries of the first three Stop errors I got, but little help came
back. I'll post them again right now (I don't have access to the latest
crash summary, since I'm posting this from a different system).

Yousuf Khan

***
The following are the summaries of each mini-dump:

(1) 31/12/2009 9:27:06 PM
Bug Check String : PAGE_FAULT_IN_NONPAGED_AREA
Bug Check Code : 0x10000050
Parameter 1 : 0x8b55ffaf
Parameter 2 : 0x00000000
Parameter 3 : 0x804f1b2c
Parameter 4 : 0x00000000
Caused By Driver : hal.dll
Caused By Address : hal.dll+2aa8
File Description : Hardware Abstraction Layer DLL

Stack:
hal.dll+2aa8
ntoskrnl.exe+1db2c

(2) 02/01/2010 9:49:05 PM
Bug Check String : BAD_POOL_HEADER
Bug Check Code : 0x00000019
Parameter 1 : 0x00000020
Parameter 2 : 0x8942aab8
Parameter 3 : 0x8942af40
Parameter 4 : 0x8a915628
Caused By Driver : ntoskrnl.exe
Caused By Address : ntoskrnl.exe+6067a

Stack:
Ntfs.sys+212aa
ntoskrnl.exe+6067a

(3) 06/01/2010 11:22:38 PM
Bug Check String : BAD_POOL_CALLER
Bug Check Code : 0x000000c2
Parameter 1 : 0x00000007
Parameter 2 : 0x00000c3e
Parameter 3 : 0x000027ca
Parameter 4 : 0x8ab31114
Caused By Driver : fltmgr.sys
Caused By Address : fltmgr.sys+14e3f

Stack:
fltmgr.sys+14e3f
hal.dll+2900
ntoskrnl.exe+909b4
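
To compare the three stacks at a glance, a short Python sketch along these lines tallies how many of the crashes each module appears in. The frames are copied from the summaries above; the names STACKS, module_of and tally_modules are only illustrative, and a module showing up in every stack is not by itself evidence that it is at fault.

```python
from collections import Counter

# Stack frames copied from the three mini-dump summaries (module+offset).
STACKS = {
    "PAGE_FAULT_IN_NONPAGED_AREA": ["hal.dll+2aa8", "ntoskrnl.exe+1db2c"],
    "BAD_POOL_HEADER": ["Ntfs.sys+212aa", "ntoskrnl.exe+6067a"],
    "BAD_POOL_CALLER": ["fltmgr.sys+14e3f", "hal.dll+2900", "ntoskrnl.exe+909b4"],
}


def module_of(frame):
    """Return the lower-cased module name from a 'module+offset' frame."""
    return frame.split("+", 1)[0].lower()


def tally_modules(stacks):
    """Count in how many crashes each module appears at least once."""
    counts = Counter()
    for frames in stacks.values():
        # A set ensures a module is counted once per crash, not once per frame.
        counts.update({module_of(frame) for frame in frames})
    return counts


counts = tally_modules(STACKS)
for module, n in counts.most_common():
    print(f"{module}: {n} of {len(STACKS)} crashes")
```

Only core components (kernel, HAL, NTFS, filter manager) appear here and no third-party driver is named, which fits memory corruption from some other source but does not prove it.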
From: Yousuf Khan on
Kai Harrekilde-Petersen wrote:
> Are you using ECC-RAM? I've seen 'unexplainable' crashes on an old
> non-ECC machine that was caused by memory corruption. The problem
> increased over time until I replaced the system with an ECC-enabled
> system.
>
> If you don't use ECC, try memtest86 and/or unplugging some of the RAM
> modules.

That was on my list of things to try. Memtest86 is automatically part of
my multi-boot options since I run Ubuntu. However, so far the problem
hasn't really occurred under Ubuntu, just under Windows. Mind you, I
don't run Ubuntu long enough on this system to get an adequate idea. The
machine pretty much stays on 24 hours a day, so it's difficult to take it
down and run a memtest on it for several hours.

Another reason I don't totally suspect it's RAM-related is that the
problems began happening after I attached a new external USB hard drive
to the system. So I'm going to investigate whether that contributed to it.

Yousuf Khan