HDD problems causing Kernel panics in Linux/Debian? [Linux Hardware]

Prev: Upgraded my old Debian box to Kernel 2.6.32, but missing sensors datas and can't compile the latest stable NVIDIA driver.
Next: Logs and dumps for kernel panics to collect and analyze?

From: Yousuf Khan on 9 Mar 2010 18:43

Vlad_Inhaler wrote:
> I would have no hesitation in creating a special partition for panic
> dumps, hell - if standard Linux filesystems are that sensitive I'd
> even make it VFAT or whatever else is necessary.
> I have reproducible kernel hangs under a certain kind of load, they
> are *not* temperature related and I have no way of working out what
> the hell is going on. Oh, the machine is dual-boot and I don't have
> these problems under XP.
>
> Going further into that here would be hijacking this thread, and I
> have tried that before now anyway without success.
>
> Having some sensible way of taking dumps for further analysis would be
> a really *good thing* - hell, I'd even put an additional old IDE drive
> in there as a destination device if that was what it took. Sorry, but
> that is a 'safety feature' I am not that happy with. Windows can do
> it, mainframe OSs can do it . . .

I'm not as familiar with Linux systems, at least in this case, but I
have a background with Solaris systems, and kernel dumps are written to
the swap partition prior to system restart. Then after restart, a
process runs that detects the presence of a memory dump in the swap
partition and writes it out as a file into the filesystem. The
presumption being that whatever caused the kernel dump during the last
session will not immediately affect the new session after the reboot.

Also the idea behind writing to the swap partition rather than to the
filesystem directly is that it's more likely that a bug will have
affected the filesystem driver, but not the raw disk system driver.
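
To make the two-stage idea concrete, here is a toy model in Python: a panic path that writes straight to a raw device, and a boot path that looks for a saved dump and copies it out. This is only a sketch of the scheme described above. The `DUMP` marker, the header layout, and the function names are made up for illustration and are not the actual Solaris on-disk format or tools.

```python
import io
import struct
from typing import Optional

# Hypothetical header: 4-byte marker plus payload length (not a real format).
MAGIC = b"DUMP"
HEADER = struct.Struct("<4sQ")


def write_dump(device: io.BufferedIOBase, memory_image: bytes) -> None:
    """Panic path: write header and image to the raw device, no filesystem involved."""
    device.seek(0)
    device.write(HEADER.pack(MAGIC, len(memory_image)))
    device.write(memory_image)


def recover_dump(device: io.BufferedIOBase) -> Optional[bytes]:
    """Boot path: return the saved image if a valid header is present, else None."""
    device.seek(0)
    raw = device.read(HEADER.size)
    if len(raw) < HEADER.size:
        return None
    magic, length = HEADER.unpack(raw)
    if magic != MAGIC:
        return None
    image = device.read(length)
    # Wipe the header so the same dump is not recovered again on the next boot.
    device.seek(0)
    device.write(b"\0" * HEADER.size)
    return image
```

After the reboot, the recovered bytes would be written to an ordinary file, because by then the filesystem driver is running in a fresh session.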

Yousuf Khan

From: Arno on 10 Mar 2010 08:20

In comp.sys.ibm.pc.hardware.storage David Brown <david.brown(a)hesbynett.removethisbit.no> wrote:
> Rod Speed wrote:
>> Ant wrote
>>> Arno wrote
>>
>>>> On the other hand, the serial interface is simple, so console
>>>> output, including error messages, will still be written to it.
>>>> If you need that output, connect a different computer to
>>>> the serial port, activate the serial console and capture
>>>> its output. I have done this a number of times, mostly to
>>>> try out experimental kernels on a cluster, but also to debug
>>>> kernel panics.
>>
>>> Can I use my old serial external dial-up modem for this?
>>

> It should be possible if the connection was up and running in advance -
> I doubt if you'd be able to get a new connection after a disaster.

Establishing the connection is a modem function, so you will be able to
connect. The PC will simply not be sending any data after the crash, and
the modem will not store it.

>> Nope, you need a serial cable between the PCs.
>>

> That's the best idea.

It is.

>> It would be a lot better if Linux allowed a dump to a USB stick if
>> you are happy to risk the contents of the USB stick on a kernel panic.
>>

> It's the price you pay for flexibility - most of Linux doesn't know that
> you have a USB stick attached. It's all just files.

And there is the problem that after a kernel panic the device IDs
can be wrong, so the wrong device gets written to. Writing to a serial
line typically does not destroy anything and can be accomplished with
a very simple and small piece of assembler code. Also, the mapping
between ttyS<x> and the actual interface hardware address is static,
unlike that of disk drives, and can only describe a hardware UART.

In addition, the kernel command line gets patched into the kernel,
and hence is immune to data memory area corruption.

And there is one additional problem: The serial console cannot
cause a kernel panic, but the filesystem (needed to write to that USB
stick) can, so the USB stick is an incomplete solution anyway,
as it cannot reliably log filesystem panics.
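
For the capturing side, a minimal Python sketch of what the second machine could run is shown below. It assumes the crashing box was booted with a serial console enabled (for example `console=ttyS0,115200` on the kernel command line) and that the receiving port has already been set to the same speed, e.g. with `stty`. The function name and the timestamp format are illustrative choices only.

```python
import io
import time
from typing import Callable


def capture_console(source: io.IOBase, log: io.TextIOBase,
                    clock: Callable[[], float] = time.time) -> int:
    """Copy console lines from source to log, prefixing each with a timestamp.

    Returns the number of lines written. Stops when the source reports EOF.
    """
    count = 0
    for raw in iter(source.readline, b""):
        # Console output is mostly ASCII; replace anything that is not.
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        log.write(f"{clock():.3f} {text}\n")
        # Flush after every line so nothing is lost if the logger is interrupted.
        log.flush()
        count += 1
    return count
```

On the logging machine this could be called with `open("/dev/ttyS0", "rb", buffering=0)` as the source and a log file opened in append mode as the destination. The device name is an assumption and depends on which port the cable is plugged into.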

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno(a)wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

From: Arno on 10 Mar 2010 08:29

In comp.sys.ibm.pc.hardware.storage Darren Salt <news(a)youmustbejoking.demon.cu.invalid> wrote:
> I demand that Arno may or may not have written...

>> In comp.sys.ibm.pc.hardware.storage Ant <ant(a)zimage.comant> wrote:
>>> On 3/7/2010 8:56 AM PT, Yousuf Khan typed:
>>>>>> HOWTO enable core-dumps - LinuxReviews
>>>>>> http://en.linuxreviews.org/HOWTO_enable_core-dumps
>>>>> Thanks. Isn't this for program crashes, not kernel panics? I wonder
>>>>> why it was removed because I used to see those core files from crashes.
>>>> You may want to ask in a Linux newsgroup for more details.
>>> I already am. ;)

>> You don't need to,

> What ? ask in a Linux newsgroup? ;-)

> (No, I'm not going to not post this to c.o.l.h.)

;-)

>> no disk access is possible after a kernel panic, hence no logging. The only
>> thing you can do, is to look at the screen or to enable the serial console
>> output and log that on another machine.

> I normally use netconsole for that.

> http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt

Nice, I was not aware of this. Should read the documentation
I post more carefully, the hint was actually in the excerpt
from kernel-parameters.txt I quoted....

I guess this will not work if the ethernet chip driver causes the
panic though. But at least that would also identify the problematic
component.
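
On the receiving machine, netconsole output arrives as plain UDP datagrams, so a few lines of Python are enough to collect it. The sketch below assumes the sender targets UDP port 6666, which is the default target port given in the netconsole documentation. The function names are made up for illustration, and a real logger would loop forever and append to a file instead of returning a list.

```python
import socket
from typing import List, Tuple


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket on which netconsole datagrams can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock: socket.socket, max_packets: int) -> List[Tuple[str, str]]:
    """Receive up to max_packets datagrams and return (sender address, text) pairs."""
    messages = []
    for _ in range(max_packets):
        data, (sender, _port) = sock.recvfrom(65535)
        messages.append((sender, data.decode("utf-8", errors="replace")))
    return messages
```

Recording the sender address is useful when several boxes log to the same listener.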

Arno

From: Ant on 10 Mar 2010 10:10

On 3/9/2010 9:07 AM PT, Rod Speed typed:

>>> On the other hand, the serial interface is simple, so console
>>> output, including error messages, will still be written to it.
>>> If you need that output, connect a different computer to
>>> the serial port, activate the serial console and capture
>>> its output. I have done this a number of times, mostly to
>>> try out experimental kernels on a cluster, but also to debug
>>> kernel panics.
>
>> Can I use my old serial external dial-up modem for this?
>
> Nope, you need a serial cable between the PCs.
>
> It would be a lot better if Linux allowed a dump to a USB stick if
> you are happy to risk the contents of the USB stick on a kernel panic.

Yes, I have no problems with a USB flash drive/stick. I can reformat. ;)
--
"I love ants. Do they have uncles? Ha Ha!" --Elmo from Sesame Street
(unknown episode)
/\___/\
/ /\ /\ \ Phil./Ant @ http://antfarm.ma.cx (Personal Web Site)
| |o o| | Ant's Quality Foraged Links: http://aqfl.net
\ _ / Nuke ANT from e-mail address: philpi(a)earthlink.netANT
( ) or ANTant(a)zimage.com
Ant is currently not listening to any songs on his home computer.

From: Vlad_Inhaler on 10 Mar 2010 13:03

On Mar 9, 7:14 pm, Arno <m...(a)privacy.net> wrote:
> In comp.sys.ibm.pc.hardware.storage Vlad_Inhaler <andrew.willi...(a)t-online.de> wrote:
>
> And Linux can do it. It just dumps to console instead of disk and
> this choice is reasonable because of data safety, albeit sometimes
> inconvenient in cheap setups. (Nothing against cheap setups, but
> they are a bit limited on the hardware side and that sometimes is
> inconvenient.)
>
> You are supposed to have more than one of these boxes in one place
> and then there is no issue. You can also use a number of
> serial-over-internet devices to record logs. Or a laptop with
> serial interface placed next to the offending machine. Or a modem.
> Or a serial data recorder, for example the Logomatic v2 Serial
> SD Datalogger (-> Google), which costs about 50 EUR.
>
> The cheapest solution is usually just a serial crossover cable to
> the next box in the rack that is under your control. Remember
> that this is a server OS we are talking about here, not an
> MS single-user-no-network OS that has over the course of time
> been heavily extended.
>
> Side note: With server PC hardware you get an IPMI console that
> also gives you the output, so the comparison with big iron is not
> fair. The serial console is the low-low-cost solution.
>
> I should also add that a "soft panic" (which is closest to a blue
> screen) typically dumps to /var/log/messages. It is only a hard panic
> that is limited to the console. A hard panic corresponds to a lockup
> without blue screen on windows.
>
> Arno

Nah, the NT family was designed to be on a network from the very
start. When you say 'single-user-no-network' you are talking about
3.1. Even the Win95/98/ME line was expecting to be hooked up although
the network support was just an add-on.
I will have to take the time next week to study this area (dumping
over serial interfaces). Of course, then I need to be able to
understand the dump :-(

Yousuf Khan's comment about how Solaris does it was very interesting.
My day-job is on mainframes (not IBM) and when you boot one of them,
they always ask if you want a dump of the previous session. That
would be rather annoying here but it is a good starting point.
Dumping after a previous crash landing would be useful, at least as an
option which could be turned on in some way.
