HDD problems causing Kernel panics in Linux/Debian? [Storage]

From: Rod Speed on 9 Mar 2010 17:00

David Brown wrote:
> Rod Speed wrote:
>> Ant wrote
>>> Arno wrote
>>
>>>> On the other hand, the serial interface is simple, so console
>>>> output, including error messages, will still be written to it.
>>>> If you need that output, connect a different computer to
>>>> the serial port, activate the serial console and capture
>>>> its output. I have done this a number of times, mostly to
>>>> try out experimental kernels on a cluster, but also to debug
>>>> kernel panics.
>>
>>> Can I use my old serial external dial-up modem for this?
>>
>
> It should be possible if the connection was up and running in advance
> - I doubt if you'd be able to get a new connection after a disaster.
>
>> Nope, you need a serial cable between the PCs.
>>
>
> That's the best idea.
>
>> It would be a lot better if Linux allowed a dump to a USB stick if
>> you are happy to risk the contents of the USB stick on a kernel
>> panic.
>
> It's the price you pay for flexibility - most of Linux doesn't know
> that you have a USB stick attached. It's all just files.

It isn't most of Linux that matters; it's only whatever does the dump that needs to know about it.

From: Darren Salt on 9 Mar 2010 18:35

I demand that Arno may or may not have written...

> In comp.sys.ibm.pc.hardware.storage Ant <ant(a)zimage.comant> wrote:
>> On 3/7/2010 8:56 AM PT, Yousuf Khan typed:
>>>>> HOWTO enable core-dumps - LinuxReviews
>>>>> http://en.linuxreviews.org/HOWTO_enable_core-dumps
>>>> Thanks. Isn't this for program crashes, not kernel panics? I wonder
>>>> why it was removed because I used to see those core files from crashes.
>>> You may want to ask in a Linux newsgroup for more details.
>> I already am. ;)

> You don't need to,

What – ask in a Linux newsgroup? ;-)

(No, I'm not going to not post this to c.o.l.h.)

> no disk access is possible after a kernel panic, hence no logging. The only
> thing you can do, is to look at the screen or to enable the serial console
> output and log that on another machine.

I normally use netconsole for that.

http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt
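
For the receiving end, a minimal sketch of a listener on the logging machine might look like the following. It assumes netconsole on the crashing box has been pointed at this host and at UDP port 6666 (the default target port in the netconsole documentation, as far as I know); the function names and the log file name are made up for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams.

    Port 6666 is assumed to match the target port configured on the sender.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock, log_path, max_packets=None):
    """Append each received datagram to log_path and return the packet count.

    The log is flushed after every packet so that the last lines before a
    panic are on disk even if this script is interrupted.
    """
    count = 0
    with open(log_path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _addr = sock.recvfrom(65535)
            log.write(data)
            log.flush()
            count += 1
    return count


# Typical use on the logging machine (runs until interrupted):
#   capture(open_listener(), "netconsole.log")
```

Any tool that can listen on a UDP port will do the same job; the point is only that the receiver needs no special protocol support.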

[snip]
--
| Darren Salt | linux at youmustbejoking | nr. Ashington, | Doon
| using Debian GNU/Linux | or ds ,demon,co,uk | Northumberland | Army
| + http://www.youmustbejoking.demon.co.uk/ & http://tlasd.wordpress.com/

I've given up reading books; I find that it takes my mind off myself.

From: Yousuf Khan on 9 Mar 2010 18:43

Vlad_Inhaler wrote:
> I would have no hesitation in creating a special partition for panic
> dumps, hell - if standard Linux filesystems are that sensitive I'd
> even make it VFAT or whatever else is necessary.
> I have reproducible kernel hangs under a certain kind of load, they
> are *not* temperature related and I have no way of working out what
> the hell is going on. Oh, the machine is dual-boot and I don't have
> these problems under XP.
>
> Going further into that here would be hijacking this thread, and I
> have tried that before now anyway without success.
>
> Having some sensible way of taking dumps for further analysis would be
> a really *good thing* - hell, I'd even put an additional old IDE drive
> in there as a destination device if that was what it took. Sorry, but
> that is a 'safety feature' I am not that happy with. Windows can do
> it, mainframe OSs can do it . . .

I'm not as familiar with Linux systems, at least in this case, but I
have a background with Solaris systems, and kernel dumps are written to
the swap partition prior to system restart. Then after restart, a
process runs that detects the presence of a memory dump in the swap
partition and writes it out as a file into the filesystem. The
presumption being that whatever caused the kernel dump during the last
session will not immediately affect the new session after the reboot.

Also the idea behind writing to the swap partition rather than to the
filesystem directly is that it's more likely that a bug will have
affected the filesystem driver, but not the raw disk system driver.

Yousuf Khan
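
To make the two-phase idea concrete, here is a toy sketch in Python. It is not how Solaris actually lays out a dump: the marker, the header format and the function names are all invented, and an ordinary file stands in for the raw swap device. It only shows the shape of "write raw at panic time, copy out to a file on the next boot".

```python
import struct

MAGIC = b"PANICDMP"             # hypothetical marker, not a real on-disk format
HEADER = struct.Struct("<8sQ")  # marker followed by the payload length


def write_dump(device_path, memory_image):
    """Panic-time step: write header and image straight to the raw device.

    No filesystem code is involved, only sequential writes at offset 0.
    """
    with open(device_path, "r+b") as dev:
        dev.write(HEADER.pack(MAGIC, len(memory_image)))
        dev.write(memory_image)


def save_dump(device_path, out_path):
    """Boot-time step: if a dump is present, copy it to a file.

    Returns True when a dump was found and saved, False otherwise.
    The marker is cleared afterwards so the same dump is not saved twice.
    """
    with open(device_path, "r+b") as dev:
        raw = dev.read(HEADER.size)
        if len(raw) < HEADER.size:
            return False
        magic, length = HEADER.unpack(raw)
        if magic != MAGIC:
            return False
        with open(out_path, "wb") as out:
            out.write(dev.read(length))
        dev.seek(0)
        dev.write(b"\0" * HEADER.size)
    return True
```

The boot-time step runs with a healthy kernel and filesystem, which is why it can afford to do the part that needs them.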

From: Arno on 10 Mar 2010 08:20

In comp.sys.ibm.pc.hardware.storage David Brown <david.brown(a)hesbynett.removethisbit.no> wrote:
> Rod Speed wrote:
>> Ant wrote
>>> Arno wrote
>>
>>>> On the other hand, the serial interface is simple, so console
>>>> output, including error messages, will still be written to it.
>>>> If you need that output, connect a different computer to
>>>> the serial port, activate the serial console and capture
>>>> its output. I have done this a number of times, mostly to
>>>> try out experimental kernels on a cluster, but also to debug
>>>> kernel panics.
>>
>>> Can I use my old serial external dial-up modem for this?
>>

> It should be possible if the connection was up and running in advance -
> I doubt if you'd be able to get a new connection after a disaster.

That is a modem function and you will. The PC will just not be
sending any data after the crash and the modem will not store it.

>> Nope, you need a serial cable between the PCs.
>>

> That's the best idea.

It is.

>> It would be a lot better if Linux allowed a dump to a USB stick if
>> you are happy to risk the contents of the USB stick on a kernel panic.
>>

> It's the price you pay for flexibility - most of Linux doesn't know that
> you have a USB stick attached. It's all just files.

And there is the problem that after a kernel panic the device IDs
can be wrong, so the wrong device gets written to. Writing to a serial
line typically does not destroy anything and can be accomplished with
a very simple and small piece of assembler code. Also, the mapping
between ttyS<x> and the actual interface hardware address is static,
unlike that of disk drives, and can only describe a hardware UART.

In addition, the kernel command line gets patched into the kernel,
and hence is immune to data memory area corruption.

And there is one additional problem: The serial console cannot
cause a kernel panic, but the filesystem (needed to write to that USB
stick) can, so the USB stick is an incomplete solution anyway,
as it cannot reliably log filesystem panics.
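
On the capturing machine, the logging side can be very small as well. The following is only a sketch: it assumes the crashing box was booted with something like console=ttyS0,115200 on its kernel command line, and that the port on the capturing box has already been opened and set to the same speed (for example with stty). The function name and log path are made up.

```python
import os
import time


def log_console(fd, log_path):
    """Read from fd until EOF and write each line to log_path with a timestamp.

    fd is any readable file descriptor; for a real capture it would be an
    already configured serial port. Returns the number of complete lines.
    """
    lines = 0
    pending = b""
    with open(log_path, "ab") as log:
        while True:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            pending += chunk
            while b"\n" in pending:
                line, pending = pending.split(b"\n", 1)
                stamp = time.strftime("%Y-%m-%d %H:%M:%S").encode()
                log.write(stamp + b" " + line.rstrip(b"\r") + b"\n")
                log.flush()  # keep the log current; the sender may die mid-line
                lines += 1
        if pending:
            # Keep a trailing partial line; a panic can cut output short.
            log.write(pending + b"\n")
    return lines


# Typical use (device name is an assumption):
#   fd = os.open("/dev/ttyS0", os.O_RDONLY | os.O_NOCTTY)
#   log_console(fd, "serial-console.log")
```

A terminal program with a capture-to-file option does the same thing; the timestamps just make it easier to match the panic against other logs.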

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno(a)wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

From: Arno on 10 Mar 2010 08:29

In comp.sys.ibm.pc.hardware.storage Darren Salt <news(a)youmustbejoking.demon.cu.invalid> wrote:
> I demand that Arno may or may not have written...

>> In comp.sys.ibm.pc.hardware.storage Ant <ant(a)zimage.comant> wrote:
>>> On 3/7/2010 8:56 AM PT, Yousuf Khan typed:
>>>>>> HOWTO enable core-dumps - LinuxReviews
>>>>>> http://en.linuxreviews.org/HOWTO_enable_core-dumps
>>>>> Thanks. Isn't this for program crashes, not kernel panics? I wonder
>>>>> why it was removed because I used to see those core files from crashes.
>>>> You may want to ask in a Linux newsgroup for more details.
>>> I already am. ;)

>> You don't need to,

> What – ask in a Linux newsgroup? ;-)

> (No, I'm not going to not post this to c.o.l.h.)

;-)

>> no disk access is possible after a kernel panic, hence no logging. The only
>> thing you can do, is to look at the screen or to enable the serial console
>> output and log that on another machine.

> I normally use netconsole for that.

> http://www.mjmwired.net/kernel/Documentation/networking/netconsole.txt

Nice, I was not aware of this. I should read the documentation
I post more carefully; the hint was actually in the excerpt
from kernel-parameters.txt I quoted....

I guess this will not work if the ethernet chip driver causes the
panic, though. But at least that would also identify the problematic
component.

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno(a)wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans
