From: John Stumbles on
I have a server box with a raid setup using mdadm. It currently only has
one drive (that's another story ...) but was working OK as a degraded
array. However since rebooting the system last night it's no longer coming
up.

On bootup after the PC's normal F1-for-bios message (where I can go
into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can see
that the SATA card's VIA BIOS is seeing the drive.

Normal Linux bootup (this is Debian stable) halts as fsck fails because it
can't find /dev/md0.

dmesg seems to be seeing the drive and reports
sd 3:0:0:0 [sdb] Attached SCSI disk

Normally mdadm should Just Work and create /dev/md0 from the physical
drive. Any suggestions what's going wrong, or what to look for next?


--
John Stumbles

If we'd known how much fun grandchildren are
we'd have had them first
From: Aragorn on
On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
identifying as John Stumbles wrote...

> I have a server box with a raid setup using mdadm. It currently only
> has one drive (that's another story ...) [...

But one that may be relevant. :-) If the other hard disk(s) have
failed, then there is a chance that your now last remaining disk is
starting to fail as well.

> ...] but was working OK as a degraded array. However since rebooting
> the system last night it's no longer coming up.

Hmm... You don't keep this server running 24/7?

> On bootup after the PC's normal F1-for-bios message (where I can go
> into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can
> see that the SATA card's VIA BIOS is seeing the drive.

This is irrelevant, though. Motherboard RAID implementations are
typically FakeRAID solutions. The RAID functionality of those chipsets
is limited to the real-mode BIOS, just enough that an operating system
can boot from such an array; once its drivers are loaded, the operating
system implements the RAID in software.

In other words, activating the RAID in the BIOS normally only matters
until Linux takes over, which in this case it appears to be failing to
do.

> Normal Linux bootup (this is Debian stable) halts as fsck fails
> because it can't find /dev/md0.

Did you by any chance install, remove or update any software on this
system recently, without having shut down or rebooted since?
If so, it might be a problem with /udev/ - which could also be related
to /sysfs/ of course. Modern GNU/Linux systems usually use /udev/ to
create the device special files on demand.

> dmesg seems to be seeing the drive and reports
> sd 3:0:0:0 [sdb] Attached SCSI disk

So the kernel sees the drive, but somehow the device special file is not
created. Again, with the knowledge I have at the moment, things seem
to point at /udev/ and/or /sysfs/ - is it mounted from within the
initrd?

> Normally mdadm should Just Work and create /dev/md0 from the physical
> drive. Any suggestions what's going wrong, or what to look for next?

Have you tried booting up with a rescue CD or a live CD to see whether
the drive is recognized and/or whether the device special files for it
are created?
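
For instance, something along these lines from the rescue shell - all
read-only, and the device names are guesses, so adjust them to whatever
the live system actually detects:

```shell
ls -l /dev/sd* /dev/md*          # are the device special files there at all?
cat /proc/partitions             # does the kernel see the disk and its partitions?
dmesg | grep -i -e sata -e 'sd ' # any controller or drive errors on detection?
mdadm --examine /dev/sdb1        # is there an md superblock on the partition?
```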

--
*Aragorn*
(registered GNU/Linux user #223157)
From: Nigel Wade on
On Thu, 04 Feb 2010 16:18:05 +0100, Aragorn wrote:

> On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
> identifying as John Stumbles wrote...

>
>> Normally mdadm should Just Work and create /dev/md0 from the physical
>> drive. Any suggestions what's going wrong, or what to look for next?
>
> Have you tried booting up with a rescue CD or a live CD to see whether
> the drive is recognized and/or whether the device special files for it
> are created?

Whilst booted to a rescue environment, verify from the partition table
that the relevant partitions have type 'fd' (Linux raid autodetect).

Also check that the array UUIDs in the on-disk superblocks match those
in /etc/mdadm.conf. The md system only automatically starts arrays
which are defined in /etc/mdadm.conf (at least it does on RedHat and
OpenSUSE).
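
Something like this, from the rescue environment with the on-disk root
filesystem mounted under /mnt (the partition name is a guess; note that
on Debian the file usually lives in /etc/mdadm/mdadm.conf rather than
/etc/mdadm.conf):

```shell
# Partition type should be 'fd' (Linux raid autodetect):
fdisk -l /dev/sdb

# The UUID reported here...
mdadm --examine /dev/sdb1 | grep -i uuid

# ...should match the ARRAY line in the on-disk config:
grep '^ARRAY' /mnt/etc/mdadm/mdadm.conf
```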

--
Nigel Wade
From: John Stumbles on
On Thu, 04 Feb 2010 16:18:05 +0100, Aragorn wrote:

> On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
> identifying as John Stumbles wrote...
>
>> I have a server box with a raid setup using mdadm. It currently only
>> has one drive (that's another story ...) [...
>
> But one that may be relevant. :-) If the other hard disk(s) have
> failed, then there is a chance that your now last remaining disk is
> starting to fail as well.

True, or the SATA card playing up. I've had 2 similar cards (eBay
cheapies) go bad so I'm not too confident of this one. The SATA
card BIOS sees the drive, Linux also sees it, the drive itself is OK in
another machine, and an identical (make/model/formatting) drive which
works in another machine similarly isn't recognised in this one.


>> ...] but was working OK as a degraded array. However since rebooting
>> the system last night it's no longer coming up.
>
> Hmm... You don't keep this server running 24/7?

Normally I do. The machine also has an external USB drive attached which
is used for backups and that had stopped working. I had physically moved
the drive (gently, trying to keep the connections intact) so assumed I
must have inadvertently interrupted the connection to that as it became
unmounted so rebooted to try to sort it out.
Which is when the fun started :-(

>> On bootup after the PC's normal F1-for-bios message (where I can go
>> into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can see
>> that the SATA card's VIA BIOS is seeing the drive.
>
> This is irrelevant, though. Motherboard RAID implementations are
> typically FakeRAID solutions.

Yes, I'm using mdadm, not the SATA BIOS RAID: I just referred to the
latter to show that the drive seemed to be recognised by the system.



>> Normal Linux bootup (this is Debian stable) halts as fsck fails because
>> it can't find /dev/md0.
>
> Did you by any chance install, remove or update any software on this
> system recently, without having shut down or rebooted since?

No.

>> dmesg seems to be seeing the drive and reports sd 3:0:0:0 [sdb]
>> Attached SCSI disk
>
> So the kernel sees the drive, but somehow the device special file is not
> created. Again, with the knowledge I have at the moment, things seem to
> point at /udev/ and/or /sysfs/ - is it mounted from within the initrd?

You've lost me there. There isn't a /udev or a /sysfs (or were you
italicising those names?). And is what mounted from within initrd, and how
would I know?

>> Normally mdadm should Just Work and create /dev/md0 from the physical
>> drive. Any suggestions what's going wrong, or what to look for next?
>
> Have you tried booting up with a rescue CD or a live CD to see whether
> the drive is recognized and/or whether the device special files for it
> are created?

knoppix sees the drive (as /dev/sdb - /dev/sda is a small drive the
system boots off) but debian doesn't (any more).
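
I suppose I could also try assembling the array by hand from knoppix,
something like this (guessing at the partition number):

```shell
mdadm --examine /dev/sdb1                  # sanity-check the superblock first
mdadm --assemble /dev/md0 /dev/sdb1 --run  # --run starts it even though degraded
cat /proc/mdstat                           # did it come up?
```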

Hmmm......



--
John Stumbles

The rain, it rains upon the Just, and on the Unjust fella
But more upon the Just because the Unjust's got the Just's umbrella
From: Aragorn on
On Friday 05 February 2010 00:03 in comp.os.linux.misc, somebody
identifying as John Stumbles wrote...

> On Thu, 04 Feb 2010 16:18:05 +0100, Aragorn wrote:
>
>> On Thursday 04 February 2010 13:32 in comp.os.linux.misc, somebody
>> identifying as John Stumbles wrote...
>>
>>> I have a server box with a raid setup using mdadm. It currently only
>>> has one drive (that's another story ...) [...
>>
>> But one that may be relevant. :-) If the other hard disk(s) have
>> failed, then there is a chance that your now last remaining disk is
>> starting to fail as well.
>
> True, or the SATA card playing up. I've had 2 similar cards (eBay
> cheapies) go bad so I'm not too confident of this one. The SATA
> card BIOS sees the drive, Linux also sees it, the drive itself is OK
> in another machine, and an identical (make/model/formatting) drive
> which works in another machine similarly isn't recognised in this one.

Well, it is possible for two specimens of the same RAID controller to
format disks differently, so that they are only usable on the
controller they were formatted on.

On the other hand, if you've had trouble with this kind of controllers
before...

>>> ...] but was working OK as a degraded array. However since rebooting
>>> the system last night it's no longer coming up.
>>
>> Hmm... You don't keep this server running 24/7?
>
> Normally I do. The machine also has an external USB drive attached
> which is used for backups and that had stopped working.

This is another thing to investigate. Could be related...

> I had physically moved the drive (gently, trying to keep the
> connections intact) so assumed I must have inadvertently interrupted
> the connection to that as it became unmounted so rebooted to try to
> sort it out. Which is when the fun started :-(

Normally, the reconnection and subsequent reboot ought to guarantee
normal operation again. You might be running into a forced filesystem
check upon boot due to the unclean shutdown, but all should normally
work as before again.

Can you, by means of your Knoppix CD, peruse the "/var/log/messages" on
the hard disk for any possible error messages?
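
E.g. something like this from Knoppix (mount point and device are
guesses; Debian also logs to /var/log/syslog):

```shell
mount -o ro /dev/sda1 /mnt                  # mount the on-disk root read-only
grep -i -e 'md' -e 'raid' -e 'sdb' /mnt/var/log/messages | tail -50
grep -i -e 'error' -e 'fail' /mnt/var/log/syslog | tail -50
</imports>
```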

>>> On bootup after the PC's normal F1-for-bios message (where I can go
>>> into BIOS setup) I get a Tab-for-raid-setup, and if I do that I can
>>> see that the SATA card's VIA BIOS is seeing the drive.
>>
>> This is irrelevant, though. Motherboard RAID implementations are
>> typically FakeRAID solutions.
>
> Yes, I'm using mdadm, not the SATA BIOS RAID: I just referred to the
> latter to show that the drive seemed to be recognised by the system.

Okay, so the SATA card sees the disk and the kernel also sees it. My
guess at this stage would be filesystem damage which may have erased
your "/etc/mdadm.conf" or damaged your boot-up scripts.

>>> dmesg seems to be seeing the drive and reports sd 3:0:0:0 [sdb]
>>> Attached SCSI disk
>>
>> So the kernel sees the drive, but somehow the device special file is
>> not created. Again, with the knowledge I have at the moment, things
>> seem to point at /udev/ and/or /sysfs/ - is it mounted from within
>> the initrd?
>
> You've lost me there. There isn't a /udev or a /sysfs (or were you
> italicising those names?).

They were italicized, yes. udev mounts a dynamic, tmpfs-based device
filesystem on "/dev", and sysfs is an in-kernel pseudofilesystem,
which - as a spinoff from procfs - is mounted on "/sys". The udev
system uses the information in "/sys" to create or delete device
special files in "/dev" as required. (Or at least, it should. It's
not quite perfect yet.)
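
You can see whether both are in place on a running system with
something like:

```shell
# Is sysfs mounted on /sys, and what is mounted on /dev?
grep -E ' /(sys|dev) ' /proc/mounts

# udev's configuration and rules normally live here:
ls /etc/udev/ 2>/dev/null
```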

> And is what mounted from within initrd, and how would I know?

Well, considering the modular nature of most stock binary distribution
kernels, udev is usually already activated from the initrd - at least,
RedHat and derivatives used to do it that way for a while; I don't know
whether they still do - so as to have a "/dev" population already
available before the actual on-disk root filesystem is mounted.

Other approaches - e.g. in Gentoo - are to launch udev from the init
scripts and have it already mount the dynamic "/dev" population from
there, along with sysfs on "/sys" and devpts on "/dev/pts". This as
opposed to having "/sys", "/dev" and "/dev/pts" mounted at a later
stage (at mount time of the additional local filesystems) by
the "mount -a" command and the information in "/etc/fstab".

If your init scripts are still intact, you should be able to ascertain
from the boot scripts at which stage udev is started and "/dev"
and "/sys" are mounted.
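
On Debian that would be something along these lines (script names vary
between releases, so take these as a starting point):

```shell
# Which boot-time (runlevel S) scripts mention udev, and in what order?
ls /etc/rcS.d/ | grep -i udev

# Which init scripts reference udev at all?
grep -rl udev /etc/init.d/ 2>/dev/null
```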

>>> Normally mdadm should Just Work and create /dev/md0 from the
>>> physical drive. Any suggestions what's going wrong, or what to look
>>> for next?
>>
>> Have you tried booting up with a rescue CD or a live CD to see
>> whether the drive is recognized and/or whether the device special
>> files for it are created?
>
> knoppix sees the drive (as /dev/sdb - /dev/sda is a small drive the
> system boots off) but debian doesn't (any more).
>
> Hmmm......

Anything else you were able to ascertain while perusing the on-disk root
filesystem from the Knoppix environment?

--
*Aragorn*
(registered GNU/Linux user #223157)