From: Alan Cox on
> A reboot brings the disks back to life. So in theory, Linux should be
> able to restore life into these drives by doing the right magic with
> the hardware bits...

We don't have power control of the drives. If the firmware crashes or a
drive flakes out due to power problems or something similar occurs, it's
game over until you hit the switch.

> ata5: hard resetting link
> ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)

We tried the biggest hammer we had

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Alan Cox on
> Is one of these modules the driver for this controller? I think it's
> AHCI: lshw says it uses ports cc00 ... and a bunch of others, and
> those ports are claimed by ahci according to /proc/ioports. Ah! I need
> better eyes. lshw already mentions that it's ahci...

AHCI will be driving it.
From: Rogier Wolff on
On Tue, Jun 15, 2010 at 11:07:48AM +0100, Alan Cox wrote:
> > A reboot brings the disks back to life. So in theory, Linux should be
> > able to restore life into these drives by doing the right magic with
> > the hardware bits...

> We don't have power control of the drives. If the firmware crashes
> or a drive flakes out due to power problems or something similar
> occurs, it's game over until you hit the switch.

The thing is, the power didn't cycle. I just typed "reboot" from a
remote location. (Yes, in most of the incidents leading up to
yesterday's/this morning's event I thought I had to power-cycle to bring
them back, but this morning I tried "just the reboot" and it worked!)

The controller has TWO drives connected. BOTH drives became
inaccessible at exactly the same point in time. This has happened
before, with BOTH drives disappearing at the same moment.

The RAID superblocks on BOTH drives had info like:
RAID disk 1/8, raid is up 8/8
say, for disk numbers 1 and 2.

All six other drives had:
RAID disk 4/8, raid is broken 6/8
say, for disk numbers 0, 3, 4, 5, 6 and 7.
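The superblock pattern is consistent with both drives vanishing at once: md only rewrites superblocks on members it can still reach, so the two drives that dropped out keep the last state they saw (8/8), while the six survivors record the degraded state (6/8). A minimal Python sketch of that grouping, assuming the per-drive summaries have already been collected by hand into a dict; the function names are hypothetical, and the "fewest active members is newest" rule is a simplifying assumption (real tools such as mdadm compare the superblock event counters instead):

```python
from collections import defaultdict


def group_by_recorded_state(superblocks):
    """Group drive numbers by the (active, total) state in their superblock.

    superblocks: dict mapping drive number -> (active, total),
    e.g. (8, 8) for "raid is up 8/8".
    """
    groups = defaultdict(list)
    for drive, state in sorted(superblocks.items()):
        groups[state].append(drive)
    return dict(groups)


def stale_members(superblocks):
    """Return the drives whose superblock missed the latest update."""
    groups = group_by_recorded_state(superblocks)
    # Heuristic (assumption): the state with the fewest active members is
    # the most recent one, because md kept writing to the surviving drives.
    latest = min(groups, key=lambda state: state[0])
    stale = []
    for state, drives in groups.items():
        if state != latest:
            stale.extend(drives)
    return sorted(stale)


# The situation described in the message: drives 1 and 2 still say 8/8.
sb = {0: (6, 8), 1: (8, 8), 2: (8, 8), 3: (6, 8),
      4: (6, 8), 5: (6, 8), 6: (6, 8), 7: (6, 8)}
print(stale_members(sb))  # -> [1, 2]
```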

Next time this happens, I'll try removing and reinserting all the SATA
modules (the machine is a file-server. It's NFS-root so it doesn't
depend on the storage modules for its root fs.... :-) )

sata_nv 20758 0
ahci 36037 6

Is one of these modules the driver for this controller? I think it's
AHCI: lshw says it uses ports cc00 ... and a bunch of others, and
those ports are claimed by ahci according to /proc/ioports. Ah! I need
better eyes. lshw already mentions that it's ahci...

> > ata5: hard resetting link
> > ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
>
> We tried the biggest hammer we had

Not big enough! The BIOS manages a bigger one!

Roger.

--
** R.E.Wolff(a)BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
From: Alan on
On Tue, 2010-06-15 at 16:01 +0100, Alan Cox wrote:
> > Is one of these modules the driver for this controller? I think it's
> > AHCI: lshw says it uses ports cc00 ... and a bunch of others, and
> > those ports are claimed by ahci according to /proc/ioports. Ah! I need
> > better eyes. lshw already mentions that it's ahci...
>
> AHCI will be driving it.

I have seen this problem with the 2.6.33 kernel in Fedora 13. The
problem goes away in 2.6.35-rc3. (Though networking is fubared for me on
that kernel, so I have not migrated to it.)

My understanding is that the "fix" in the driver was to blacklist NCQ for
that controller. I have not verified that yet.
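One way to check whether NCQ is actually in use on a given kernel: for libata disks, the SCSI queue depth in sysfs drops to 1 when NCQ is not used (for example when it has been blacklisted), and is larger when NCQ is active. A minimal Python sketch, assuming the standard /sys/block/sdX/device/queue_depth layout; the function name ncq_status is hypothetical:

```python
import os


def ncq_status(sysfs_block="/sys/block"):
    """Hypothetical helper: return {disk: queue_depth} for each sd* disk.

    With libata, a queue depth of 1 means NCQ is not in use for that
    device; a larger value means NCQ is active.
    """
    result = {}
    for name in sorted(os.listdir(sysfs_block)):
        if not name.startswith("sd"):
            continue  # skip loop, md, nfs and other non-SCSI block devices
        path = os.path.join(sysfs_block, name, "device", "queue_depth")
        try:
            with open(path) as f:
                result[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable
    return result


if __name__ == "__main__":
    for disk, depth in ncq_status().items():
        print(disk, "NCQ off" if depth == 1 else "NCQ depth %d" % depth)
```

The sysfs parameter is there so the function can be pointed at a copy of the tree for testing; on a live system the default path is what you want.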
