nvidia controller failed command, possibly related to SMART selftest (2.6.32) [Kernel]

From: Tejun Heo on 25 Mar 2010 03:20

Hello,

On 03/15/2010 01:16 AM, Robert Hancock wrote:
>> If it's of any relevance, the problems also occurred with 2.6.26, but
>> the RAID code didn't always eject the disks on that kernel; the
>> first time I encountered a degraded array due to this was shortly
>> after the upgrade to 2.6.32. However, this is speculation, I have
>> not verified the causality.

The nv reset code received several changes during that time frame, one
of which was to avoid hardreset unless it's a hotplug situation.
This was necessary because some controllers fail to re-recognize the
attached drive after a hardreset. The trade-off was that losing a
drive which needs a hardreset to recover from a failure is less
dangerous than having a hardreset drop a drive which SRST alone could
have recovered. NV reset protocols are badly messed up, and at this
point I don't think it's possible to make them behave as well as other
controllers. If you're on earlier NVs, losing a disk after an
exception condition can happen from time to time.
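For readers less familiar with libata, the reset policy above can be modelled in a few lines of C. This is a hypothetical illustration, not the actual sata_nv driver code: the enum and function names are made up, and it assumes a drive after an exception is either still responsive (so SRST can recover it) or wedged (so only a hardreset might help).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reset kinds; illustrative names, not libata identifiers. */
enum reset_kind {
    RESET_SRST, /* software reset via the SRST bit in the device control register */
    RESET_HARD, /* SATA hardreset (COMRESET on the link) */
};

/* Hypothetical drive condition after an exception, e.g. firmware that
 * misbehaves while a SMART self-test is running. */
enum drive_state {
    DRIVE_RESPONSIVE, /* firmware still answers; SRST can recover it */
    DRIVE_WEDGED,     /* firmware stuck; only a hardreset might help */
};

/* The policy described in the mail: hardreset only when hotplug is
 * involved, SRST otherwise. */
static enum reset_kind nv_pick_reset(bool hotplug)
{
    return hotplug ? RESET_HARD : RESET_SRST;
}

/* Error recovery on an already-attached drive (no hotplug): the policy
 * picks SRST, which only brings back a drive whose firmware still
 * responds, so a wedged drive is dropped. */
static bool nv_recovers_without_hotplug(enum drive_state state)
{
    assert(state == DRIVE_RESPONSIVE || state == DRIVE_WEDGED);
    return nv_pick_reset(false) == RESET_SRST && state == DRIVE_RESPONSIVE;
}
```

Under this simplified model, nv_recovers_without_hotplug(DRIVE_RESPONSIVE) is true and nv_recovers_without_hotplug(DRIVE_WEDGED) is false: the port never issues the hardreset that might revive wedged firmware, which is how a drive ends up being lost.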

>> Generally, SMART self-tests should be a transparent operation that
>> doesn't affect the operating system's use of the devices, right? Is
>> it conceivable or even common that the disks' own controllers are
>> broken to the point where they fall over SMART tests?

Yeah, sure, it's definitely possible. A good hardreset would usually
put some sense back into the firmware, but NV can't do that safely, so
it loses the drive.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/