From: Ortwin Glück on
From time to time this nVidia SATA controller chokes on a FLUSH CACHE.

1. why does the kernel not try to HARD reset the link?
2. it would be nice to have the possibility to manually force a (hard) reset or
to re-initialize the device. Other than rebooting I mean :-)

Thanks.
Ortwin

Kernel is basically a 2.6.32.8 with a cherry picked fix:
http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=queue-2.6.33/enable-retries-for-syncronize_cache-commands-to-fix-i-o-error.patch;h=2401e54b05502803889d4ece2afefc3e2b64995f;hb=117d7c078957b2e200e3fcf06c182422366764b0

Jun 23 12:06:48 gollum kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 23 12:06:48 gollum kernel: ata2.00: failed command: FLUSH CACHE
Jun 23 12:06:48 gollum kernel: ata2.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jun 23 12:06:48 gollum kernel: res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Jun 23 12:06:48 gollum kernel: ata2.00: status: { DRDY }
Jun 23 12:06:53 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:06:53 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: device not ready (errno=-16), forcing hardreset
Jun 23 12:07:59 gollum kernel: ata2: device not ready (errno=-16), forcing hardreset
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: link is slow to respond, please be patient (ready=0)
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: soft resetting link
Jun 23 12:07:59 gollum kernel: ata2: SRST failed (errno=-16)
Jun 23 12:07:59 gollum kernel: ata2: reset failed, giving up
Jun 23 12:07:59 gollum kernel: ata2.00: disabled
Jun 23 12:07:59 gollum kernel: ata2.00: disabled
Jun 23 12:07:59 gollum kernel: ata2.01: disabled
Jun 23 12:07:59 gollum kernel: ata2.01: disabled
Jun 23 12:07:59 gollum kernel: ata2.00: device reported invalid CHS sector 0
Jun 23 12:07:59 gollum kernel: ata2.00: device reported invalid CHS sector 0
Jun 23 12:07:59 gollum kernel: ata2: EH complete
Jun 23 12:07:59 gollum kernel: ata2: EH complete
Jun 23 12:07:59 gollum kernel: end_request: I/O error, dev sdb, sector 58604962
Jun 23 12:07:59 gollum kernel: md: super_written gets error=-5, uptodate=0
Jun 23 12:07:59 gollum kernel: md: super_written gets error=-5, uptodate=0
Jun 23 12:07:59 gollum kernel: raid1: Disk failure on sdb3, disabling device.
Jun 23 12:07:59 gollum kernel: raid1: Operation continuing on 1 devices.
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 1, wo:1, o:0, dev:sdb3
Jun 23 12:07:59 gollum kernel: disk 1, wo:1, o:0, dev:sdb3
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: RAID1 conf printout:
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: --- wd:1 rd:2
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
Jun 23 12:07:59 gollum kernel: disk 0, wo:0, o:1, dev:sda3
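
For anyone reading the failed-command lines above: opcode 0xe7 is FLUSH CACHE, the leading 40 in the "res" line is the ATA status register (0x40 = DRDY, matching the "status: { DRDY }" line), and Emask 0x4 is libata's timeout bit. A minimal sketch of that decoding follows. The helper names decode_status() and is_timeout() and the short bit labels are made up for illustration; they are not kernel functions, only the bit values mirror the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative constants; values mirror the ATA status register layout
 * and libata's AC_ERR_TIMEOUT bit, but the names here are local. */
#define ST_BSY   0x80
#define ST_DRDY  0x40
#define ST_DF    0x20
#define ST_DRQ   0x08
#define ST_ERR   0x01
#define EMASK_TIMEOUT 0x4

/* Hypothetical helper: write the names of the set status bits into out,
 * separated by single spaces. out must hold at least outlen bytes. */
static void decode_status(unsigned char status, char *out, size_t outlen)
{
    static const struct { unsigned char bit; const char *name; } bits[] = {
        { ST_BSY, "BSY" }, { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
        { ST_DRQ, "DRQ" }, { ST_ERR, "ERR" },
    };
    size_t i;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(status & bits[i].bit))
            continue;
        if (out[0] != '\0')
            strncat(out, " ", outlen - strlen(out) - 1);
        strncat(out, bits[i].name, outlen - strlen(out) - 1);
    }
}

/* Hypothetical helper: true if the error mask has the timeout bit set. */
static int is_timeout(unsigned int emask)
{
    return (emask & EMASK_TIMEOUT) != 0;
}
```

Calling decode_status(0x40, buf, sizeof(buf)) on the status byte from the log leaves "DRDY" in buf, i.e. the drive reported ready but never completed the flush before the timeout.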


lspci:
00:1e.0 PCI bridge: nVidia Corporation nForce2 AGP (rev c1)
00:1e.0 0604: 10de:01e8 (rev c1)

ATA initialization:
Jun 23 18:46:38 gollum kernel: ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: 58605120 sectors, multi 16: LBA
Jun 23 18:46:38 gollum kernel: ata2.00: 58605120 sectors, multi 16: LBA
Jun 23 18:46:38 gollum kernel: ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Jun 23 18:46:38 gollum kernel: ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33
Jun 23 18:46:38 gollum kernel: ata2: nv_mode_filter: 0x3f39f&0x3f39f->0x3f39f, BIOS=0x3f000 (0xc700c6c0) ACPI=0x3f01f (20:60:0x1f)
Jun 23 18:46:38 gollum kernel: ata2: nv_mode_filter: 0x739f&0x739f->0x739f, BIOS=0x7000 (0xc700c6c0) ACPI=0x701f (20:60:0x1f)
Jun 23 18:46:38 gollum kernel: ata2.00: configured for UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.00: configured for UDMA/100
Jun 23 18:46:38 gollum kernel: ata2.01: configured for UDMA/33
Jun 23 18:46:38 gollum kernel: ata2.01: configured for UDMA/33
Jun 23 18:46:38 gollum kernel: scsi 1:0:0:0: Direct-Access ATA IC25N030ATCS04-0 CA3O PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: scsi 1:0:0:0: Direct-Access ATA IC25N030ATCS04-0 CA3O PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] 58605120 512-byte logical blocks: (30.0 GB/27.9 GiB)
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 23 18:46:38 gollum kernel: sdb:
Jun 23 18:46:38 gollum kernel: sdb:
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
Jun 23 18:46:38 gollum kernel: sdb1 sdb2 sdb3
Jun 23 18:46:38 gollum kernel: sdb1 sdb2 sdb3
Jun 23 18:46:38 gollum kernel: scsi 1:0:1:0: CD-ROM PIONEER DVD-ROM DVD-115F 1.27 PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: scsi 1:0:1:0: CD-ROM PIONEER DVD-ROM DVD-115F 1.27 PQ: 0 ANSI: 5
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
Jun 23 18:46:38 gollum kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tejun Heo on
Hello,

On 06/23/2010 07:08 PM, Ortwin Glück wrote:
> From time to time this nVidia SATA controller chokes on a FLUSH CACHE.
>
> 1. why does the kernel not try to HARD reset the link?

Because hardreset sometimes brings the link completely offline on
sata_nv's. Hardreset on sata_nv controllers is quite fragile.

> 2. it would be nice to have the possibility to manually force a
> (hard) reset or to re-initialize the device. Other than rebooting I
> mean :-)

Maybe we can use hardreset as the last resort before ditching the
device. Something like the following. Can you please try it and post
the kernel log? Thanks.

diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 2116113..5105951 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -1587,7 +1587,7 @@ static int nv_hardreset(struct ata_link *link, unsigned int *class,
 	 * comment above port ops for details.
 	 */
 	if (!(link->ap->pflags & ATA_PFLAG_LOADING) &&
-	    !ata_dev_enabled(link->device))
+	    (!ata_dev_enabled(link->device) || ehc->tries[0] == 1))
 		sata_link_hardreset(link, sata_deb_timing_hotplug, deadline,
 				    NULL, NULL);
 	else {

--
tejun
From: Tejun Heo on
On 06/23/2010 07:28 PM, Tejun Heo wrote:
> Maybe we can use hardreset as the last resort before ditching the
> device. Something like the following. Can you please try it and post
> the kernel log? Thanks.

Meh, it won't work. It's failing softreset so we should be checking
reset try counts. I'll try to write up something tomorrow.

Thanks.

--
tejun
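
The "check reset try counts" idea could look roughly like the policy below: keep using softreset while retries remain, and only fall back to hardreset on the final attempt before the device would be disabled. This is a standalone sketch under stated assumptions, not libata code and not a patch from this thread; next_reset(), the enum names and the max_tries parameter are all hypothetical.

```c
#include <assert.h>

/* Hypothetical reset choices for one error-handling attempt. */
enum reset_kind {
    RESET_SOFT,     /* SRST, the default on a controller with fragile hardreset */
    RESET_HARD,     /* last resort before giving up */
    RESET_GIVE_UP   /* all tries used, device gets disabled */
};

/*
 * Hypothetical policy helper: given how many reset attempts have already
 * failed and how many are allowed in total, pick the next action.
 * Assumes hardreset is risky, so it is tried only on the final attempt.
 */
static enum reset_kind next_reset(int failed_tries, int max_tries)
{
    assert(max_tries > 0 && failed_tries >= 0);

    if (failed_tries >= max_tries)
        return RESET_GIVE_UP;
    if (failed_tries == max_tries - 1)
        return RESET_HARD;
    return RESET_SOFT;
}
```

With max_tries = 4, as in the four "soft resetting link" rounds in the log, this would give three softresets followed by one hardreset instead of a fourth softreset.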
From: Ortwin Glück on
On 23.06.2010 19:31, Tejun Heo wrote:
> Meh, it won't work. It's failing softreset so we should be checking
> reset try counts. I'll try to write up something tomorrow.

I am happy to try patches. The problem shows up maybe once a month only,
however. Interestingly it only occurs on ata2 with an IBM HDD, never on the first
channel with a Maxtor. I can also try attaching the DVD drive to the first channel
to see if that makes any difference.

ata1.00: ATA-5: MAXTOR 6L020J1, A93.0500, max UDMA/133
ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33

Thanks.
Ortwin
From: Tejun Heo on
Hello,

Patch attached, but please see below.

On 06/24/2010 09:13 AM, Ortwin Glück wrote:
> On 23.06.2010 19:31, Tejun Heo wrote:
>> Meh, it won't work. It's failing softreset so we should be checking
>> reset try counts. I'll try to write up something tomorrow.
>
> I am happy to try patches. The problem shows up maybe once a month
> only, however. Interestingly it only occurs on ata2 with a IBM hdd,
> never on the first channel with a maxtor. I can also try and attach
> the dvd on the first channel and see if that makes any difference.

Problems like this definitely can depend on the specific drive.

> ata1.00: ATA-5: MAXTOR 6L020J1, A93.0500, max UDMA/133
> ata2.00: ATA-5: IC25N030ATCS04-0, CA3OA71A, max UDMA/100
> ata2.01: ATAPI: Pioneer DVD-ROM ATAPIModel DVD-115 0127, E1.27, max UDMA/33

Is it PATA? Why do you have 2.01? Can you please attach the full boot
log?

Thanks.

--
tejun