From: Stephan Diestelhorst on
Rafael J. Wysocki wrote:
> On Saturday, July 10, 2010, Stephan Diestelhorst wrote:
> > Rafael J. Wysocki wrote:
> > > On Friday, July 09, 2010, Stephan Diestelhorst wrote:
> > > > I wrote:
> > > > > I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> > > > > are that the disk does not respond to requests when woken up, producing
> > > > > only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> > > > > mainline PPA build)):
> > > > >
> > > > <snip>
> > > >
> > > > > This can be triggered most reliably with multiple "direct" writes to
> > > > > disk, I create the load with the attached script. If the issue is
> > > > > triggered, suspend (through pm-suspend) takes very long.
> > > >
> > > > > IMHO the interesting log output during suspend is:
> > > > > [ 1674.700125] ata1.00: qc timeout (cmd 0xec)
>
> I have a box where this problem is kind of reproducible, but it happens _very_
> rarely. Also I can't reproduce it on demand running suspend-resume in a tight
> loop. Are you able to reproduce it more regularly?

For me it is much more reproducible. If I run multiple dd processes doing
direct writes to the disk in question, I trigger it rather reliably (~75%
of attempts or higher). See the script attached to an earlier email.
Maybe it helps you trigger your case more reliably, too?
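The kind of load described above can be approximated with a sketch like the following. It is NOT the script attached earlier in the thread (that script is not reproduced here); the target directory, writer count, file size and settle delay are all assumptions. It writes to files on a mounted filesystem of the affected disk rather than to the raw device, so it does not destroy data. It assumes GNU dd (for `oflag=direct`) and pm-utils' `pm-suspend`, run as root.

```python
import subprocess
import time
from pathlib import Path


def writer_commands(target_dir: str, n_writers: int, size_mib: int) -> list[list[str]]:
    """Build one GNU dd command per writer; oflag=direct bypasses the page cache."""
    cmds = []
    for i in range(n_writers):
        out = Path(target_dir) / f"ddload.{i}"  # hypothetical file names
        cmds.append(["dd", "if=/dev/zero", f"of={out}",
                     "bs=1M", f"count={size_mib}", "oflag=direct"])
    return cmds


def run_repro(target_dir: str, n_writers: int = 4, size_mib: int = 2048,
              settle_s: float = 5.0) -> int:
    """Start the writers, suspend while they are busy, then clean up.

    All default values are assumptions, not figures from the thread.
    """
    writers = [subprocess.Popen(cmd, stderr=subprocess.DEVNULL)
               for cmd in writer_commands(target_dir, n_writers, size_mib)]
    try:
        time.sleep(settle_s)  # let the writers build up in-flight direct I/O
        return subprocess.run(["pm-suspend"], check=False).returncode
    finally:
        for w in writers:
            w.terminate()
            w.wait()  # may block for a while if the disk is wedged after resume
```

Usage (as root): `run_repro("/mnt/affected-disk")`, then check `dmesg` after resume for ata errors. Files are used instead of the block device purely for safety; whether that changes how often the bug triggers is untested.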

> Also, what kind of disk do you use?

It is a Samsung HM321HI in a Samsung Eikee R525 notebook. Please also
see my smartctl -a log, attached earlier.

Interestingly, I see a similar symptom on one of my home servers,
which has a *Samsung* SpinPoint F1, and it went away with different
disks. So maybe these disks are either faulty themselves or they
trigger the issue more often?

I also have LVM on top of LUKS on the disk, so the I/O also adds
some computational overhead for encryption.

Stephan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Tejun Heo on
On 07/10/2010 08:50 AM, Stephan Diestelhorst wrote:
>> I have a box where this problem is kind of reproducible, but it happens _very_
>> rarely. Also I can't reproduce it on demand running suspend-resume in a tight
>> loop. Are you able to reproduce it more regularly?
>
> For me it is much more reproducible. If I run multiple dd processes doing
> direct writes to the disk in question, I trigger it rather reliably (~75%
> of attempts or higher). See the script attached to an earlier email.
> Maybe it helps you trigger your case more reliably, too?

Can you please try the following git tree?

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git libata-irq-expect

Thanks.

--
tejun
From: Rafael J. Wysocki on
On Saturday, July 10, 2010, Stephan Diestelhorst wrote:
> Rafael J. Wysocki wrote:
> > On Saturday, July 10, 2010, Stephan Diestelhorst wrote:
> > > Rafael J. Wysocki wrote:
> > > > On Friday, July 09, 2010, Stephan Diestelhorst wrote:
> > > > > I wrote:
> > > > > > I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> > > > > > are that the disk does not respond to requests when woken up, producing
> > > > > > only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> > > > > > mainline PPA build)):
> > > > > >
> > > > > <snip>
> > > > >
> > > > > > This can be triggered most reliably with multiple "direct" writes to
> > > > > > disk, I create the load with the attached script. If the issue is
> > > > > > triggered, suspend (through pm-suspend) takes very long.
> > > > >
> > > > > > IMHO the interesting log output during suspend is:
> > > > > > [ 1674.700125] ata1.00: qc timeout (cmd 0xec)
> >
> > I have a box where this problem is kind of reproducible, but it happens _very_
> > rarely. Also I can't reproduce it on demand running suspend-resume in a tight
> > loop. Are you able to reproduce it more regularly?
>
> For me it is much more reproducible. If I run multiple dd processes doing
> direct writes to the disk in question, I trigger it rather reliably (~75%
> of attempts or higher). See the script attached to an earlier email.
> Maybe it helps you trigger your case more reliably, too?
>
> > Also, what kind of disk do you use?
>
> It is a Samsung HM321HI in a Samsung Eikee R525 notebook. Please also
> see my smartctl -a log, attached earlier.
>
> Interestingly, I see a similar symptom on one of my home servers,
> which has a *Samsung* SpinPoint F1, and it went away with different
> disks. So maybe these disks are either faulty themselves or they
> trigger the issue more often?

They may be doing something that causes the issue to appear.

That said, on my test box this only happens during suspend and it's an Intel
SSD (INTEL SSDSA2M080G2GC, 2CV102HD to be precise).

> I also have LVM on top of LUKS on the disk, so the I/O also adds
> some computational overhead for encryption.

There are only ext3/ext4 partitions on the disk in my case.

Rafael
From: Rafael J. Wysocki on
On Saturday, July 10, 2010, Tejun Heo wrote:
> On 07/10/2010 08:50 AM, Stephan Diestelhorst wrote:
> >> I have a box where this problem is kind of reproducible, but it happens _very_
> >> rarely. Also I can't reproduce it on demand running suspend-resume in a tight
> >> loop. Are you able to reproduce it more regularly?
> >
> > For me it is much more reproducible. If I run multiple dd processes doing
> > direct writes to the disk in question, I trigger it rather reliably (~75%
> > of attempts or higher). See the script attached to an earlier email.
> > Maybe it helps you trigger your case more reliably, too?
>
> Can you please try the following git tree?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git libata-irq-expect

Well, for now I got this:

[ 36.833075] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 36.833085] ata1.00: failed command: SMART
[ 36.833099] ata1.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
[ 36.833101] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 36.833107] ata1.00: status: { DRDY }
[ 36.833118] ata1: hard resetting link
[ 37.316053] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 37.316840] ata1.00: configured for UDMA/133
[ 37.316888] ata1: EH complete

during initialization. Apart from this it seems to work fine.
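When reading EH output like the log above, it helps to split libata's taskfile dump into named registers. The helpers below are a hypothetical sketch: they assume the field order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah, then the five HOB registers, then device, with status and error occupying the first two slots of a `res` line), and the opcode, status-bit and Emask tables are deliberately partial.

```python
# Assumed libata EH taskfile field order (hypothetical helper, not a libata API).
FIELDS = (
    "command", "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device",
)

# Only the ATA opcodes that appear in this thread.
ATA_OPCODES = {0xEC: "IDENTIFY DEVICE", 0xB0: "SMART"}

# ATA status register bits (on a "res" line the first field is the status).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# Low bits of libata's err_mask ("Emask"); the full set is larger.
EMASK_BITS = {0x1: "device error", 0x2: "HSM violation", 0x4: "timeout", 0x8: "media error"}


def parse_taskfile(tf: str) -> dict[str, int]:
    """Split 'cc/ff:nn:ll:mm:hh/ff:nn:ll:mm:hh/dd' into named register values."""
    head, low, hob, device = tf.split("/")  # raises ValueError on a bad layout
    values = [head, *low.split(":"), *hob.split(":"), device]
    if len(values) != len(FIELDS):
        raise ValueError(f"unexpected taskfile layout: {tf!r}")
    return {name: int(v, 16) for name, v in zip(FIELDS, values)}


def decode_bits(value: int, table: dict[int, str]) -> list[str]:
    """Names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_sstatus(sstatus: int) -> dict[str, int]:
    """SATA SStatus: DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8."""
    return {"det": sstatus & 0xF, "spd": (sstatus >> 4) & 0xF, "ipm": (sstatus >> 8) & 0xF}
```

Applied to the log above, the failed command decodes to opcode 0xB0 (SMART), the `res` status 0x40 to DRDY only, Emask 0x4 to a timeout, and SStatus 0x123 to DET=3 (device present, link established), SPD=2 (Gen2, 3.0 Gbps) and IPM=1 (active), consistent with the "SATA link up 3.0 Gbps" line. The `cmd 0xec` in Stephan's original "qc timeout" message is IDENTIFY DEVICE.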

But in fact I'll only be able to say it helps if it survives a week or so
without a suspend failure.
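The need for a long test period can be made concrete. Assuming suspend attempts are independent and the bug triggers with a fixed per-attempt probability p (Stephan reports roughly 75% under his dd load), an unfixed kernel would still pass n attempts in a row with probability (1 - p)^n. The hypothetical helpers below compute that, and the smallest n that makes such a lucky streak less likely than a chosen threshold. Independence and a constant p are simplifying assumptions, not properties established in this thread.

```python
def p_all_clean(p_repro: float, runs: int) -> float:
    """Probability that `runs` independent attempts all pass if nothing was fixed."""
    return (1.0 - p_repro) ** runs


def runs_needed(p_repro: float, alpha: float) -> int:
    """Smallest n with (1 - p_repro)**n < alpha, i.e. a clean streak is unlikely by luck."""
    if not 0.0 < p_repro <= 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("need 0 < p_repro <= 1 and 0 < alpha < 1")
    n, p_clean = 0, 1.0
    while p_clean >= alpha:
        p_clean *= 1.0 - p_repro  # one more attempt that happened to pass
        n += 1
    return n
```

For example, `runs_needed(0.75, 0.01)` returns 4, so a handful of clean cycles under the dd load would already be fairly convincing on Stephan's machine, whereas for a rare trigger such as p = 0.02, `runs_needed(0.02, 0.05)` returns 149, which is why a week or more of normal use is a reasonable bar on Rafael's box.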

Rafael
From: Maciej Rutecki on
On Friday, 9 July 2010 at 17:50:04 Stephan Diestelhorst wrote:
> Hi,
> I have an issue with suspend to RAM and I/O load on a disk. Symptoms
> are that the disk does not respond to requests when woken up, producing
> only I/O errors on all tested kernels (newest 2.6.35-rc4 (Ubuntu
> mainline PPA build)):
>

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16370
for your bug report; please add your address to the CC list there. Thanks!

--
Maciej Rutecki
http://www.maciek.unixy.pl