From: Alan Cox on
> Penchala Narasimha Reddy Chilakala, ERS-HCLTech (1):
> aacraid: fix File System going into read-only mode

If aacraid is actually getting patches then see
also http://bugzilla.kernel.org/show_bug.cgi?id=11120 which I found
while tidying bugzilla.

Contains a patch and test confirmations


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: James Bottomley on
On Wed, 2010-01-27 at 22:24 +0000, Alan Cox wrote:
> > Penchala Narasimha Reddy Chilakala, ERS-HCLTech (1):
> > aacraid: fix File System going into read-only mode
>
> If aacraid is actually getting patches then see
> also http://bugzilla.kernel.org/show_bug.cgi?id=11120 which I found
> while tidying bugzilla.
>
> Contains a patch and test confirmations

So the patch it contains is almost certainly wrong in general; Mark was
just suggesting it as a trial ... it might work for specific adapter
versions but reducing the queue depth by half globally will impact
performance noticeably. The bug report does rather sound like cabling
issues are leading to a firmware related problem.

James


From: Alan Cox on
On Wed, 27 Jan 2010 16:33:29 -0600
James Bottomley <James.Bottomley(a)suse.de> wrote:

> On Wed, 2010-01-27 at 22:24 +0000, Alan Cox wrote:
> > > Penchala Narasimha Reddy Chilakala, ERS-HCLTech (1):
> > > aacraid: fix File System going into read-only mode
> >
> > If aacraid is actually getting patches then see
> > also http://bugzilla.kernel.org/show_bug.cgi?id=11120 which I found
> > while tidying bugzilla.
> >
> > Contains a patch and test confirmations
>
> So the patch it contains is almost certainly wrong in general; Mark was
> just suggesting it as a trial ... it might work for specific adapter
> versions but reducing the queue depth by half globally will impact
> performance noticeably. The bug report does rather sound like cabling
> issues are leading to a firmware related problem.

Odd then that they worked reliably until the numbers were increased.
Sorry but having worked on the aacraid for a long time in the past I
don't buy that explanation. Cabling issues would get logged by the driver
and the controller. Secondly I don't buy it because the reporter was
Matthias Ulrichs, who, to borrow a Hitchhiker's term, "really knows where his
towel is".

The patch isn't halving the queue size - it's returning to the known
working state from an (unfixed) regression.

The story is pretty simple

Worked until the kernel changed
Didn't work with kernel change
Worked after the kernel changed back.

Kernels don't go in and fix your cables (much as I wish they did) and
there are two folks who've actually found the bug report specifically
confirming it.

When you have a cable fault on the aacraid you can get hangs on crappier
firmware sets (normally in the BIOS boot though) but it's not dependent
on queue size - it either works or it doesn't. On good firmware you get
nice logged errors and it recovers if possible (or multipaths if you've
got the right bits).

Alan
From: James Bottomley on
On Wed, 2010-01-27 at 22:46 +0000, Alan Cox wrote:
> On Wed, 27 Jan 2010 16:33:29 -0600
> James Bottomley <James.Bottomley(a)suse.de> wrote:
>
> > On Wed, 2010-01-27 at 22:24 +0000, Alan Cox wrote:
> > > > Penchala Narasimha Reddy Chilakala, ERS-HCLTech (1):
> > > > aacraid: fix File System going into read-only mode
> > >
> > > If aacraid is actually getting patches then see
> > > also http://bugzilla.kernel.org/show_bug.cgi?id=11120 which I found
> > > while tidying bugzilla.
> > >
> > > Contains a patch and test confirmations
> >
> > So the patch it contains is almost certainly wrong in general; Mark was
> > just suggesting it as a trial ... it might work for specific adapter
> > versions but reducing the queue depth by half globally will impact
> > performance noticeably. The bug report does rather sound like cabling
> > issues are leading to a firmware related problem.
>
> Odd then that they worked reliably until the numbers were increased.
> Sorry but having worked on the aacraid for a long time in the past I
> don't buy that explanation. Cabling issues would get logged by the driver
> and the controller. Secondly I don't buy it because the reporter was
> Matthias Ulrichs, who, to borrow a Hitchhiker's term, "really knows where his
> towel is".
>
> The patch isn't halving the queue size - it's returning to the known
> working state from an (unfixed) regression.

What regression? The 32 bit queue depth has always been 256 since 2005
(when it was reduced from 512) ... it's never been 127.

> The story is pretty simple
>
> Worked until the kernel changed
> Didn't work with kernel change
> Worked after the kernel changed back.
>
> Kernels don't go in and fix your cables (much as I wish they did) and
> there are two folks who've actually found the bug report specifically
> confirming it.

But we have two bug reports for all of the aacraids over the last five
years ... the patch would reduce the maximum transfer length from 128k
to 63.5k.
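
As a rough check of those two figures, here is a minimal sketch, assuming
that 256 and 127 are counts of 512-byte sectors (the thread does not say so
explicitly; the function name is made up for illustration):

```python
SECTOR_BYTES = 512  # assumed sector size; not stated in the thread


def transfer_kib(sectors, sector_bytes=SECTOR_BYTES):
    """Return the transfer length in KiB for a given sector count."""
    return sectors * sector_bytes / 1024


print(transfer_kib(256))  # 128.0
print(transfer_kib(127))  # 63.5
```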

Linux tends to send down the largest transfer size it can, suggesting
that most of the aacraids in the field are happy with 128k.

The maximum transfer length critically impacts I/O throughput and
performance ... I can't just penalise everyone for the sake of two bug
reports.

This value can already be altered on the fly using the sysfs attribute

/sys/block/<dev>/queue/max_sectors_kb

Setting that should work for the two reporters without impacting anyone
else.
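
For anyone wanting to script that workaround, a minimal sketch follows. It
assumes root privileges and that the device name is known; the helper name,
the sysfs_root parameter (there only so the function can be tried against a
scratch directory) and the value 63 in the usage note are illustrative
assumptions, not something taken from the bug report.

```python
from pathlib import Path


def set_max_sectors_kb(dev, kb, sysfs_root="/sys"):
    """Write kb to <sysfs_root>/block/<dev>/queue/max_sectors_kb.

    Returns the value read back from the attribute afterwards.
    """
    attr = Path(sysfs_root) / "block" / dev / "queue" / "max_sectors_kb"
    attr.write_text(f"{kb}\n")
    return int(attr.read_text().strip())
```

Something like set_max_sectors_kb("sda", 63) would then cap transfers just
below the 63.5k figure discussed above; "sda" is only a placeholder. The
setting does not persist across reboots, so it would need to be reapplied at
boot.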

> When you have a cable fault on the aacraid you can get hangs on crappier
> firmware sets (normally in the BIOS boot though) but it's not dependent
> on queue size - it either works or it doesn't. On good firmware you get
> nice logged errors and it recovers if possible (or multipaths if you've
> got the right bits).

James


From: Alan Cox on
> This value can already be altered on the fly using the sysfs attribute
>
> /sys/block/<dev>/queue/max_sectors_kb
>
> Setting that should work for the two reporters without impacting anyone
> else.

Ok I'll close it WONTFIX