From: James Bottomley on 1 Jun 2010 09:30

On Tue, 2010-06-01 at 12:30 +0200, Christof Schmitt wrote:
> What is the best strategy to continue with the invalid guard tags on
> write requests? Should this be fixed in the filesystems?

For write requests, as long as the page dirty bit is still set, it's safe to drop the request, since it's already going to be repeated. What we probably want is an error code we can return that the layer that sees both the request and the page flags can make the call.

> Another idea would be to pass invalid guard tags on write requests
> down to the hardware, expect an "invalid guard tag" error and report
> it to the block layer where a new checksum is generated and the
> request is issued again. Basically implement a retry through the whole
> I/O stack. But this also sounds complicated.

No, no ... as long as the guard tag is wrong because the fs changed the page, the write request for the updated page will already be queued or in-flight, so there's no need to retry. We still have to pass checksum failures on in case the data changed because of some HW (or SW) cockup. The check for this is page dirty. If we get a checksum error back and the page is still clean, we know nothing in the OS changed it, therefore it's a real bit flip error.

James
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
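The decision rule James describes can be sketched in a few lines. This is a minimal illustration in Python, not kernel code: the names `Page`, `Verdict` and `on_guard_error` are hypothetical and do not correspond to any kernel API.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    DROP = "drop"      # page was re-dirtied, so a newer write is expected
    REPORT = "report"  # page is clean, so treat the error as real corruption


@dataclass
class Page:
    """Hypothetical stand-in for a page cache page and its dirty flag."""
    data: bytearray
    dirty: bool = False


def on_guard_error(page: Page) -> Verdict:
    """Classify a write that came back with a guard tag (checksum) error.

    Assumption from the message above: a dirty page means the OS changed
    the data after the checksum was taken and will write it out again.
    """
    if page.dirty:
        return Verdict.DROP
    return Verdict.REPORT
```

For example, `on_guard_error(Page(bytearray(b"abc"), dirty=True))` yields `Verdict.DROP`, while the same call on a clean page yields `Verdict.REPORT`. The rule is only as reliable as the dirty flag is at the moment the error is seen.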
From: Chris Mason on 1 Jun 2010 09:40

On Tue, Jun 01, 2010 at 01:27:56PM +0000, James Bottomley wrote:
> On Tue, 2010-06-01 at 12:30 +0200, Christof Schmitt wrote:
> > What is the best strategy to continue with the invalid guard tags on
> > write requests? Should this be fixed in the filesystems?
>
> For write requests, as long as the page dirty bit is still set, it's
> safe to drop the request, since it's already going to be repeated. What
> we probably want is an error code we can return that the layer that sees
> both the request and the page flags can make the call.

I'm afraid this isn't entirely true. The FS tends to do this:

change the page
<---------> truck sized race right here where the page is clean
mark the page dirty

-chris
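The window Chris points at can be replayed deterministically. The sketch below is an illustration under stated assumptions: `zlib.crc32` stands in for the real guard tag computation, and `Page`, `guard` and `simulate_race` are hypothetical names. It steps through one interleaving in which the guard error is observed while the page still looks clean.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page cache page and its dirty flag."""
    data: bytearray
    dirty: bool = False


def guard(data: bytearray) -> int:
    """Toy guard tag; zlib.crc32 is only a stand-in for the real checksum."""
    return zlib.crc32(bytes(data))


def simulate_race():
    """Replay: writeout starts, fs changes the page, error seen, fs marks dirty."""
    page = Page(bytearray(b"old contents"), dirty=True)

    # Writeout begins: the dirty flag is cleared and the tag is computed.
    page.dirty = False
    tag = guard(page.data)

    # Filesystem step 1: change the page (the page is still marked clean).
    page.data[0:3] = b"new"

    # The device checks the data it actually received against the tag.
    guard_error = guard(page.data) != tag
    dirty_when_error_seen = page.dirty

    # Filesystem step 2: mark the page dirty, after the error was observed.
    page.dirty = True
    return guard_error, dirty_when_error_seen
```

In this interleaving `simulate_race()` returns `(True, False)`: a guard error on a page that appears clean, which a "clean page means real bit flip" rule would misreport as corruption.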
From: James Bottomley on 1 Jun 2010 09:50

On Tue, 2010-06-01 at 09:33 -0400, Chris Mason wrote:
> On Tue, Jun 01, 2010 at 01:27:56PM +0000, James Bottomley wrote:
> > On Tue, 2010-06-01 at 12:30 +0200, Christof Schmitt wrote:
> > > What is the best strategy to continue with the invalid guard tags on
> > > write requests? Should this be fixed in the filesystems?
> >
> > For write requests, as long as the page dirty bit is still set, it's
> > safe to drop the request, since it's already going to be repeated. What
> > we probably want is an error code we can return that the layer that sees
> > both the request and the page flags can make the call.
>
> I'm afraid this isn't entirely true. The FS tends to do this:
>
> change the page
> <---------> truck sized race right here where the page is clean
> mark the page dirty

Would it be too much work in the fs to mark the page dirty before you begin altering it (and again after you finish, just in case some cleaner noticed and initiated a write)? Or some other flag that indicates page under modification? All the process controlling the writeout (which is pretty high up in the stack) needs to know is if we triggered the check error by altering the page while it was in flight.

I agree that a block based retry would close all the holes ... it just doesn't look elegant to me that the fs will already be repeating the I/O if it changed the page and so will block.

James
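The ordering James proposes (dirty before the change, and again after it) can be sketched the same way. This is an illustration only: `zlib.crc32` stands in for the real guard tag, and `Page`, `guard`, `modify_dirty_first` and `simulate_dirty_first` are hypothetical names, not filesystem or kernel interfaces.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a page cache page and its dirty flag."""
    data: bytearray
    dirty: bool = False


def guard(data: bytearray) -> int:
    """Toy guard tag; zlib.crc32 is only a stand-in for the real checksum."""
    return zlib.crc32(bytes(data))


def modify_dirty_first(page: Page, new_bytes: bytes) -> None:
    """Mark dirty, change the data, then mark dirty again."""
    page.dirty = True                        # set before touching the data
    page.data[: len(new_bytes)] = new_bytes
    page.dirty = True                        # repeat in case a cleaner cleared it


def simulate_dirty_first():
    """A write is in flight while the filesystem updates the page."""
    page = Page(bytearray(b"old contents"), dirty=True)

    # Writeout begins: the dirty flag is cleared and the tag is computed.
    page.dirty = False
    tag = guard(page.data)

    # The filesystem updates the page using the proposed ordering.
    modify_dirty_first(page, b"new")

    # The device checks the data it actually received against the tag.
    guard_error = guard(page.data) != tag
    return guard_error, page.dirty
```

Here `simulate_dirty_first()` returns `(True, True)`: the guard error still occurs, but the page is already dirty when it is seen, so the writeout side can attribute it to an in-flight modification. The sketch is single-threaded and says nothing about the cost of doing this in a real filesystem.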
From: Chris Mason on 1 Jun 2010 10:00

On Tue, Jun 01, 2010 at 08:40:37AM -0500, James Bottomley wrote:
> On Tue, 2010-06-01 at 09:33 -0400, Chris Mason wrote:
> > On Tue, Jun 01, 2010 at 01:27:56PM +0000, James Bottomley wrote:
> > > On Tue, 2010-06-01 at 12:30 +0200, Christof Schmitt wrote:
> > > > What is the best strategy to continue with the invalid guard tags on
> > > > write requests? Should this be fixed in the filesystems?
> > >
> > > For write requests, as long as the page dirty bit is still set, it's
> > > safe to drop the request, since it's already going to be repeated. What
> > > we probably want is an error code we can return that the layer that sees
> > > both the request and the page flags can make the call.
> >
> > I'm afraid this isn't entirely true. The FS tends to do this:
> >
> > change the page
> > <---------> truck sized race right here where the page is clean
> > mark the page dirty
>
> Would it be too much work in the fs to mark the page dirty before you
> begin altering it (and again after you finish, just in case some cleaner
> noticed and initiated a write)? Or some other flag that indicates page
> under modification? All the process controlling the writeout (which is
> pretty high up in the stack) needs to know is if we triggered the check
> error by altering the page while it was in flight.

I expect that once we went down that path we would end up waiting for the IO to finish before changing the page. Maybe there is a less complex way, but I sure didn't see it.

>
> I agree that a block based retry would close all the holes ... it just
> doesn't look elegant to me that the fs will already be repeating the I/O
> if it changed the page and so will block.

We might not ever repeat the IO. We might change the page, write it, change it again, truncate the file and toss the page completely.

-chris
From: Martin K. Petersen on 1 Jun 2010 10:00

>>>>> "James" == James Bottomley <James.Bottomley(a)suse.de> writes:

James> Would it be too much work in the fs to mark the page dirty before
James> you begin altering it (and again after you finish, just in case
James> some cleaner noticed and initiated a write)? Or some other flag
James> that indicates page under modification? All the process
James> controlling the writeout (which is pretty high up in the stack)
James> needs to know is if we triggered the check error by altering the
James> page while it was in flight.

James> I agree that a block based retry would close all the holes ... it
James> just doesn't look elegant to me that the fs will already be
James> repeating the I/O if it changed the page and so will block.

I experimented with this approach a while back. However, I quickly got into a situation where frequently updated blocks never made it to disk because the page was constantly being updated. And all writes failed with a guard tag error.

--
Martin K. Petersen
Oracle Linux Engineering
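The starvation Martin reports can be reproduced in a toy model. This is a sketch under stated assumptions, not a reconstruction of his experiment: `zlib.crc32` stands in for the real guard tag, the "page" is an 8-byte counter, and `completed_writes` is a hypothetical helper name.

```python
import zlib


def completed_writes(updates_per_attempt: int, attempts: int) -> int:
    """Count write attempts whose data still matches its guard tag.

    Each attempt takes a checksum of the page, lets the page be updated
    `updates_per_attempt` times while the write is in flight, and then
    verifies the data against the checksum. A mismatch models a write
    that fails with a guard tag error and is dropped.
    """
    page = bytearray(8)  # toy page holding a little-endian counter
    completed = 0
    for _ in range(attempts):
        tag = zlib.crc32(bytes(page))             # tag taken at submit time
        for _ in range(updates_per_attempt):      # page changes while in flight
            value = int.from_bytes(page, "little") + 1
            page[:] = value.to_bytes(8, "little")
        if zlib.crc32(bytes(page)) == tag:        # verification on arrival
            completed += 1
    return completed
```

With one update per attempt, `completed_writes(1, 100)` returns `0`: every write fails verification and the block never reaches disk. With no updates in flight, `completed_writes(0, 100)` returns `100`.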