From: Nick Piggin on 3 Jun 2010 23:20

On Thu, Jun 03, 2010 at 09:46:02PM -0400, Martin K. Petersen wrote:
> >>>>> "Nick" == Nick Piggin <npiggin(a)suse.de> writes:
>
> Nick> Also I don't think we can deal with memory errors and scribbles
> Nick> just by crcing dirty data. The calculations generating the data
> Nick> could get corrupted.
>
> Yep, the goal is to make the window as small as possible.
>
>
> Nick> Data can be corrupted on its way back from the device to
> Nick> userspace.
>
> We also get a CRC back from the storage. So the (integrity-aware)
> application is also able to check on read.

Well that's nice :)

> Nick> Obviously this feature is being pushed by databases and such that
> Nick> really want to pass checksums all the way from userspace. Block
> Nick> retrying is _not_ needed or wanted here of course.
>
> Nope. The integrity error is bubbled all the way up to the database and
> we can decide to retry, recreate or error out depending on what we find
> when we do validation checks on the data buffer and the integrity
> metadata.

By block retrying, I just meant the bounce / re-checksum approach.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
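
[Editor's sketch] The read-side check mentioned above (storage returns a CRC, an integrity-aware application re-checks it) can be modelled in userspace roughly as follows. This is a toy, not the kernel or any real DIX interface: `crc16_t10dif`, `protect` and `verify` are hypothetical names, and the 512-byte protection interval is an assumption. The polynomial 0x8BB7 is the one used by the T10 DIF guard tag.

```python
SECTOR = 512  # assumed protection interval, one guard tag per sector


def crc16_t10dif(data):
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(buf):
    """Compute one guard tag per sector of buf (hypothetical helper)."""
    return [crc16_t10dif(buf[i:i + SECTOR]) for i in range(0, len(buf), SECTOR)]


def verify(buf, guards):
    """Return the indices of sectors that no longer match their guard tag."""
    return [i for i, g in enumerate(protect(buf)) if g != guards[i]]


data = bytes(range(256)) * 8      # four sectors of sample data
guards = protect(data)            # stands in for the CRCs returned by storage

damaged = bytearray(data)
damaged[SECTOR + 7] ^= 0x01       # one bit flipped in sector 1 on the way up

print(verify(data, guards))            # []
print(verify(bytes(damaged), guards))  # [1]
```

A non-empty result is the point at which the application would decide whether to retry, recreate or error out, as described in the quoted reply.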
From: Jan Kara on 4 Jun 2010 11:40

On Fri 04-06-10 12:02:43, Dave Chinner wrote:
> On Thu, Jun 03, 2010 at 11:46:34AM -0400, Chris Mason wrote:
> > On Wed, Jun 02, 2010 at 11:41:21PM +1000, Nick Piggin wrote:
> > > Closing the while it is dirty, while it is being written back window
> > > still leaves a pretty big window. Also, how do you handle mmap writes?
> > > Write protect and checksum the destination page after every store? Or
> > > leave some window between when the pagecache is dirtied and when it is
> > > written back? So I don't know whether it's worth putting a lot of effort
> > > into this case.
> >
> > So, changing gears to how do we protect filesystem page cache pages
> > instead of the generic idea of dif/dix, btrfs crcs just before writing,
> > which does leave a pretty big window for the page to get corrupted.
> > The storage layer shouldn't care or know about that though, we hand it a
> > crc and it makes sure data matching that crc goes to the media.
>
> I think the only way to get accurate CRCs is to stop modifications
> from occurring while the page is under writeback. i.e. when a page
> transitions from dirty to writeback we need to unmap any writable
> mappings on the page, and then any new modifications (either by the
> write() path or through ->fault) need to block waiting for
> page writeback to complete before they can proceed...

Actually, we already write-protect the page in clear_page_dirty_for_io
so the first part already happens. Any filesystem can do
wait_on_page_writeback() in its ->page_mkwrite function so even the second
part shouldn't be hard. I'm just a bit worried about the performance
implications / hidden deadlocks...
Also we'd have to wait_on_page_writeback() in ->write_begin function
to protect against ordinary writes but that's the easy part...

								Honza
--
Jan Kara <jack(a)suse.cz>
SUSE Labs, CR
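
[Editor's sketch] The effect of blocking writers while a page is under writeback can be illustrated with a small single-threaded model. It is not kernel code: the `Page` class and `write_out` are hypothetical, a deferred-store list stands in for a task sleeping in wait_on_page_writeback(), and `zlib.crc32` is only a stand-in for whatever checksum the filesystem or HBA actually uses.

```python
import zlib


class Page:
    """Toy model of a page-cache page (hypothetical, not a kernel structure)."""

    def __init__(self, data, stable):
        self.data = bytearray(data)
        self.stable = stable      # True: writers wait while under writeback
        self.writeback = False
        self.pending = []         # stores deferred until writeback completes

    def store(self, offset, value):
        # Models the check a ->page_mkwrite / ->write_begin hook would make.
        if self.stable and self.writeback:
            self.pending.append((offset, value))  # stands in for sleeping
            return
        self.data[offset] = value

    def start_writeback(self):
        self.writeback = True
        return zlib.crc32(self.data)  # checksum handed down with the I/O

    def end_writeback(self):
        self.writeback = False
        for offset, value in self.pending:  # waiters proceed now
            self.data[offset] = value
        self.pending.clear()


def write_out(stable):
    """Return (checksum still matches at transfer time, first byte afterwards)."""
    page = Page(b"A" * 4096, stable)
    guard = page.start_writeback()
    page.store(0, ord("B"))                    # modification racing with the I/O
    matches = zlib.crc32(page.data) == guard   # what the device would verify
    page.end_writeback()
    return matches, bytes(page.data[:1])


print(write_out(stable=False))  # (False, b'B')
print(write_out(stable=True))   # (True, b'B')
```

In both cases the store eventually lands; the difference is only whether it can land between checksum calculation and the transfer. The model says nothing about the performance cost or deadlock risk raised above.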
From: Martin K. Petersen on 7 Jun 2010 12:30

>>>>> "Dave" == Dave Chinner <david(a)fromorbit.com> writes:

>> Didn't you use to wait_on_page_writeback() in page_mkwrite()?

Dave> The generic implementation of ->page_mkwrite
Dave> (block_page_mkwrite()) which XFS uses has never had a
Dave> wait_on_page_writeback() call in it. There's no call in the
Dave> generic write paths, either, hence my comment that only direct IO
Dave> on XFS will work.

I guess that wait_on_page_writeback() was something I added when I used
XFS for DIF testing.

--
Martin K. Petersen	Oracle Linux Engineering
From: Boaz Harrosh on 7 Jun 2010 13:30

On 06/07/2010 07:20 PM, Martin K. Petersen wrote:
>>>>>> "Dave" == Dave Chinner <david(a)fromorbit.com> writes:
>
>>> Didn't you use to wait_on_page_writeback() in page_mkwrite()?
>
> Dave> The generic implementation of ->page_mkwrite
> Dave> (block_page_mkwrite()) which XFS uses has never had a
> Dave> wait_on_page_writeback() call in it. There's no call in the
> Dave> generic write paths, either, hence my comment that only direct IO
> Dave> on XFS will work.
>
> I guess that wait_on_page_writeback() was something I added when I used
> XFS for DIF testing.
>

Do you remember some performance numbers that show degradation /
sameness?

What type of work loads?

Thanks
Boaz
From: Martin K. Petersen on 7 Jun 2010 13:50

>>>>> "Boaz" == Boaz Harrosh <bharrosh(a)panasas.com> writes:

Boaz> Do you remember some performance numbers that show degradation /
Boaz> sameness?

Boaz> What type of work loads?

I haven't been using XFS much for over a year. I'm using an internal
async I/O tool and btrfs for most of my DIX/DIF testing these days.

But my original changes were along the lines of what Jan mentioned
earlier (hooking into page_mkwrite and waiting for writeback. I could
have sworn that I only did it for ext[23] and that XFS waited out of
the box but git proves me wrong).

Anyway, I'll try to get some benchmarking happening later this week.

This won't fix things completely, though. ext2fs, for instance,
frequently changes metadata buffers in flight so it trips the guard
check in no time.

--
Martin K. Petersen	Oracle Linux Engineering